Closed: @ganyuancao closed this issue 4 days ago
Hello @ganyuancao, these errors are not easy to debug without being able to reproduce them. Could you therefore:

- share reproducible code (in particular, the parameters you use for the compilation and so on)? For the input set, fake data with the same ranges would be fine, as long as you hit the same issue and we can run it easily!
- share the complete traceback, as we believe it is incomplete? For example, you might have some additional error lines right above the `Stack dump without symbol names` message.

Thanks for the report!
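For illustration, fake calibration data with the same shape and ranges could be generated along these lines (a sketch using NumPy for portability; the thread itself uses torch, and the `(1, 1, 256, 256)` shape is the one reported later in the discussion):

```python
import numpy as np

# Hypothetical stand-in for the real input set: random data with the same
# shape and roughly the same value range as the real images. Any fake data
# with matching ranges should be enough to reproduce shape-dependent issues.
rng = np.random.default_rng(0)
cali_image = rng.standard_normal((1, 1, 256, 256)).astype(np.float32)
print(cali_image.shape)  # (1, 1, 256, 256)
```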
Hi @RomanBredehoft ,
Thank you for the update!
Yes, I have already included the parameters for the compilation, etc.:
```python
model = compile_brevitas_qat_model(
    model,
    cali_image,
    n_bits=n_bits,
    rounding_threshold_bits={'n_bits': n_bits, 'method': 'approximate'},
    p_error=1e-5,
)
model.fhe_circuit.keygen(seed=key_seed, encryption_seed=enc_seed)
input_q = model.quantize_input(input)
output_q = model.quantized_forward(input_q, fhe='execute')
output = model.dequantize_output(output_q)
```
With the model, I guess you should be able to reproduce it with random data of shape `torch.Size([1, 1, 256, 256])`.

For the traceback, there are actually no additional error lines above the `Stack dump without symbol names` message. It appears right below the output of my program, like this:
```
Running Inference in FHE on Sample 1 ...
Stack dump without symbol names (ensure you have llvm-symbolizer in your PATH or set the environment var `LLVM_SYMBOLIZER_PATH` to point to it):
0  libLLVM-17git-c7b3ee8f.so 0x00007a94d9152b51 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) + 225
1  libLLVM-17git-c7b3ee8f.so 0x00007a94d9150564
2  libc.so.6                 0x00007a95b4c42520
3  sharedlib.so              0x00007a948b605390 concrete_main + 576
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Segmentation fault (core dumped)
```
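As the crash message itself suggests, symbol names can be recovered in future stack dumps by pointing the runtime at `llvm-symbolizer`. A sketch (the exact path depends on your LLVM installation, and the tool may not be installed at all):

```shell
# Make llvm-symbolizer discoverable so later stack dumps show symbol names.
# `command -v` comes up empty if the LLVM tools are not installed.
export LLVM_SYMBOLIZER_PATH="$(command -v llvm-symbolizer || true)"
echo "LLVM_SYMBOLIZER_PATH=${LLVM_SYMBOLIZER_PATH}"
```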
Thank you!
Thank you for your quick answer @ganyuancao!

I just tried to reproduce your error by using:

```python
key_seed = 0
enc_seed = 0
model = UNet(bit_width=n_bits)
cali_image = torch.randn((1, 1, 256, 256))
```

but got different errors when compiling:

- `n_bits=8` and `n_bits=7` give `RuntimeError: Function you are trying to compile cannot be compiled`
- `n_bits=6` and below gives `Could not determine a unique scale for the quantization! Please check the ONNX graph of this model.`, which looks directly related to your `CustomUpsample` module.

Could you then specify which configuration you are using, so that we better understand what triggers yours? Thanks!

Ideally, you would send a single .py file that contains everything, so that we can launch it and reproduce directly!
Hi @RomanBredehoft ,
I am using `n_bits=7`, and it can actually be compiled and run in simulation mode on my side. I sent you a message on the FHE Discord. Can you maybe check that out?
Thanks!
Yes, I got your code and will check it when I can, thanks!
Hello again! We took a look at your issue, and it is a memory issue: at run time, the compiler tries to allocate up to 600 GB of memory, and thus fails 😅

This is somewhat expected: your model is pretty large and thus requires a lot of computations from Concrete. Here are some alternatives (other than using a bigger machine):

- `n_bits=8` with `rounding_threshold_bits=6` (with the `approximate` method) gives the best tradeoff, but you could try `n_bits=7` and/or `rounding_threshold_bits=5` as well.

With that being said, your use case started an internal discussion about potential optimizations of memory allocation, so you might expect some improvements in the future! We are also planning to add a dedicated and explicit error message for issues like the one you got.
So again, thanks for the report and I hope that you'll be able to find proper solutions to your use case !
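Putting the suggested settings together, a hypothetical sketch of how they would plug into the compile call used earlier in the thread (the `compile_brevitas_qat_model` call is commented out here since it requires Concrete ML and the model to be available):

```python
# Suggested settings from the discussion: n_bits=8 with 6-bit "approximate"
# rounding reportedly gives the best memory/accuracy tradeoff on this model.
n_bits = 8
rounding_threshold_bits = {"n_bits": 6, "method": "approximate"}

# With Concrete ML installed, the compile call from earlier in the thread
# would become (commented out so this sketch stays self-contained):
# model = compile_brevitas_qat_model(
#     model,
#     cali_image,
#     n_bits=n_bits,
#     rounding_threshold_bits=rounding_threshold_bits,
#     p_error=1e-5,
# )
print(rounding_threshold_bits["n_bits"])  # 6
```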
Hello, I am having a problem with the LLVM symbolizer when I try to run the model in FHE 'execute' mode. It works well when I compile the model and run it in 'simulate' mode. Specifically, the error message is
The way I run it is
And my model is
I am using the following versions
And my hardware is
Is it actually a bug, or is there a way I can fix it?
Thank you very much!