zama-ai / concrete-ml

Concrete ML: Privacy Preserving ML framework using Fully Homomorphic Encryption (FHE), built on top of Concrete, with bindings to traditional ML frameworks.

LLVM symbolizer error when running FHE in 'execute' mode #808

Closed: ganyuancao closed this 4 days ago

ganyuancao commented 1 month ago

Hello, I am having a problem with the LLVM symbolizer when I try to run FHE inference in 'execute' mode. It works fine when I compile the model and run it in 'simulation' mode. Specifically, the error message is:

Stack dump without symbol names (ensure you have llvm-symbolizer in your PATH or set the environment var `LLVM_SYMBOLIZER_PATH` to point to it):
0  libLLVM-17git-c7b3ee8f.so 0x00007c46a6f52b51 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) + 225
1  libLLVM-17git-c7b3ee8f.so 0x00007c46a6f50564
2  libc.so.6                 0x00007c4782842520
3  sharedlib.so              0x00007c4638605390 concrete_main + 576
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Segmentation fault (core dumped)

The way I run it is:

from concrete.ml.torch.compile import compile_brevitas_qat_model

# Compile the Brevitas QAT model to an FHE circuit (returns a QuantizedModule)
model = compile_brevitas_qat_model(model, cali_image, n_bits=n_bits, rounding_threshold_bits={'n_bits': n_bits, 'method': 'approximate'}, p_error=1e-5)
# Generate keys deterministically from the given seeds
model.fhe_circuit.keygen(seed=key_seed, encryption_seed=enc_seed)
input_q = model.quantize_input(input)
output_q = model.quantized_forward(input_q, fhe='execute')
output = model.dequantize_output(output_q)

And my model is:

import torch
import torch.nn as nn
import brevitas.nn as qnn
from brevitas.quant import Int8ActPerTensorFloat, Int8WeightPerTensorFloat
import torch.nn.utils.prune as prune

# Define a customized upsample module for FHE compatibility
class CustomUpsample(nn.Module):
    def __init__(self, in_channels, out_channels, scale_factor):
        super(CustomUpsample, self).__init__()
        self.scale_factor = scale_factor
        # Define a quantized convolutional layer for upscaling
        self.conv = qnn.QuantConv2d(
            in_channels, out_channels, kernel_size=3, padding=1,
            weight_quant=Int8WeightPerTensorFloat, act_quant=Int8ActPerTensorFloat)

    def forward(self, x):
        # Perform nearest neighbor upsampling
        batch_size, channels, height, width = x.shape
        # Insert singleton axes after H and W so each pixel expands into a
        # scale_factor x scale_factor block (proper nearest-neighbor layout)
        x = x.unsqueeze(3).unsqueeze(5)
        x = x.expand(batch_size, channels, height, self.scale_factor, width, self.scale_factor)
        x = x.reshape(batch_size, channels, height * self.scale_factor, width * self.scale_factor)
        return self.conv(x)

# Define a quantized convolutional layer with batch normalization and ReLU activation
class Conv2d(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size=3, stride=1, padding=1, bit_width=7):
        super(Conv2d, self).__init__()
        # Define a quantized convolutional layer
        self.conv = qnn.QuantConv2d(
            in_channels=in_channels, out_channels=out_channels, kernel_size=kernel_size, stride=stride, padding=padding,
            weight_quant=Int8WeightPerTensorFloat, act_quant=Int8ActPerTensorFloat)
        # Batch normalization layer
        self.bn = nn.BatchNorm2d(num_features=out_channels)
        # Quantized ReLU activation layer
        self.relu = qnn.QuantReLU(bit_width=bit_width, act_quant=Int8ActPerTensorFloat) 

    def forward(self, x):
        # Forward pass through convolution, BN, and ReLU
        x = self.conv(x)
        x = self.bn(x)
        x = self.relu(x)
        return x

# Define a UNet architecture with quantized components
class UNet(nn.Module):
    def __init__(self, bit_width=7): 
        super(UNet, self).__init__()

        # Encoder path
        self.enc1 = nn.Sequential(
            Conv2d(in_channels=1, out_channels=32, bit_width=bit_width),  
            Conv2d(in_channels=32, out_channels=32, bit_width=bit_width)
        )
        self.pool1 = nn.MaxPool2d(kernel_size=2)

        self.enc2 = nn.Sequential(
            Conv2d(in_channels=32, out_channels=64, bit_width=bit_width),
            Conv2d(in_channels=64, out_channels=64, bit_width=bit_width) 
        )
        self.pool2 = nn.MaxPool2d(kernel_size=2)

        self.enc3 = nn.Sequential(
            Conv2d(in_channels=64, out_channels=128, bit_width=bit_width), 
            Conv2d(in_channels=128, out_channels=128, bit_width=bit_width) 
        )
        self.pool3 = nn.MaxPool2d(kernel_size=2)

        self.enc4 = nn.Sequential(
            Conv2d(in_channels=128, out_channels=256, bit_width=bit_width), 
            Conv2d(in_channels=256, out_channels=256, bit_width=bit_width) 
        )
        self.pool4 = nn.MaxPool2d(kernel_size=2)

        self.enc5 = nn.Sequential(
            Conv2d(in_channels=256, out_channels=512, bit_width=bit_width),
            Conv2d(in_channels=512, out_channels=512, bit_width=bit_width)
        )
        self.pool5 = nn.MaxPool2d(kernel_size=2)

        # Bottleneck (central) layer
        self.bottleneck = nn.Sequential(
            Conv2d(in_channels=512, out_channels=1024, bit_width=bit_width), 
            Conv2d(in_channels=1024, out_channels=1024, bit_width=bit_width) 
        )

        # Upsampling path with CustomUpsample and decoder layers
        self.up5 = CustomUpsample(in_channels=1024, out_channels=512, scale_factor=2)
        self.dec5 = nn.Sequential(
            Conv2d(in_channels=1024, out_channels=512, bit_width=bit_width), 
            Conv2d(in_channels=512, out_channels=512, bit_width=bit_width)  
        )

        self.up4 = CustomUpsample(in_channels=512, out_channels=256, scale_factor=2)
        self.dec4 = nn.Sequential(
            Conv2d(in_channels=512, out_channels=256, bit_width=bit_width), 
            Conv2d(in_channels=256, out_channels=256, bit_width=bit_width)  
        )

        self.up3 = CustomUpsample(in_channels=256, out_channels=128, scale_factor=2)
        self.dec3 = nn.Sequential(
            Conv2d(in_channels=256, out_channels=128, bit_width=bit_width), 
            Conv2d(in_channels=128, out_channels=128, bit_width=bit_width)  
        )

        self.up2 = CustomUpsample(in_channels=128, out_channels=64, scale_factor=2)
        self.dec2 = nn.Sequential(
            Conv2d(in_channels=128, out_channels=64, bit_width=bit_width),  
            Conv2d(in_channels=64, out_channels=64, bit_width=bit_width)
        )

        self.up1 = CustomUpsample(in_channels=64, out_channels=32, scale_factor=2)
        self.dec1 = nn.Sequential(
            Conv2d(in_channels=64, out_channels=32, bit_width=bit_width), 
            Conv2d(in_channels=32, out_channels=32, bit_width=bit_width) 
        )

        # Final convolutional layer for output
        self.final = qnn.QuantConv2d(
            in_channels=32, out_channels=1, kernel_size=1, stride=1, padding=0,
            weight_quant=Int8WeightPerTensorFloat, act_quant=Int8ActPerTensorFloat)

        # Quantized identity layers for input, output, and encoder activations
        self.quant_inp = qnn.QuantIdentity(act_quant=Int8ActPerTensorFloat, bit_width=bit_width)
        self.quant_out = qnn.QuantIdentity(act_quant=Int8ActPerTensorFloat, bit_width=bit_width)
        self.quant_enc1 = qnn.QuantIdentity(act_quant=Int8ActPerTensorFloat, bit_width=bit_width)
        self.quant_enc2 = qnn.QuantIdentity(act_quant=Int8ActPerTensorFloat, bit_width=bit_width)
        self.quant_enc3 = qnn.QuantIdentity(act_quant=Int8ActPerTensorFloat, bit_width=bit_width)
        self.quant_enc4 = qnn.QuantIdentity(act_quant=Int8ActPerTensorFloat, bit_width=bit_width)
        self.quant_enc5 = qnn.QuantIdentity(act_quant=Int8ActPerTensorFloat, bit_width=bit_width)

    def forward(self, x):
        # Forward pass through the network
        x = self.quant_inp(x)
        enc1 = self.enc1(x)
        enc2 = self.enc2(self.pool1(enc1))
        enc3 = self.enc3(self.pool2(enc2))
        enc4 = self.enc4(self.pool3(enc3))
        enc5 = self.enc5(self.pool4(enc4))

        bottleneck = self.bottleneck(self.pool5(enc5))

        up5 = self.up5(bottleneck)
        enc5 = self.quant_enc5(enc5)
        up5 = self.quant_enc5(up5)
        dec5 = self.dec5(torch.cat((up5, enc5), dim=1))

        up4 = self.up4(dec5)
        enc4 = self.quant_enc4(enc4)
        up4 = self.quant_enc4(up4)
        dec4 = self.dec4(torch.cat((up4, enc4), dim=1))

        up3 = self.up3(dec4)
        enc3 = self.quant_enc3(enc3)
        up3 = self.quant_enc3(up3)
        dec3 = self.dec3(torch.cat((up3, enc3), dim=1))

        up2 = self.up2(dec3)
        enc2 = self.quant_enc2(enc2)
        up2 = self.quant_enc2(up2)
        dec2 = self.dec2(torch.cat((up2, enc2), dim=1))

        up1 = self.up1(dec2)
        enc1 = self.quant_enc1(enc1)
        up1 = self.quant_enc1(up1)
        dec1 = self.dec1(torch.cat((up1, enc1), dim=1))

        output = torch.sigmoid(self.final(dec1))
        return self.quant_out(output)
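For reference, here is a quick sanity check in the clear, without FHE (a sketch; the 1x1x256x256 shape matches my data):

# Sanity check in the clear (no FHE): one forward pass through the model
model = UNet(bit_width=7)
x = torch.randn(1, 1, 256, 256)
with torch.no_grad():
    y = model(x)
print(y.shape)  # expected: torch.Size([1, 1, 256, 256])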

I am using the following versions:

python 3.10.12
concrete-ml 1.6.1
concrete-python 2.7.0

And my hardware is:

(screenshot of hardware specifications, 2024-07-24)

Is this actually a bug, or is there a way I can fix it?

Thank you very much!

RomanBredehoft commented 1 month ago

Hello @ganyuancao, these errors are not easy to debug without being able to reproduce them. Could you therefore:

  • share reproducible code (in particular, the parameters you use for the compilation and so on; for the inputset, it could be fake data with the same ranges, as long as you have the same issue and we can run it easily!)
  • share the complete traceback, as we believe it is incomplete (for example, you might have some additional error lines right above the Stack dump without symbol names message)

Thanks for the report!

ganyuancao commented 1 month ago

> Hello @ganyuancao, these errors are not easy to debug without being able to reproduce them. Could you therefore:
>
>   • share reproducible code (in particular, the parameters you use for the compilation and so on; for the inputset, it could be fake data with the same ranges, as long as you have the same issue and we can run it easily!)
>   • share the complete traceback, as we believe it is incomplete (for example, you might have some additional error lines right above the Stack dump without symbol names message)
>
> Thanks for the report!

Hi @RomanBredehoft,

Thank you for the update!

Yes, I have already included the parameters for the compilation, etc.:

model = compile_brevitas_qat_model(model, cali_image, n_bits=n_bits, rounding_threshold_bits={'n_bits': n_bits, 'method': 'approximate'}, p_error=1e-5)
model.fhe_circuit.keygen(seed=key_seed, encryption_seed=enc_seed)
input_q = model.quantize_input(input)
output_q = model.quantized_forward(input_q, fhe='execute')
output = model.dequantize_output(output_q)

With the model, you should be able to reproduce it with random data of shape torch.Size([1, 1, 256, 256]).
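For completeness, here is a single-file sketch of how I run everything (random tensors stand in for my real calibration and test images, and the seeds are arbitrary placeholders):

import torch
from concrete.ml.torch.compile import compile_brevitas_qat_model

n_bits = 7
key_seed, enc_seed = 0, 0  # arbitrary placeholder seeds

model = UNet(bit_width=n_bits)             # UNet as defined above
cali_image = torch.randn(1, 1, 256, 256)   # fake calibration data with the same shape
sample = torch.randn(1, 1, 256, 256)       # fake inference sample

quantized_module = compile_brevitas_qat_model(
    model, cali_image, n_bits=n_bits,
    rounding_threshold_bits={'n_bits': n_bits, 'method': 'approximate'},
    p_error=1e-5)
quantized_module.fhe_circuit.keygen(seed=key_seed, encryption_seed=enc_seed)

input_q = quantized_module.quantize_input(sample.numpy())
output_q = quantized_module.quantized_forward(input_q, fhe='execute')
output = quantized_module.dequantize_output(output_q)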

For the traceback, there are actually no additional error lines above the Stack dump without symbol names message. It appears right below the output of my program, like this:

Running Inference in FHE on Sample 1 ...
Stack dump without symbol names (ensure you have llvm-symbolizer in your PATH or set the environment var `LLVM_SYMBOLIZER_PATH` to point to it):
0  libLLVM-17git-c7b3ee8f.so 0x00007a94d9152b51 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) + 225
1  libLLVM-17git-c7b3ee8f.so 0x00007a94d9150564
2  libc.so.6                 0x00007a95b4c42520
3  sharedlib.so              0x00007a948b605390 concrete_main + 576
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Segmentation fault (core dumped)

Thank you!

RomanBredehoft commented 1 month ago

Thank you for your quick answer @ganyuancao!

I just tried to reproduce your error by using:

key_seed = 0
enc_seed = 0

model = UNet(bit_width=n_bits)

cali_image = torch.randn((1,1,256,256))

but got different errors when compiling:

  • n_bits=8 and n_bits=7 give RuntimeError: Function you are trying to compile cannot be compiled
  • n_bits=6 and below gives Could not determine a unique scale for the quantization! Please check the ONNX graph of this model., which looks directly related to your CustomUpsample module

Could you then specify which configuration you are using, so that we can better understand what triggers yours? Thanks!

bcm-at-zama commented 1 month ago

Ideally, you would send a single .py file that contains everything, so that we can launch and reproduce it directly!

ganyuancao commented 1 month ago

> Thank you for your quick answer @ganyuancao!
>
> I just tried to reproduce your error by using:
>
> key_seed = 0
> enc_seed = 0
>
> model = UNet(bit_width=n_bits)
>
> cali_image = torch.randn((1,1,256,256))
>
> but got different errors when compiling:
>
>   • n_bits=8 and n_bits=7 give RuntimeError: Function you are trying to compile cannot be compiled
>   • n_bits=6 and below gives Could not determine a unique scale for the quantization! Please check the ONNX graph of this model., which looks directly related to your CustomUpsample module
>
> Could you then specify which configuration you are using, so that we can better understand what triggers yours? Thanks!

Hi @RomanBredehoft,

I am using n_bits=7, and the model can actually be compiled and run in simulation mode on my side. I sent you a message on the FHE Discord; could you maybe check that out?

Thanks!

RomanBredehoft commented 1 month ago

Yes, I got your code and will check when I can, thanks!

RomanBredehoft commented 1 month ago

Hello again! We took a look at your problem, and it is a memory issue: at run time, the compiler tries to allocate up to 600 GB of memory and thus fails 😅

This is somewhat expected: your model is pretty large and thus requires a lot of computation from Concrete. Here are different alternatives (other than using a bigger machine):
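For example, one option (which you confirmed already works on your side) is to keep iterating in simulation mode, which runs the compiled, quantized circuit without FHE execution and without its large runtime allocations. A minimal sketch, reusing the names from your repro script:

# Run the same quantized circuit in simulation mode instead of 'execute':
# no FHE runtime, so no multi-hundred-GB allocation, while quantization
# effects are still modelled
input_q = quantized_module.quantize_input(sample.numpy())
output_q = quantized_module.quantized_forward(input_q, fhe='simulate')
output = quantized_module.dequantize_output(output_q)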

With that being said, your use case started an internal discussion about potential optimizations of memory allocation, so you might see some improvements in the future! We are also planning to add a dedicated and explicit error message for issues like the one you got.

So again, thanks for the report, and I hope you'll be able to find a proper solution for your use case!