zama-ai / concrete-ml

Concrete ML: Privacy Preserving ML framework using Fully Homomorphic Encryption (FHE), built on top of Concrete, with bindings to traditional ML frameworks.

LLVM symbolizer error when running FHE in 'execute' mode #808

Closed: ganyuancao closed this 4 days ago

ganyuancao commented 1 month ago

Hello, I am having a problem with the LLVM symbolizer when I try to run FHE inference in 'execute' mode. It works fine when I compile the model and run it in 'simulation' mode. Specifically, the error message is:

Stack dump without symbol names (ensure you have llvm-symbolizer in your PATH or set the environment var `LLVM_SYMBOLIZER_PATH` to point to it):
0  libLLVM-17git-c7b3ee8f.so 0x00007c46a6f52b51 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) + 225
1  libLLVM-17git-c7b3ee8f.so 0x00007c46a6f50564
2  libc.so.6                 0x00007c4782842520
3  sharedlib.so              0x00007c4638605390 concrete_main + 576
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Segmentation fault (core dumped)

The way I run it is:

from concrete.ml.torch.compile import compile_brevitas_qat_model

# Compile the Brevitas QAT model to an FHE circuit (returns a QuantizedModule)
model = compile_brevitas_qat_model(model, cali_image, n_bits=n_bits, rounding_threshold_bits={'n_bits': n_bits, 'method': 'approximate'}, p_error=1e-5)
# Generate keys deterministically from the given seeds
model.fhe_circuit.keygen(seed=key_seed, encryption_seed=enc_seed)
input_q = model.quantize_input(input)
output_q = model.quantized_forward(input_q, fhe='execute')
output = model.dequantize_output(output_q)

And my model is:

import torch
import torch.nn as nn
import brevitas.nn as qnn
from brevitas.quant import Int8ActPerTensorFloat, Int8WeightPerTensorFloat
import torch.nn.utils.prune as prune

# Define a customized upsample module for FHE compatibility
class CustomUpsample(nn.Module):
    def __init__(self, in_channels, out_channels, scale_factor):
        super(CustomUpsample, self).__init__()
        self.scale_factor = scale_factor
        # Define a quantized convolutional layer for upscaling
        self.conv = qnn.QuantConv2d(
            in_channels, out_channels, kernel_size=3, padding=1,
            weight_quant=Int8WeightPerTensorFloat, act_quant=Int8ActPerTensorFloat)

    def forward(self, x):
        # Perform nearest neighbor upsampling
        batch_size, channels, height, width = x.shape
        # Insert singleton axes after H and W so each pixel expands into a
        # scale_factor x scale_factor block (proper nearest-neighbor layout)
        x = x.unsqueeze(3).unsqueeze(5)
        x = x.expand(batch_size, channels, height, self.scale_factor, width, self.scale_factor)
        x = x.reshape(batch_size, channels, height * self.scale_factor, width * self.scale_factor)
        return self.conv(x)

# Define a quantized convolutional layer with batch normalization and ReLU activation
class Conv2d(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size=3, stride=1, padding=1, bit_width=7):
        super(Conv2d, self).__init__()
        # Define a quantized convolutional layer
        self.conv = qnn.QuantConv2d(
            in_channels=in_channels, out_channels=out_channels, kernel_size=kernel_size, stride=stride, padding=padding,
            weight_quant=Int8WeightPerTensorFloat, act_quant=Int8ActPerTensorFloat)
        # Batch normalization layer
        self.bn = nn.BatchNorm2d(num_features=out_channels)
        # Quantized ReLU activation layer
        self.relu = qnn.QuantReLU(bit_width=bit_width, act_quant=Int8ActPerTensorFloat) 

    def forward(self, x):
        # Forward pass through convolution, BN, and ReLU
        x = self.conv(x)
        x = self.bn(x)
        x = self.relu(x)
        return x

# Define a UNet architecture with quantized components
class UNet(nn.Module):
    def __init__(self, bit_width=7): 
        super(UNet, self).__init__()

        # Encoder path
        self.enc1 = nn.Sequential(
            Conv2d(in_channels=1, out_channels=32, bit_width=bit_width),  
            Conv2d(in_channels=32, out_channels=32, bit_width=bit_width)
        )
        self.pool1 = nn.MaxPool2d(kernel_size=2)

        self.enc2 = nn.Sequential(
            Conv2d(in_channels=32, out_channels=64, bit_width=bit_width),
            Conv2d(in_channels=64, out_channels=64, bit_width=bit_width) 
        )
        self.pool2 = nn.MaxPool2d(kernel_size=2)

        self.enc3 = nn.Sequential(
            Conv2d(in_channels=64, out_channels=128, bit_width=bit_width), 
            Conv2d(in_channels=128, out_channels=128, bit_width=bit_width) 
        )
        self.pool3 = nn.MaxPool2d(kernel_size=2)

        self.enc4 = nn.Sequential(
            Conv2d(in_channels=128, out_channels=256, bit_width=bit_width), 
            Conv2d(in_channels=256, out_channels=256, bit_width=bit_width) 
        )
        self.pool4 = nn.MaxPool2d(kernel_size=2)

        self.enc5 = nn.Sequential(
            Conv2d(in_channels=256, out_channels=512, bit_width=bit_width),
            Conv2d(in_channels=512, out_channels=512, bit_width=bit_width)
        )
        self.pool5 = nn.MaxPool2d(kernel_size=2)

        # Bottleneck (central) layer
        self.bottleneck = nn.Sequential(
            Conv2d(in_channels=512, out_channels=1024, bit_width=bit_width), 
            Conv2d(in_channels=1024, out_channels=1024, bit_width=bit_width) 
        )

        # Upsampling path with CustomUpsample and decoder layers
        self.up5 = CustomUpsample(in_channels=1024, out_channels=512, scale_factor=2)
        self.dec5 = nn.Sequential(
            Conv2d(in_channels=1024, out_channels=512, bit_width=bit_width), 
            Conv2d(in_channels=512, out_channels=512, bit_width=bit_width)  
        )

        self.up4 = CustomUpsample(in_channels=512, out_channels=256, scale_factor=2)
        self.dec4 = nn.Sequential(
            Conv2d(in_channels=512, out_channels=256, bit_width=bit_width), 
            Conv2d(in_channels=256, out_channels=256, bit_width=bit_width)  
        )

        self.up3 = CustomUpsample(in_channels=256, out_channels=128, scale_factor=2)
        self.dec3 = nn.Sequential(
            Conv2d(in_channels=256, out_channels=128, bit_width=bit_width), 
            Conv2d(in_channels=128, out_channels=128, bit_width=bit_width)  
        )

        self.up2 = CustomUpsample(in_channels=128, out_channels=64, scale_factor=2)
        self.dec2 = nn.Sequential(
            Conv2d(in_channels=128, out_channels=64, bit_width=bit_width),  
            Conv2d(in_channels=64, out_channels=64, bit_width=bit_width)
        )

        self.up1 = CustomUpsample(in_channels=64, out_channels=32, scale_factor=2)
        self.dec1 = nn.Sequential(
            Conv2d(in_channels=64, out_channels=32, bit_width=bit_width), 
            Conv2d(in_channels=32, out_channels=32, bit_width=bit_width) 
        )

        # Final convolutional layer for output
        self.final = qnn.QuantConv2d(
            in_channels=32, out_channels=1, kernel_size=1, stride=1, padding=0,
            weight_quant=Int8WeightPerTensorFloat, act_quant=Int8ActPerTensorFloat)

        # Quantized identity layers for input, output, and encoder activations
        self.quant_inp = qnn.QuantIdentity(act_quant=Int8ActPerTensorFloat, bit_width=bit_width)
        self.quant_out = qnn.QuantIdentity(act_quant=Int8ActPerTensorFloat, bit_width=bit_width)
        self.quant_enc1 = qnn.QuantIdentity(act_quant=Int8ActPerTensorFloat, bit_width=bit_width)
        self.quant_enc2 = qnn.QuantIdentity(act_quant=Int8ActPerTensorFloat, bit_width=bit_width)
        self.quant_enc3 = qnn.QuantIdentity(act_quant=Int8ActPerTensorFloat, bit_width=bit_width)
        self.quant_enc4 = qnn.QuantIdentity(act_quant=Int8ActPerTensorFloat, bit_width=bit_width)
        self.quant_enc5 = qnn.QuantIdentity(act_quant=Int8ActPerTensorFloat, bit_width=bit_width)

    def forward(self, x):
        # Forward pass through the network
        x = self.quant_inp(x)
        enc1 = self.enc1(x)
        enc2 = self.enc2(self.pool1(enc1))
        enc3 = self.enc3(self.pool2(enc2))
        enc4 = self.enc4(self.pool3(enc3))
        enc5 = self.enc5(self.pool4(enc4))

        bottleneck = self.bottleneck(self.pool5(enc5))

        up5 = self.up5(bottleneck)
        enc5 = self.quant_enc5(enc5)
        up5 = self.quant_enc5(up5)
        dec5 = self.dec5(torch.cat((up5, enc5), dim=1))

        up4 = self.up4(dec5)
        enc4 = self.quant_enc4(enc4)
        up4 = self.quant_enc4(up4)
        dec4 = self.dec4(torch.cat((up4, enc4), dim=1))

        up3 = self.up3(dec4)
        enc3 = self.quant_enc3(enc3)
        up3 = self.quant_enc3(up3)
        dec3 = self.dec3(torch.cat((up3, enc3), dim=1))

        up2 = self.up2(dec3)
        enc2 = self.quant_enc2(enc2)
        up2 = self.quant_enc2(up2)
        dec2 = self.dec2(torch.cat((up2, enc2), dim=1))

        up1 = self.up1(dec2)
        enc1 = self.quant_enc1(enc1)
        up1 = self.quant_enc1(up1)
        dec1 = self.dec1(torch.cat((up1, enc1), dim=1))

        output = torch.sigmoid(self.final(dec1))
        return self.quant_out(output)
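For reference, here is a quick sanity check in the clear, without FHE (a sketch; the 1x1x256x256 shape matches my data):

# Sanity check in the clear (no FHE): one forward pass through the model
model = UNet(bit_width=7)
x = torch.randn(1, 1, 256, 256)
with torch.no_grad():
    y = model(x)
print(y.shape)  # expected: torch.Size([1, 1, 256, 256])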

I am using the following versions:

python 3.10.12
concrete-ml 1.6.1
concrete-python 2.7.0

And my hardware is:

(screenshot of hardware specifications, 2024-07-24)

Is this actually a bug, or is there a way I can fix it?

Thank you very much!

RomanBredehoft commented 1 month ago

Hello @ganyuancao, these errors are not easy to debug without being able to reproduce them. Could you therefore:

  • share reproducible code (in particular, the parameters you use for the compilation and so on; for the inputset, it could be fake data with the same ranges, as long as you have the same issue and we can run it easily!)
  • share the complete traceback, as we believe it is incomplete (for example, you might have some additional error lines right above the Stack dump without symbol names message)

Thanks for the report!

ganyuancao commented 1 month ago

> Hello @ganyuancao, these errors are not easy to debug without being able to reproduce them. Could you therefore:
>
>   • share reproducible code (in particular, the parameters you use for the compilation and so on; for the inputset, it could be fake data with the same ranges, as long as you have the same issue and we can run it easily!)
>   • share the complete traceback, as we believe it is incomplete (for example, you might have some additional error lines right above the Stack dump without symbol names message)
>
> Thanks for the report!

Hi @RomanBredehoft,

Thank you for the update!

Yes, I have already included the parameters for the compilation, etc.:

model = compile_brevitas_qat_model(model, cali_image, n_bits=n_bits, rounding_threshold_bits={'n_bits': n_bits, 'method': 'approximate'}, p_error=1e-5)
model.fhe_circuit.keygen(seed=key_seed, encryption_seed=enc_seed)
input_q = model.quantize_input(input)
output_q = model.quantized_forward(input_q, fhe='execute')
output = model.dequantize_output(output_q)

With the model, you should be able to reproduce it with random data of shape torch.Size([1, 1, 256, 256]).
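For completeness, here is a single-file sketch of how I run everything (random tensors stand in for my real calibration and test images, and the seeds are arbitrary placeholders):

import torch
from concrete.ml.torch.compile import compile_brevitas_qat_model

n_bits = 7
key_seed, enc_seed = 0, 0  # arbitrary placeholder seeds

model = UNet(bit_width=n_bits)             # UNet as defined above
cali_image = torch.randn(1, 1, 256, 256)   # fake calibration data with the same shape
sample = torch.randn(1, 1, 256, 256)       # fake inference sample

quantized_module = compile_brevitas_qat_model(
    model, cali_image, n_bits=n_bits,
    rounding_threshold_bits={'n_bits': n_bits, 'method': 'approximate'},
    p_error=1e-5)
quantized_module.fhe_circuit.keygen(seed=key_seed, encryption_seed=enc_seed)

input_q = quantized_module.quantize_input(sample.numpy())
output_q = quantized_module.quantized_forward(input_q, fhe='execute')
output = quantized_module.dequantize_output(output_q)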

For the traceback, there are actually no additional error lines above the Stack dump without symbol names message. It appears right below the output of my program, like this:

Running Inference in FHE on Sample 1 ...
Stack dump without symbol names (ensure you have llvm-symbolizer in your PATH or set the environment var `LLVM_SYMBOLIZER_PATH` to point to it):
0  libLLVM-17git-c7b3ee8f.so 0x00007a94d9152b51 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) + 225
1  libLLVM-17git-c7b3ee8f.so 0x00007a94d9150564
2  libc.so.6                 0x00007a95b4c42520
3  sharedlib.so              0x00007a948b605390 concrete_main + 576
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Segmentation fault (core dumped)

Thank you!

RomanBredehoft commented 1 month ago

Thank you for your quick answer @ganyuancao!

I just tried to reproduce your error by using:

key_seed = 0
enc_seed = 0

model = UNet(bit_width=n_bits)

cali_image = torch.randn((1,1,256,256))

but got different errors when compiling:

  • n_bits=8 and n_bits=7 give RuntimeError: Function you are trying to compile cannot be compiled
  • n_bits=6 and below gives Could not determine a unique scale for the quantization! Please check the ONNX graph of this model., which looks directly related to your CustomUpsample module

Could you then specify which configuration you are using, so that we can better understand what triggers yours? Thanks!

bcm-at-zama commented 1 month ago

Ideally, you would send a single .py file that contains everything, so that we can launch and reproduce it directly!

ganyuancao commented 1 month ago

> Thank you for your quick answer @ganyuancao!
>
> I just tried to reproduce your error by using:
>
> key_seed = 0
> enc_seed = 0
>
> model = UNet(bit_width=n_bits)
>
> cali_image = torch.randn((1,1,256,256))
>
> but got different errors when compiling:
>
>   • n_bits=8 and n_bits=7 give RuntimeError: Function you are trying to compile cannot be compiled
>   • n_bits=6 and below gives Could not determine a unique scale for the quantization! Please check the ONNX graph of this model., which looks directly related to your CustomUpsample module
>
> Could you then specify which configuration you are using, so that we can better understand what triggers yours? Thanks!

Hi @RomanBredehoft,

I am using n_bits=7, and the model can actually be compiled and run in simulation mode on my side. I sent you a message on the FHE Discord; could you maybe check that out?

Thanks!

RomanBredehoft commented 1 month ago

Yes, I got your code and will check when I can, thanks!

RomanBredehoft commented 1 month ago

Hello again! We took a look at your problem, and it is a memory issue: at run time, the compiler tries to allocate up to 600 GB of memory and thus fails 😅

This is somewhat expected: your model is pretty large and thus requires a lot of computation from Concrete. Here are different alternatives (other than using a bigger machine):
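For example, one option (which you confirmed already works on your side) is to keep iterating in simulation mode, which runs the compiled, quantized circuit without FHE execution and without its large runtime allocations. A minimal sketch, reusing the names from your repro script:

# Run the same quantized circuit in simulation mode instead of 'execute':
# no FHE runtime, so no multi-hundred-GB allocation, while quantization
# effects are still modelled
input_q = quantized_module.quantize_input(sample.numpy())
output_q = quantized_module.quantized_forward(input_q, fhe='simulate')
output = quantized_module.dequantize_output(output_q)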

With that being said, your use case started an internal discussion about potential optimizations of memory allocation, so you might see some improvements in the future! We are also planning to add a dedicated and explicit error message for issues like the one you got.

So again, thanks for the report, and I hope you'll be able to find a proper solution for your use case!