zama-ai / concrete

Concrete: TFHE compiler that converts Python programs into their FHE equivalents

Compilation failing with "RuntimeError: Cannot open compilation feedback" #741

Open LenaMartens opened 4 months ago

LenaMartens commented 4 months ago

Summary

See the attached stack-trace below: compilation fails at the point of collecting compilation feedback with a message of [6:21, byte=99]: Expected object key, and dumps the compilation-feedback (which looks like json).

I looked up the error message in LLVM, and it's a json format error. If I take the json produced by the error message and run it through a json validation tool it gives me an error at the same line and byte.

Specifically, these two lines are problematic:

"globalPError": 1,5327091463575139e-11,
....
"pError": 8,3017851936157935e-13,

The numbers contain a comma. If I replace the comma with a dot, it parses correctly.
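The failure can be reproduced with a plain JSON parser as well: after the digit before the comma, a parser expects either `,` followed by the next object key or `}`, so the decimal comma produces an "expected object key"-style error. A quick illustration using Python's `json` module (not llvm::json, but the grammar violation is the same):

```python
import json

# A fragment of the feedback with a locale-formatted decimal comma.
bad = '{"globalPError": 1,5327091463575139e-11}'
try:
    json.loads(bad)
except json.JSONDecodeError as err:
    # The parser reads the number "1", sees ",", and then expects a
    # quoted object key -- but finds "5327..." instead.
    print(err)

# Replacing the decimal comma with a dot makes it valid JSON.
# (Safe here because the only comma in the fragment is the decimal one.)
good = bad.replace(",", ".")
print(json.loads(good))  # {'globalPError': 1.5327091463575139e-11}
```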

I'm using compile_brevitas_qat_model from concrete-ml to compile a quantized Brevitas model to trigger this. I think the bug is in the concrete compiler, but please tell me if you'd rather I open this bug in concrete-ml instead. I haven't been able to create a minimal reproducer but was hoping this is enough information to fix the issue. Let me know if I should provide a minimal repro or more info!

Description

Full stack-trace

```
Traceback (most recent call last):
  File "~/fhe/train.py", line 183, in main
    model = compile_brevitas_qat_model(model, ds.X)
  File "~/.local/lib/python3.10/site-packages/concrete/ml/torch/compile.py", line 520, in compile_brevitas_qat_model
    q_module = compile_onnx_model(
  File "~/.local/lib/python3.10/site-packages/concrete/ml/torch/compile.py", line 371, in compile_onnx_model
    return _compile_torch_or_onnx_model(
  File "~/.local/lib/python3.10/site-packages/concrete/ml/torch/compile.py", line 214, in _compile_torch_or_onnx_model
    quantized_module.compile(
  File "~/.local/lib/python3.10/site-packages/concrete/ml/quantization/quantized_module.py", line 722, in compile
    self.fhe_circuit = compiler.compile(
  File "~/.local/lib/python3.10/site-packages/concrete/fhe/compilation/compiler.py", line 577, in compile
    circuit = Circuit(
  File "~/.local/lib/python3.10/site-packages/concrete/fhe/compilation/circuit.py", line 63, in __init__
    self.enable_fhe_execution()
  File "~/.local/lib/python3.10/site-packages/concrete/fhe/compilation/circuit.py", line 129, in enable_fhe_execution
    self.server = Server.create(
  File "~/.local/lib/python3.10/site-packages/concrete/fhe/compilation/server.py", line 210, in create
    result = Server(
  File "~/.local/lib/python3.10/site-packages/concrete/fhe/compilation/server.py", line 83, in __init__
    self._compilation_feedback = self._support.load_compilation_feedback(compilation_result)
  File "~/.local/lib/python3.10/site-packages/concrete/compiler/library_support.py", line 237, in load_compilation_feedback
    self.cpp().load_compilation_feedback(compilation_result.cpp())
RuntimeError: Cannot open compilation feedback: [6:21, byte=99]: Expected object key
{ "complexity": 129992601720, "crtDecompositionsOfOutputs": [ [] ], "globalPError": 1,5327091463575139e-11, "memoryUsagePerLoc": { "loc(\"~/.local/lib/python3.10/site-packages/concrete/ml/quantization/quantizers.py\":773:0)": 15282528, "loc(\"@/fc1/Gemm.matmul | 
~/.local/lib/python3.10/site-packages/concrete/ml/quantization/quantized_ops.py\":381:0)": 4194472, "loc(\"@/fc2/Gemm.matmul | ~/.local/lib/python3.10/site-packages/concrete/ml/quantization/quantized_ops.py\":381:0)": 8389064, "loc(\"@/fc3/Gemm.matmul | ~/.local/lib/python3.10/site-packages/concrete/ml/quantization/quantized_ops.py\":381:0)": 12583124, "loc(unknown)": 1048592 }, "pError": 8,3017851936157935e-13, "statistics": [ { "count": 20, "keys": [ [ "SECRET", 0 ] ], "location": "loc(\"@/fc1/Gemm.matmul | ~/.local/lib/python3.10/site-packages/concrete/ml/quantization/quantized_ops.py\":381:0)", "operation": "CLEAR_MULTIPLICATION" }, { "count": 20, "keys": [ [ "SECRET", 0 ] ], "location": "loc(\"@/fc1/Gemm.matmul | ~/.local/lib/python3.10/site-packages/concrete/ml/quantization/quantized_ops.py\":381:0)", "operation": "ENCRYPTED_ADDITION" }, { "count": 10, "keys": [ [ "SECRET", 0 ] ], "location": "loc(\"~/.local/lib/python3.10/site-packages/concrete/ml/quantization/quantizers.py\":773:0)", "operation": "CLEAR_ADDITION" }, { "count": 10, "keys": [ [ "KEY_SWITCH", 0 ] ], "location": "loc(\"~/.local/lib/python3.10/site-packages/concrete/ml/quantization/quantizers.py\":773:0)", "operation": "KEY_SWITCH" }, { "count": 10, "keys": [ [ "BOOTSTRAP", 0 ] ], "location": "loc(\"~/.local/lib/python3.10/site-packages/concrete/ml/quantization/quantizers.py\":773:0)", "operation": "PBS" }, { "count": 100, "keys": [ [ "SECRET", 0 ] ], "location": "loc(\"@/fc2/Gemm.matmul | ~/.local/lib/python3.10/site-packages/concrete/ml/quantization/quantized_ops.py\":381:0)", "operation": "CLEAR_MULTIPLICATION" }, { "count": 100, "keys": [ [ "SECRET", 0 ] ], "location": "loc(\"@/fc2/Gemm.matmul | ~/.local/lib/python3.10/site-packages/concrete/ml/quantization/quantized_ops.py\":381:0)", "operation": "ENCRYPTED_ADDITION" }, { "count": 10, "keys": [ [ "SECRET", 0 ] ], "location": "loc(\"~/.local/lib/python3.10/site-packages/concrete/ml/quantization/quantizers.py\":773:0)", "operation": 
"CLEAR_ADDITION" }, { "count": 10, "keys": [ [ "KEY_SWITCH", 1 ] ], "location": "loc(\"~/.local/lib/python3.10/site-packages/concrete/ml/quantization/quantizers.py\":773:0)", "operation": "KEY_SWITCH" }, { "count": 10, "keys": [ [ "BOOTSTRAP", 1 ] ], "location": "loc(\"~/.local/lib/python3.10/site-packages/concrete/ml/quantization/quantizers.py\":773:0)", "operation": "PBS" }, { "count": 20, "keys": [ [ "SECRET", 2 ] ], "location": "loc(\"@/fc3/Gemm.matmul | ~/.local/lib/python3.10/site-packages/concrete/ml/quantization/quantized_ops.py\":381:0)", "operation": "CLEAR_MULTIPLICATION" }, { "count": 20, "keys": [ [ "SECRET", 2 ] ], "location": "loc(\"@/fc3/Gemm.matmul | ~/.local/lib/python3.10/site-packages/concrete/ml/quantization/quantized_ops.py\":381:0)", "operation": "ENCRYPTED_ADDITION" } ], "totalBootstrapKeysSize": 0, "totalInputsSize": 524304, "totalKeyswitchKeysSize": 0, "totalOutputsSize": 1048592, "totalSecretKeysSize": 802720 } ```

BourgerieQuentin commented 3 months ago

Do you have a minimal test to reproduce? From what I tested I never had this error, and the pError values are well formatted with a dot.

LenaMartens commented 3 months ago

I have found a minimal repro, but it requires some complex interactions to trigger the bug. I can work around it, so it's not urgent to fix (and it seems quite obscure); I'll just share what I've learned for your information, because it's interesting :-)

Summary: if your system's default locale uses commas instead of dots to format decimal numbers, calling any matplotlib.pyplot function switches the process locale to that system default, which changes the way C formats your decimal numbers.

So the error only occurs when I use matplotlib.pyplot. I found some bug reports online mentioning that it resets your locale as a side effect (?!). It seems to be pretty old behavior (see this blog from 2006), so I'm not sure it will ever be fixed.
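The mechanism can be seen directly from Python: printf-style float formatting honors LC_NUMERIC. A small illustration (the es_ES.UTF-8 branch only runs if that locale is installed on the machine):

```python
import locale

value = 1.5327091463575139e-11

# Under the "C" locale, %g formats the decimal separator as a dot.
locale.setlocale(locale.LC_NUMERIC, "C")
print(locale.format_string("%.17g", value))  # 1.5327091463575139e-11

# Under a comma-decimal locale, the same call emits a comma -- which is
# exactly what ends up inside the compilation-feedback JSON.
try:
    locale.setlocale(locale.LC_NUMERIC, "es_ES.UTF-8")
    print(locale.format_string("%.17g", value))
except locale.Error:
    print("es_ES.UTF-8 locale not installed on this machine")
finally:
    locale.setlocale(locale.LC_NUMERIC, "C")
```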

Here's a minimal file which reproduces the error on my machine. This is because my default locale is Spanish. From cat /etc/default/locale: LC_NUMERIC="es_ES.UTF-8".

```
from matplotlib import pyplot as plt
from concrete.ml.torch.compile import compile_torch_model
import locale
import torch

class NN(torch.nn.Module):
    def __call__(self, x):
        return x + 1

print(locale.getlocale(locale.LC_NUMERIC))  # (None, None)
input_set = torch.ones(1, 1)
_ = compile_torch_model(NN(), input_set)  # succeeds
print(locale.getlocale(locale.LC_NUMERIC))  # (None, None)
plt.figure()
print(locale.getlocale(locale.LC_NUMERIC))  # ('es_ES', 'UTF-8')
_ = compile_torch_model(NN(), input_set)  # fails
```

I don't know if this is worth fixing, but I guess the compilation feedback could be made agnostic to the system's locale setting.
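In the meantime, a user-side workaround (a sketch, not an official Concrete API) is to pin LC_NUMERIC to the "C" locale around the compilation call, so matplotlib's locale reset cannot leak into the feedback formatting:

```python
import locale
from contextlib import contextmanager

@contextmanager
def c_numeric_locale():
    # Save the current numeric locale, force "C" (dot as decimal
    # separator), and restore the original setting afterwards.
    saved = locale.setlocale(locale.LC_NUMERIC)
    locale.setlocale(locale.LC_NUMERIC, "C")
    try:
        yield
    finally:
        locale.setlocale(locale.LC_NUMERIC, saved)

# Hypothetical usage around the failing call from the repro above:
# with c_numeric_locale():
#     _ = compile_torch_model(NN(), input_set)
```

Note that this mutates process-global state, so it is not thread-safe; it is only meant as a stopgap until the serialization itself is locale-independent.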

BourgerieQuentin commented 3 months ago

Awesome investigation, thanks @LenaMartens. You are totally right: JSON serialization should be agnostic to the system's locale. We use the llvm::json library to serialize the CompilationFeedback object, so I guess the bug is in that library.