zama-ai / concrete-ml

Concrete ML: Privacy Preserving ML framework using Fully Homomorphic Encryption (FHE), built on top of Concrete, with bindings to traditional ML frameworks.

Is there a way to serialize/save the q_model generated by compile_torch_model function? #885

Open nazarpysko opened 2 weeks ago

nazarpysko commented 2 weeks ago

I'm messing around with Concrete ML these days and I was wondering if this is possible. Furthermore, after compiling the FHE equivalent of a model, what should the size of the test sample be to run it in simulation? Until now, even when using the whole test set, this was not a problem, as the total simulation time was not that large (less than half an hour). However, now I'm working with more complex models (simplified VGG, simplified MobileNet, etc.) and this process takes much longer on my computer (almost 2 hours). Is it enough to use only a portion of the original test set (maybe a quarter)?

Moreover, for hyperparameter optimization of the FHE circuit, I've seen in some of your notebooks that you compile different FHE models, varying rounding_threshold_bits, and then compare the simulated accuracy obtained. What about trying different p_error values too? I suppose you only tried rounding_threshold_bits for simplicity, but I understand that trying different p_error values is fine as well. Also, the only thing left is to measure the inference time after compiling the FHE circuit with the chosen hyperparameters, in order to see how they affect the FHE inference time. I understand that each of these combinations offers a different trade-off between accuracy and time performance. Am I right, or did I mess up something?

Thanks for your amazing job 👍

jfrery commented 2 weeks ago

Hi @nazarpysko,

what should be the size of the test sample to run it in simulation?

Well, the test set size is really up to you to decide... Simulation should be pretty fast compared to FHE, but it is much slower than torch because it does not support batch inference. So if you have batches of 128, it will be at least 128x slower.

If you need to run multiple tests, you can probably sample a few examples from your test set and compute the metrics on those. But again, that's really up to you.

If you meant the compilation set, then a small representative subset is good enough, yes.
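For illustration, running simulation on a subset could look something like this (a sketch only: x_test and y_test are placeholder names, and I'm assuming forward accepts the fhe="simulate" mode on the compiled quantized module):

import numpy as np

# Take a random quarter of the test set (the fraction is just a choice).
rng = np.random.default_rng(0)
idx = rng.choice(len(x_test), size=len(x_test) // 4, replace=False)
x_subset, y_subset = x_test[idx], y_test[idx]

# Run the compiled module in simulation and compute accuracy.
y_pred = q_module.forward(x_subset, fhe="simulate")
accuracy = (y_pred.argmax(axis=1) == y_subset).mean()
print(f"Simulated accuracy on {len(x_subset)} samples: {accuracy:.3f}")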

About your second question, you are right: the two main hyperparameters you need to check are p_error and rounding_threshold_bits. And then yes, you can compile and run in simulation to check both accuracy and speed.

We found that 0.01 < p_error < 0.1 often gives a great speed-up, with not much improvement above that (unless your model can take p_error=0.999... or something, but that's unlikely). So we only check the rounding threshold.
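For illustration, a sweep over both hyperparameters could look something like this (a sketch only: model, x_calib, x_val and y_val are placeholder names, and the value grids are just examples):

import time

from concrete.ml.torch.compile import compile_torch_model

results = []
for rounding_bits in (6, 7, 8):
    for p_error in (0.01, 0.05, 0.1):
        # Compile the torch model with the candidate hyperparameters (PTQ).
        q_module = compile_torch_model(
            model,
            x_calib,
            rounding_threshold_bits=rounding_bits,
            p_error=p_error,
        )

        # Simulated accuracy on a validation set.
        y_pred = q_module.forward(x_val, fhe="simulate")
        acc = (y_pred.argmax(axis=1) == y_val).mean()

        # Optional (can be very slow for large models): time one real FHE
        # inference on a single sample to see how the choices affect latency.
        q_module.fhe_circuit.keygen()
        start = time.time()
        q_module.forward(x_val[:1], fhe="execute")
        fhe_seconds = time.time() - start

        results.append((rounding_bits, p_error, acc, fhe_seconds))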

As for the issue title (which I couldn't find anything related to in your comment): yes, you should be able to serialize the q_module by calling its dump() method.
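For illustration, something along these lines should work (the dump() call is the one mentioned above; the loader import path is an assumption on my side, so check the serialization docs):

from concrete.ml.common.serialization.loaders import load  # assumed import path

# Serialize the quantized module to a JSON file.
with open("q_module.json", "w", encoding="utf-8") as f:
    q_module.dump(f)

# Later, restore it. Note that the compiled FHE circuit may not be part of this
# dump, in which case it has to be re-compiled or shipped via the deployment API.
with open("q_module.json", "r", encoding="utf-8") as f:
    q_module_restored = load(f)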

nazarpysko commented 4 hours ago

Sorry for reopening the issue, but it seemed convenient to reuse it as I have a related question.

What is the most appropriate approach to compare model size between a plaintext custom NN in PyTorch and its ciphertext Concrete ML counterpart? Is it appropriate to compare the .pth file size with the size of the file obtained by dumping the q_module generated by the compile_torch_model function? I just need an appropriate way to analyze the impact of applying FHE with Concrete ML on the model size.

Maybe a more appropriate way to analyze this impact is to use something like the following:

def get_torch_model_size(model):
    # Sum the byte size of all parameters and buffers of the PyTorch model.
    param_size = 0
    for param in model.parameters():
        param_size += param.nelement() * param.element_size()
    buffer_size = 0
    for buffer in model.buffers():
        buffer_size += buffer.nelement() * buffer.element_size()
    size_all_mb = (param_size + buffer_size) / 1024**2
    return size_all_mb

def get_concrete_model_size(q_module):
    # Sum the byte size of the weights and biases of the quantized layers
    # (the attribute names here are my guesses about the quantized module internals).
    param_size = 0
    for layer in q_module.quantized_module:
        if hasattr(layer, 'weight'):
            param_size += layer.weight.numel() * layer.weight.element_size()
        if hasattr(layer, 'bias') and layer.bias is not None:
            param_size += layer.bias.numel() * layer.bias.element_size()
    size_all_mb = param_size / 1024**2
    return size_all_mb

jfrery commented 3 hours ago

The quantized module is just an intermediate representation of what will eventually become the FHE circuit. I think what you want is the final size of the model that will run on encrypted data. This is not trivial to get, but you can get an estimation by checking the size of the server.zip you obtain when saving the FHE components (you can check this doc on how to obtain it).
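For illustration, saving the FHE components could look like this (a sketch; the path is a placeholder and I'm assuming FHEModelDev accepts the compiled quantized module):

import os

from concrete.ml.deployment import FHEModelDev

# Save the deployment artifacts; this writes client.zip and server.zip
# into the given directory.
FHEModelDev(path_dir="./fhe_deployment", model=q_module).save()

# The size of server.zip gives an estimate of the deployed model size.
server_size_mb = os.path.getsize("./fhe_deployment/server.zip") / 1024**2
print(f"server.zip: {server_size_mb:.2f} MB")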

You can do a rough estimation with

$$ \text{size\_in\_MB} = \frac{\text{\#params} \times \text{n\_bits}}{8 \times 1024 \times 1024} $$

this is because all parameters of the model are quantized to n_bits.
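Spelled out in code (the parameter count and bit width below are just example values):

def estimate_quantized_size_mb(num_params: int, n_bits: int) -> float:
    # Every parameter is quantized to n_bits; divide by 8 to get bytes,
    # then by 1024**2 to get megabytes.
    return num_params * n_bits / 8 / 1024**2

# Example: a 1M-parameter model quantized to 8 bits is roughly 1 MB.
print(estimate_quantized_size_mb(1_000_000, 8))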

That being said, I wonder what you want to achieve with such a comparison, as it's unrelated to FHE.

If you want to see the size of the encrypted values instead (the parameters of the model are not encrypted), you can do:

q_module.fhe_circuit.statistics

You will get a bunch of stats, among which you will find the input and output sizes.
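For example (assuming statistics behaves like a mapping of named values; the exact entries depend on the Concrete version):

# Print all available circuit statistics and look for the size-related entries.
for name, value in q_module.fhe_circuit.statistics.items():
    print(f"{name}: {value}")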

nazarpysko commented 2 hours ago

Is it appropriate to compare the server.zip size to the .pth file containing the model weights? I just want to analyze the impact of HE on the model size. More specifically, I'm trying to get these insights by applying Concrete ML to some base models (custom NNs in PyTorch) using PTQ with the compile_torch_model function. As far as I understand, HE typically increases the size of the model.

I will check the output of q_module.fhe_circuit.statistics by the way. Thank you.