mallman / CoreMLaMa

LaMa for Core ML

High memory consumption/usage when loading the model in iOS. #1

Closed. codenia closed this issue 11 months ago.

codenia commented 1 year ago

Thank you for publishing this script and a test project for macOS in Swift.

I wanted to test it on iOS. Someone created the LaMa.mlmodelc model for me using this script. The file size of the model is 205.5MB, which is not critical in itself. I have now created a working Objective-C project for iOS, and it works great on a new iPhone 15 Pro Max. The model expects a fixed size of 800x800 pixels. You can work with this and achieve very good inpainting results.

The problem is the very high memory usage immediately after loading the model in the iOS app at runtime. After this line of code, the app immediately uses 1.4GB more memory than before:

MLModel *model = [MLModel modelWithContentsOfURL:[[NSBundle mainBundle] URLForResource:@"LaMa" withExtension:@"mlmodelc"] error:&error];

The actual processing of the image inpainting then requires approx. 400MB of memory, which is not too bad.

The problem is the 1.4GB memory consumption when loading the model. Older iPads and iPhones crash at this point with an out-of-memory (OOM) exception.

Is there a way to reduce the size of the model? Maybe a simplification or compression of the model?

[Screenshot: memory_consumption]

mallman commented 1 year ago

Hi @codenia.

I'm glad to hear you've had some success with deploying this model to the iPhone 15 Pro. I haven't spent any time on memory optimization, because I'm using this model in a Mac app where memory usage is not an issue. I haven't tested this model on iOS.

The model creation script in this repo generates a model that uses 32-bit floating point types for computation. You can change this to 16-bit, which will halve the model file size. I just tried this on my Mac. It runs a little faster, and takes less memory. It may work better for you. However, I did notice that the 16-bit model sometimes output some peculiar image artifacts in my test app (which I haven't published). Give it a try and see how it works for you. All you have to do is change FLOAT32 in this line

https://github.com/mallman/CoreMLaMa/blob/80a322b7488359366e7b1d3377fc3c315eeeacdd/convert_lama.py#L31

to FLOAT16 and rerun the script. I also suggest you change CPU_AND_GPU in this line

https://github.com/mallman/CoreMLaMa/blob/80a322b7488359366e7b1d3377fc3c315eeeacdd/convert_lama.py#L32

to ALL. This will give you the option of running the model on the Neural Engine, which is Apple's preferred compute device anyway. The Neural Engine cannot run models that use 32-bit floating-point precision.
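
Concretely, after those two edits the relevant arguments of the conversion call should read like this (the rest of the call stays as it is in the script):

    compute_precision=ct.precision.FLOAT16,  # was ct.precision.FLOAT32
    compute_units=ct.ComputeUnit.ALL,        # was ct.ComputeUnit.CPU_AND_GPU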

There's also a whole raft of optimization techniques that can be applied when converting the model to Core ML format. Apple documents these options at https://apple.github.io/coremltools/docs-guides/source/optimizing-models.html. I haven't tried any of these.
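
To give one example, 6-bit weight palettization with the coremltools optimize API would look something like this. Treat it as an untested sketch: the package path is just a placeholder, and it needs coremltools 7 or newer:

    import coremltools as ct
    import coremltools.optimize.coreml as cto

    # Load the converted package produced by convert_lama.py (path is an example)
    mlmodel = ct.models.MLModel("LaMa.mlpackage")

    # Cluster every weight tensor to 2**6 = 64 distinct values via k-means
    op_config = cto.OpPalettizerConfig(mode="kmeans", nbits=6)
    config = cto.OptimizationConfig(global_config=op_config)

    compressed = cto.palettize_weights(mlmodel, config)
    compressed.save("LaMa_palettized.mlpackage")

This mainly shrinks the weights on disk; whether it also lowers memory usage at load time is something you'd have to measure.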

BTW, you might try fiddling with the mask size, too. The 800x800 px size is arbitrary and an upper bound on the size of the mask. You can, of course, try making it larger to support larger masks. I wonder what kind of impact making it smaller would have? I haven't tried that.
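
If you do experiment with the size, it's controlled by a single constant in convert_lama.py that fixes the converted model's input shapes. Roughly speaking (a sketch only, and not necessarily how the script actually structures its inputs):

    import coremltools as ct

    size = (512, 512)  # pixel width x height

    # The converted model's inputs are fixed to this size, so the spatial
    # dimensions of every intermediate tensor shrink along with it.
    inputs = [
        ct.ImageType(name="image", shape=(1, 3, size[1], size[0])),
        ct.ImageType(name="mask", shape=(1, 1, size[1], size[0]),
                     color_layout=ct.colorlayout.GRAYSCALE),
    ]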

Also, if you want to fiddle with the conversion script on your own, I recommend installing Miniforge. That will install the conda command. From there, you can pick up directly from the conversion instructions in the README.

Let me know how this works out for you.

codenia commented 1 year ago

Thank you for the information. This model is well suited to macOS.

codenia commented 1 year ago

I have now converted the model with FLOAT16 and tested it under iOS. Surprisingly, it uses even more memory. Immediately after loading the model, the memory increases by 1.6GB. 200MB more than with FLOAT32.

It should be possible to simplify the model somehow in order to reduce the memory usage.

mallman commented 1 year ago

> I have now converted the model with FLOAT16 and tested it under iOS. Surprisingly, it uses even more memory. Immediately after loading the model, the memory increases by 1.6GB. 200MB more than with FLOAT32.

That's strange. When I tested it on macOS, the FLOAT16 model did in fact take less memory. Hmmm... I will take some time to look into this.

> It should be possible to simplify the model somehow in order to reduce the memory usage.

I'm not sure what you mean by this. You mean changing the model itself?

codenia commented 1 year ago

> That's strange. When I tested it on macOS, the FLOAT16 model did in fact take less memory. Hmmm... I will take some time to look into this.

Yes, it is strange. The model with FLOAT32 has a file size of 205MB and the model with FLOAT16 has a file size of 103MB. But after loading the model, the model with FLOAT32 uses about 1.4GB RAM and the model with FLOAT16 uses about 1.6GB RAM.

> I'm not sure what you mean by this. You mean changing the model itself?

No, what I mean is trying to optimize this model, or to reduce memory consumption in some other way. I am not yet very familiar with AI models in general, or with Core ML. I tried "sparsify_weights" but it didn't reduce memory usage.

codenia commented 1 year ago

I have now converted the model with 512x512 pixels and the model only uses 408MB when loading. We can work very well with this now. Thank you again for your script.

size = (512, 512) # pixel width x height

mallman commented 1 year ago

> I have now converted the model with 512x512 pixels and the model only uses 408MB when loading. We can work very well with this now. Thank you again for your script.

> size = (512, 512) # pixel width x height

That's a good point. Maybe I'll add some tips on model memory optimization.

Have you tried running the model with the Neural Engine? I'm curious what your experience with that is.

codenia commented 1 year ago

> Have you tried running the model with the Neural Engine? I'm curious what your experience with that is.

I haven't tested it, because the neural network (.mlmodel) version has a file size of 204MB, while the FLOAT16 mlpackage is 103MB. I will use the mlpackage because of the smaller file size.

mallman commented 1 year ago

> Have you tried running the model with the Neural Engine? I'm curious what your experience with that is.

> I haven't tested it, because the neural network (.mlmodel) version has a file size of 204MB, while the FLOAT16 mlpackage is 103MB. I will use the mlpackage because of the smaller file size.

I should clarify. I'm referring to the compute units you configure the model to run with. You can run a FLOAT16 mlpackage on the Neural Engine. For example, this code snippet would configure an FP16 LaMa model to run on the Neural Engine, falling back to the CPU where necessary:

let modelConfiguration = MLModelConfiguration()
modelConfiguration.computeUnits = .cpuAndNeuralEngine
let lama = try await LaMa.load(configuration: modelConfiguration)

I'm not sure what the default computeUnits setting is...

codenia commented 1 year ago

I thought I needed to convert it to a neural network (.mlmodel) instead of an ML program (mlpackage) to use the cpuAndNeuralEngine setting. And when I convert it to a neural network, the .mlmodel file size is 204MB.

codenia commented 1 year ago

OK. I just converted the model with these parameters:

    compute_precision=ct.precision.FLOAT16,
    compute_units=ct.ComputeUnit.ALL,

Then I tested both settings, cpuAndNeuralEngine and cpuAndGPU, one after the other. With cpuAndNeuralEngine the app uses 250MB more memory, and I didn't notice any performance or quality improvements.