Closed: codenia closed this issue 11 months ago
Hi @codenia.
I'm glad to hear you've had some success with deploying this model to the iPhone 15 Pro. I haven't spent any time on memory optimization, because I'm using this model in a Mac app where memory usage is not an issue. I haven't tested this model on iOS.
The model creation script in this repo generates a model that uses 32-bit floating point types for computation. You can change this to 16-bit, which will halve the model file size. I just tried this on my Mac: it runs a little faster and takes less memory, so it may work better for you. However, I did notice that the 16-bit model sometimes produced peculiar image artifacts in my test app (which I haven't published). Give it a try and see how it works for you.
All you have to do is change FLOAT32 in this line to FLOAT16 and rerun the script. I also suggest you change CPU_AND_GPU in this line to ALL. This will give you the option to run the model on the Neural Engine, which is Apple's preferred compute device anyway; the Neural Engine cannot run models with 32-bit floating point precision.
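For reference, the relevant part of the conversion call would then look roughly like this. This is only a sketch: traced_model and the input definitions below stand in for whatever the script actually builds; the two changed arguments are the point.

```python
import coremltools as ct

# Sketch only: traced_model and the input names/shapes are placeholders for
# whatever the repo's script actually constructs. The two arguments to change
# are compute_precision and compute_units.
mlmodel = ct.convert(
    traced_model,
    inputs=[
        ct.TensorType(name="image", shape=(1, 3, 800, 800)),
        ct.TensorType(name="mask", shape=(1, 1, 800, 800)),
    ],
    compute_precision=ct.precision.FLOAT16,  # was ct.precision.FLOAT32
    compute_units=ct.ComputeUnit.ALL,        # was ct.ComputeUnit.CPU_AND_GPU
)
mlmodel.save("LaMa.mlpackage")
```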
There's also a whole raft of optimization techniques that can be applied when converting the model to Core ML format. Apple documents these options at https://apple.github.io/coremltools/docs-guides/source/optimizing-models.html. I haven't tried any of these.
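As one example from those docs, 6-bit weight palettization can be applied to the already-converted mlpackage. I haven't tried this myself, so treat it as a sketch; the file names are placeholders.

```python
import coremltools as ct
import coremltools.optimize.coreml as cto

# Load the converted model (path is a placeholder) and palettize its weights
# to 6 bits with k-means clustering, then save the smaller package.
mlmodel = ct.models.MLModel("LaMa.mlpackage")
op_config = cto.OpPalettizerConfig(mode="kmeans", nbits=6)
config = cto.OptimizationConfig(global_config=op_config)
compressed = cto.palettize_weights(mlmodel, config)
compressed.save("LaMa_palettized.mlpackage")
```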
BTW, you might try fiddling with the mask size, too. The 800x800 px size is arbitrary and an upper bound on the size of the mask. You can, of course, try making it larger to support larger masks. I wonder what kind of impact making it smaller would have? I haven't tried that.
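If you do experiment with that, the size is set in one place in the conversion script, and a smaller value shrinks the activations the model allocates at run time. Roughly like this, though the model and tracing variable names here are only illustrative, not the script's actual identifiers:

```python
import torch

# Illustrative sketch: the script fixes the input size in one place.
# lama_model and the (image, mask) tracing inputs are assumptions on my part.
size = (512, 512)  # pixel width x height; the script's default is (800, 800)
example_image = torch.rand(1, 3, size[1], size[0])
example_mask = torch.rand(1, 1, size[1], size[0])
traced_model = torch.jit.trace(lama_model, (example_image, example_mask))
```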
Also, if you want to fiddle with the conversion script on your own, I recommend installing Miniforge. That will install the conda command. From there, you can pick up directly from the conversion instructions in the README.
Let me know how this works out for you.
Thank you for the information. This model is very suitable for macOS.
I have now converted the model with FLOAT16 and tested it under iOS. Surprisingly, it uses even more memory. Immediately after loading the model, the memory increases by 1.6GB. 200MB more than with FLOAT32.
It should be possible to simplify the model somehow in order to reduce the memory usage.
> I have now converted the model with FLOAT16 and tested it under iOS. Surprisingly, it uses even more memory. Immediately after loading the model, the memory increases by 1.6GB. 200MB more than with FLOAT32.
That's strange. When I tested it on macOS, the FLOAT16 model did in fact take less memory. Hmmm... I will take some time to look into this.
> It should be possible to simplify the model somehow in order to reduce the memory usage.
I'm not sure what you mean by this. You mean changing the model itself?
> That's strange. When I tested it on macOS, the FLOAT16 model did in fact take less memory. Hmmm... I will take some time to look into this.
Yes, it is strange. The model with FLOAT32 has a file size of 205MB and the model with FLOAT16 has a file size of 103MB. But after loading the model, the model with FLOAT32 uses about 1.4GB RAM and the model with FLOAT16 uses about 1.6GB RAM.
> I'm not sure what you mean by this. You mean changing the model itself?
No, what I mean is trying to optimize this model or reduce memory consumption in some other way. I am not yet very familiar with AI models in general and CoreML. I tried "sparsify_weights" but it didn't reduce memory usage.
I have now converted the model with 512x512 pixels and the model only uses 408MB when loading. We can work very well with this now. Thank you again for your script.
size = (512, 512) # pixel width x height
> I have now converted the model with 512x512 pixels and the model only uses 408MB when loading. We can work very well with this now. Thank you again for your script.
>
> size = (512, 512) # pixel width x height
That's a good point. Maybe I'll add some tips on model memory optimization.
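For example, a pruning pass with the newer coremltools optimize API (the current equivalent of sparsify_weights) would look roughly like the sketch below. I haven't run this myself. One caveat worth noting: weight compression mainly shrinks the file on disk, and depending on the OS version and compute unit the weights may be decompressed when the model is loaded, which may be why sparsify_weights didn't change your RAM usage.

```python
import coremltools as ct
import coremltools.optimize.coreml as cto

# Prune small-magnitude weights to a target sparsity (values are examples).
# Caveat: compression mainly reduces on-disk size; the weights may be
# decompressed at load time, so peak RAM does not necessarily go down.
mlmodel = ct.models.MLModel("LaMa.mlpackage")
op_config = cto.OpMagnitudePrunerConfig(target_sparsity=0.5)
config = cto.OptimizationConfig(global_config=op_config)
pruned = cto.prune_weights(mlmodel, config)
pruned.save("LaMa_pruned.mlpackage")
```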
Have you tried running the model with the Neural Engine? I'm curious what your experience with that is.
> Have you tried running the model with the Neural Engine? I'm curious what your experience with that is.
I haven't tested it because the file size with Neural Network is 204MB and the file of the FLOAT16 mlpackage is 103MB. I will use mlpackage because of the smaller file size.
>> Have you tried running the model with the Neural Engine? I'm curious what your experience with that is.
>
> I haven't tested it because the file size with Neural Network is 204MB and the file of the FLOAT16 mlpackage is 103MB. I will use mlpackage because of the smaller file size.
I should clarify. I'm referring to the compute units you configure the model to run with. You can run a FLOAT16 mlpackage on the Neural Engine. For example, this code snippet would configure an FP16 LaMa model to run on the Neural Engine, falling back to the CPU where necessary:

```swift
let modelConfiguration = MLModelConfiguration()
modelConfiguration.computeUnits = .cpuAndNeuralEngine
let lama = try await LaMa.load(configuration: modelConfiguration)
```

I'm not sure what the default computeUnits setting is...
I thought that I needed to convert it to the Neural Network format instead of an ML Program (mlpackage) to use the cpuAndNeuralEngine setting. And if I convert it to the Neural Network format, the .mlmodel file size is 204MB.
OK. I just converted the model with these parameters:
```python
compute_precision=ct.precision.FLOAT16,
compute_units=ct.ComputeUnit.ALL,
```
Then I tested both settings, cpuAndNeuralEngine and cpuAndGPU one after the other. With cpuAndNeuralEngine the app uses 250MB more memory and I didn't notice any performance or quality improvements.
Thank you for publishing this script and a test project for macOS in Swift.
I wanted to test it on iOS. Someone created the LaMa.mlmodelc model for me using this script. The file size of the model is 205.5MB, which is not a problem in itself. I have now created a working Objective-C project for iOS, and it works great on a new iPhone 15 Pro Max. The model expects a fixed size of 800x800 pixels; you can work with this and achieve very good inpainting results.
The problem is the very high memory usage immediately after loading the model in the iOS app at runtime. After this line of code, the app immediately uses 1.4GB more memory than before:
```objc
MLModel *model = [MLModel modelWithContentsOfURL:[[NSBundle mainBundle] URLForResource:@"LaMa" withExtension:@"mlmodelc"] error:&error];
```
The actual processing of the image inpainting then requires approx. 400MB of memory, which is not too bad.
The problem is the 1.4GB memory consumption when loading the model. The older iPad and iPhone crash at this point with "Out Of Memory" (OOM) exception.
Is there a way to reduce the size of the model? Maybe a simplification or compression of the model?