Open adonishong opened 1 year ago
👋 Hey!
The 2.8b in the GitHub release is not directly the result you get by running convert.py. For larger models you need to do two more steps:
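1. Chunk the model:
python -m src.experiments.chunk_model --mlpackage-path pythia-1.4b_2023_04_11-20_54_12.mlpackage -o .
This writes a set of _chunk{1,2,3,...}.mlpackage files.
2. Build a pipeline from the chunks, passing the first chunk:
python -m src.experiments.make_pipeline pythia-1.4b_2023_04_11-20_54_12_chunk1.mlpackage
This combines the _chunk{1,2,3,...}.mlpackage files. You can verify the result by right-clicking it, choosing Show Package Contents and seeing that the Data > com.apple.CoreML > weights folder has many files (one per chunk).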
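If it helps, the two commands and the weights-folder check can be sketched in Python. This is only a sketch under assumptions: the helper names are made up for illustration, and the `<name>_chunk1.mlpackage` naming is inferred from the file names in the commands above rather than confirmed from the tool's source.

```python
import shlex
import sys
from pathlib import Path


def chunk_and_pipeline_commands(mlpackage: str, out_dir: str = "."):
    """Return the chunk_model and make_pipeline command lines as argv lists."""
    pkg = Path(mlpackage)
    # Assumption: the first chunk is written as <stem>_chunk1.mlpackage in
    # out_dir, matching the file names used in the commands above.
    first_chunk = Path(out_dir) / f"{pkg.stem}_chunk1.mlpackage"
    chunk_cmd = [sys.executable, "-m", "src.experiments.chunk_model",
                 "--mlpackage-path", str(pkg), "-o", out_dir]
    pipeline_cmd = [sys.executable, "-m", "src.experiments.make_pipeline",
                    str(first_chunk)]
    return chunk_cmd, pipeline_cmd


def count_weight_files(mlpackage: str) -> int:
    """Count files under Data/com.apple.CoreML/weights inside a package."""
    weights_dir = Path(mlpackage) / "Data" / "com.apple.CoreML" / "weights"
    if not weights_dir.is_dir():
        return 0
    return sum(1 for entry in weights_dir.iterdir() if entry.is_file())


if __name__ == "__main__":
    # Only prints the commands; nothing is executed here.
    for cmd in chunk_and_pipeline_commands(
            "pythia-1.4b_2023_04_11-20_54_12.mlpackage"):
        print(shlex.join(cmd))
```

Each argv list can be passed to `subprocess.run(cmd, check=True)` from the repository root. After the pipeline step, `count_weight_files` on the resulting package should return more than one if the chunks were combined.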
That should allow you to recreate the 2.8b model that runs on ANE. Two more things that might be helpful:
I'm not sure how you are checking to see if the model runs on the ANE, but I would recommend using the --wait flag and attaching the CoreML tool from Instruments. Xcode really struggles with these larger models.
python generate.py --model_path gpt2-medium.mlmodelc --compute_unit CPUAndANE --wait
For the chunked models you should see one "Neural Engine Prediction" block for each chunk of the model -- it will be obvious if some chunks run on ANE and some do not. (This screenshot is not a chunked model.) There will be a tiny gap between each block that runs on CPU, but it should be very small.
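To compare the same model under each compute unit, a small helper can build the generate.py command lines. This is a rough sketch: the helper name is hypothetical, and it only uses the flags and --compute_unit values that appear in this thread (All, CPUAndANE, CPUAndGPU). Other values may exist.

```python
import shlex
import sys

# --compute_unit values mentioned in this thread; the script may accept others.
COMPUTE_UNITS = ("All", "CPUAndANE", "CPUAndGPU")


def generate_command(model_path: str, compute_unit: str, wait: bool = True):
    """Build a generate.py command line as an argv list."""
    if compute_unit not in COMPUTE_UNITS:
        raise ValueError(f"unexpected compute unit: {compute_unit!r}")
    cmd = [sys.executable, "generate.py",
           "--model_path", model_path,
           "--compute_unit", compute_unit]
    if wait:
        # --wait leaves time to attach Instruments before the model runs.
        cmd.append("--wait")
    return cmd


if __name__ == "__main__":
    # Print one command per compute unit; nothing is executed here.
    for unit in COMPUTE_UNITS:
        print(shlex.join(generate_command("gpt2-medium.mlmodelc", unit)))
```

Running each printed command in turn with Instruments attached shows which compute units actually produce "Neural Engine Prediction" blocks.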
I only have an M1, but I think there is a chance you can get the 6.9b running on the M2's ANE. You will definitely need to use the chunk_model and make_pipeline tools. I would start with 670 for the chunk size (like 2.8b) and try smaller if that doesn't work. Let me know if you try, I'd be happy to help try and figure out how to get it working!
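The "start at 670 and try smaller" approach can be sketched as a simple descending search. Everything here other than the starting value of 670 is an assumption: the step, the lower bound, and the `try_size` callback are placeholders, and how the chunk size is actually passed to chunk_model is not shown in this thread.

```python
from typing import Callable, Optional


def first_working_chunk_size(try_size: Callable[[int], bool],
                             start: int = 670,
                             step: int = 50,
                             minimum: int = 100) -> Optional[int]:
    """Return the largest tried chunk size for which try_size succeeds.

    try_size is a caller-supplied (hypothetical) check, for example:
    chunk at that size, build the pipeline, and confirm every chunk
    shows up as a Neural Engine prediction in Instruments.
    """
    size = start
    while size >= minimum:
        if try_size(size):
            return size
        size -= step  # step and minimum are illustrative defaults
    return None
```

For example, `first_working_chunk_size(lambda size: size <= 600)` tries 670, 620, then 570 and returns 570.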
Sorry for the slow response and also that all of this is missing from the documentation.
I appreciate your work.
My test machine is an M2 Max with 64GB of memory.
With the generate.py script, the Pythia 2.8b mlpackage from the GitHub release uses the ANE with either --compute_unit="All" or --compute_unit="CPUAndANE". However, if I convert Pythia 2.8b myself with convert.py, the resulting mlpackage does not use the ANE: with --compute_unit="All" the CPU and GPU are used, and with --compute_unit="CPUAndANE" only the CPU is used. Pythia-410m behaves differently: both the mlpackage downloaded from the GitHub release and the one converted with convert.py use the ANE.
BTW, Pythia-6.9b can be converted with convert.py, and it works well with generate.py and --compute_unit="CPUAndGPU", but it does not use the ANE either.