okuvshynov / slowllama

Finetune llama2-70b and codellama on MacBook Air without quantization
MIT License
431 stars 33 forks

finetune.py segmentation fault #10

Closed QueryType closed 8 months ago

QueryType commented 8 months ago

I am trying to run finetune.py and I am getting a segmentation fault. Can anyone help? I am on an Apple M2 Mac mini with 24 GB of memory.

% python finetune.py 
loc("mps_transpose"("(mpsFileLoc): /AppleInternal/Library/BuildRoots/75428952-3aa4-11ee-8b65-46d450270006/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm":206:0)): error: 'anec.transpose' op Invalid configuration for the following reasons: Tensor dimensions N1D1C4096H1W32000 are not within supported range, N[1-65536]D[1-16384]C[1-65536]H[1-16384]W[1-16384].
loc("mps_matmul"("(mpsFileLoc): /AppleInternal/Library/BuildRoots/75428952-3aa4-11ee-8b65-46d450270006/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm":39:0)): error: 'anec.matmul' op Invalid configuration for the following reasons: Tensor dimensions N1D1C4096H1W32000 are not within supported range, N[1-65536]D[1-16384]C[1-65536]H[1-16384]W[1-16384].
zsh: segmentation fault  python finetune.py
okuvshynov commented 8 months ago
  1. Which version of PyTorch are you using? I have seen some issues with PyTorch 2.0.1 (https://github.com/pytorch/pytorch/issues/110975)
  2. Did you update the slowllama repo recently?

Thank you!
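For context, the shape in the error log (N1D1C4096H1W32000) matches the llama2 output projection: hidden size 4096 by vocabulary size 32000. A minimal sketch of why the op is rejected, using only the W[1-16384] bound quoted in the error message:

```python
# Sketch: the error log reports the supported ranges for the 'anec'
# (Apple Neural Engine) ops. The offending tensor is the llama2 output
# projection: hidden_dim x vocab_size = 4096 x 32000.
MAX_W = 16384  # the W[1-16384] bound from the error message

hidden_dim, vocab_size = 4096, 32000
print(vocab_size > MAX_W)  # True: W=32000 exceeds the supported range
```

This is consistent with the crash being inside MPS rather than in slowllama itself.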

QueryType commented 8 months ago

Thanks. In that case, I should upgrade?

  1. Yes.

    % pip freeze | egrep -i 'torch|numpy|sentence|fewlines'
    fewlines==0.0.9
    numpy==1.25.2
    sentence-transformers==2.2.2
    sentencepiece==0.1.99
    torch==2.0.1
    torchvision==0.15.2
  2. Just a couple of hours ago.

okuvshynov commented 8 months ago

Yes, please try 2.1.0. Also, make sure you run prepare_model and finetune on the same version of slowllama; as it is still pretty early/experimental, there is no backwards compatibility.
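One way to fail fast on the affected PyTorch release would be a small version guard near the top of the script. This is just a sketch (the helper name and cutoff are illustrative; version strings are assumed to look like "2.0.1" or "2.1.0+cpu"):

```python
# Sketch: refuse to run on torch versions known to crash on MPS here.
MIN_TORCH = (2, 1, 0)

def version_tuple(v):
    # Strip any local suffix ("+cpu") and take the first three numbers.
    return tuple(int(p) for p in v.split("+")[0].split(".")[:3])

# In finetune.py one could then check, e.g.:
#   assert version_tuple(torch.__version__) >= MIN_TORCH
print(version_tuple("2.0.1") >= MIN_TORCH)  # False: 2.0.1 is too old
print(version_tuple("2.1.0+cpu") >= MIN_TORCH)  # True
```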

QueryType commented 8 months ago

Thanks, I did, and it works now! This is super. Great work! Some points I noted:

  1. Not all snapshots were written to disk; some were skipped. I will re-run and check.
  2. GPU usage during finetuning was close to optimal. However, during inference, utilization is fairly low (<10%). Is this expected? I will try more things in the coming days. Thanks again.
okuvshynov commented 8 months ago
  1. The logic is to save a snapshot only if the loss has improved. We can change that (https://github.com/okuvshynov/slowllama/blob/main/finetune.py#L53-L58).
  2. Yes, I put close to no effort into inference optimization. I think there are other libraries focusing specifically on that (e.g. llama.cpp).
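The save-on-improvement logic from point 1 can be sketched roughly like this (hypothetical names; the actual code lives at the linked lines in finetune.py):

```python
# Sketch of save-only-on-improvement checkpointing (hypothetical names).
def maybe_save(loss, best_loss, save_fn):
    """Call save_fn and return the new best loss only if loss improved."""
    if best_loss is None or loss < best_loss:
        save_fn()
        return loss
    return best_loss

saved = []
best = None
for step_loss in [2.0, 1.5, 1.8, 1.2]:
    best = maybe_save(step_loss, best, lambda: saved.append(step_loss))
print(len(saved))  # 3 saves: at losses 2.0, 1.5, 1.2 (1.8 is skipped)
```

This explains why some snapshots appear "skipped": steps whose loss did not beat the running best are simply not written.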
QueryType commented 8 months ago

Thanks for the clarifications; I will dive into the code too. I am not an expert at this, but I will give it my best shot. llama.cpp works great for me; however, I am unable to get finetuning to work with it on my Mac.