okuvshynov / slowllama

Finetune llama2-70b and codellama on MacBook Air without quantization
MIT License
431 stars 33 forks

finetune.py segmentation fault #10

Closed QueryType closed 8 months ago

QueryType commented 8 months ago

I am trying to run finetune.py and I am getting a segmentation fault. Can anyone help? I am on an Apple M2 Mac mini with 24 GB of memory.

% python finetune.py 
loc("mps_transpose"("(mpsFileLoc): /AppleInternal/Library/BuildRoots/75428952-3aa4-11ee-8b65-46d450270006/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm":206:0)): error: 'anec.transpose' op Invalid configuration for the following reasons: Tensor dimensions N1D1C4096H1W32000 are not within supported range, N[1-65536]D[1-16384]C[1-65536]H[1-16384]W[1-16384].
loc("mps_matmul"("(mpsFileLoc): /AppleInternal/Library/BuildRoots/75428952-3aa4-11ee-8b65-46d450270006/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm":39:0)): error: 'anec.matmul' op Invalid configuration for the following reasons: Tensor dimensions N1D1C4096H1W32000 are not within supported range, N[1-65536]D[1-16384]C[1-65536]H[1-16384]W[1-16384].
zsh: segmentation fault  python finetune.py
okuvshynov commented 8 months ago
  1. Which version of PyTorch are you using? I have seen some issues with PyTorch 2.0.1 (https://github.com/pytorch/pytorch/issues/110975)
  2. Did you update the slowllama repo recently?

Thank you!
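For context, the shape in the error log (N1D1C4096H1W32000) matches the llama2 output projection: hidden size 4096 by vocabulary size 32000. A minimal sketch of why the op is rejected, using only the W[1-16384] bound quoted in the error message:

```python
# Sketch: the error log reports the supported ranges for the 'anec'
# (Apple Neural Engine) ops. The offending tensor is the llama2 output
# projection: hidden_dim x vocab_size = 4096 x 32000.
MAX_W = 16384  # the W[1-16384] bound from the error message

hidden_dim, vocab_size = 4096, 32000
print(vocab_size > MAX_W)  # True: W=32000 exceeds the supported range
```

This is consistent with the crash being inside MPS rather than in slowllama itself.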

QueryType commented 8 months ago

Thanks. In that case, I should upgrade?

  1. Yes.

    % pip freeze | egrep -i 'torch|numpy|sentence|fewlines'
    fewlines==0.0.9
    numpy==1.25.2
    sentence-transformers==2.2.2
    sentencepiece==0.1.99
    torch==2.0.1
    torchvision==0.15.2
  2. Just a couple of hours ago.

okuvshynov commented 8 months ago

Yes, please try 2.1.0. Also, make sure you run prepare_model and finetune on the same version of slowllama; as it is still pretty early/experimental, there is no backwards compatibility.
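One way to fail fast on the affected PyTorch release would be a small version guard near the top of the script. This is just a sketch (the helper name and cutoff are illustrative; version strings are assumed to look like "2.0.1" or "2.1.0+cpu"):

```python
# Sketch: refuse to run on torch versions known to crash on MPS here.
MIN_TORCH = (2, 1, 0)

def version_tuple(v):
    # Strip any local suffix ("+cpu") and take the first three numbers.
    return tuple(int(p) for p in v.split("+")[0].split(".")[:3])

# In finetune.py one could then check, e.g.:
#   assert version_tuple(torch.__version__) >= MIN_TORCH
print(version_tuple("2.0.1") >= MIN_TORCH)  # False: 2.0.1 is too old
print(version_tuple("2.1.0+cpu") >= MIN_TORCH)  # True
```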

QueryType commented 8 months ago

Thanks, I did, and it works now! This is super. Great work! Some points I noted:

  1. Not all snapshots were written to disk; some were skipped. I will re-run and check.
  2. GPU usage during finetuning was close to optimal. However, during inference, utilization is fairly low (<10%). Is this expected? I will try more things in the coming days. Thanks again.
okuvshynov commented 8 months ago
  1. The logic is to save a snapshot only if the loss has improved. We can change that (https://github.com/okuvshynov/slowllama/blob/main/finetune.py#L53-L58).
  2. Yes, I put close to no effort into inference optimization. I think there are other libraries focusing specifically on that (e.g. llama.cpp).
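The save-on-improvement logic from point 1 can be sketched roughly like this (hypothetical names; the actual code lives at the linked lines in finetune.py):

```python
# Sketch of save-only-on-improvement checkpointing (hypothetical names).
def maybe_save(loss, best_loss, save_fn):
    """Call save_fn and return the new best loss only if loss improved."""
    if best_loss is None or loss < best_loss:
        save_fn()
        return loss
    return best_loss

saved = []
best = None
for step_loss in [2.0, 1.5, 1.8, 1.2]:
    best = maybe_save(step_loss, best, lambda: saved.append(step_loss))
print(len(saved))  # 3 saves: at losses 2.0, 1.5, 1.2 (1.8 is skipped)
```

This explains why some snapshots appear "skipped": steps whose loss did not beat the running best are simply not written.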
QueryType commented 8 months ago

Thanks for the clarifications; I will dive into the code too. I am not an expert at this, but I will give it my best shot. llama.cpp works great for me; however, I am unable to get finetuning to work with it on my Mac.