togethercomputer / OpenChatKit

Apache License 2.0

Running the OpenChatKit-cpu_support project hits an issue #94

Closed: yxy123 closed this 1 year ago

yxy123 commented 1 year ago

Running "bash training/finetune_GPT-NeoXT-Chat-Base-20B.sh" in OpenChatKit-cpu_support still seems to need CUDA. The detailed error log is below:

ImportError:

Failed to import CuPy.

If you installed CuPy via wheels (cupy-cudaXXX or cupy-rocm-X-X), make sure that the package matches with the version of CUDA or ROCm installed.

On Linux, you may need to set LD_LIBRARY_PATH environment variable depending on how you installed CUDA/ROCm. On Windows, try setting CUDA_PATH environment variable.

Check the Installation Guide for details: https://docs.cupy.dev/en/latest/install.html

Original error: ImportError: libcuda.so.1: cannot open shared object file: No such file or directory

orangetin commented 1 year ago

It's hard to tell what's causing the error without all the details, but it looks like you either don't have CuPy installed properly or you don't have the necessary CUDA drivers.

Could you describe your setup?

Are you running this on WSL? I've seen this error there before and was able to fix it by installing the proper drivers.
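For anyone debugging this, one quick way to check whether the NVIDIA driver library that CuPy needs is even visible to Python (a minimal standalone sketch, not part of OpenChatKit):

```python
import ctypes

def has_cuda_driver() -> bool:
    """Return True if libcuda.so.1 (the NVIDIA driver library) can be loaded."""
    try:
        ctypes.CDLL("libcuda.so.1")
        return True
    except OSError:
        return False

if has_cuda_driver():
    print("libcuda.so.1 found: the NVIDIA driver is visible to this process")
else:
    print("libcuda.so.1 not found: install the NVIDIA driver or fix LD_LIBRARY_PATH")
```

If this prints "not found", the `ImportError: libcuda.so.1` above is a driver/`LD_LIBRARY_PATH` problem, not a CuPy packaging problem.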

yxy123 commented 1 year ago

Hi, I'm running on Linux. The CuPy installation fails. Conda version: conda 4.10.1

[base]# lspci | grep VGA
01:00.1 VGA compatible controller: Matrox Electronics Systems Ltd. MGA G200EH

orangetin commented 1 year ago

Did you follow these instructions (https://github.com/togethercomputer/OpenChatKit#requirements) for installing CuPy and the other dependencies before trying to train the model?

You mentioned CPU support in the first comment; CPU-only can work for inference but not for training. The command you're running is for training the model, is that what you're trying to do?

yxy123 commented 1 year ago

Yes, I'm using CPU-only to do the training. You mean CPU-only can't support training, right?

orangetin commented 1 year ago

> Yes, I'm using CPU-only to do the training. You mean CPU-only can't support training, right?

Correct, OpenChatKit does not support fine-tuning on CPUs. Adding support seems unnecessary too, because it would take a very, very long time to fine-tune a model on just a CPU.
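A back-of-envelope estimate shows why (the throughput and token numbers below are illustrative assumptions, not OpenChatKit figures):

```python
def train_flops(params: float, tokens: float) -> float:
    """Approximate training compute: ~6 FLOPs per parameter per token
    (forward + backward pass), a common rule of thumb."""
    return 6 * params * tokens

flops = train_flops(20e9, 1e9)  # 20B-param model, hypothetical 1B-token corpus
cpu = 1e12                      # optimistic ~1 TFLOP/s for a many-core CPU
gpu = 100e12                    # ~100 TFLOP/s for a modern training GPU

print(f"CPU: ~{flops / cpu / 86400:.0f} days")  # on the order of years
print(f"GPU: ~{flops / gpu / 86400:.1f} days")
```

Even with generous assumptions for the CPU, the gap is two orders of magnitude.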

You need GPUs to fine-tune the model. These are the requirements (source: the requirements section linked above):

| Model | Inference GPU memory | Fine-tuning GPU memory |
| --- | --- | --- |
| GPT-NeoXT-Chat-Base-20B | 42 GB | 640 GB |
| GPT-NeoXT-Chat-Base-20B-int8 | 21 GB | N/A |
| Pythia-Chat-Base-7B-v0.16 | 18 GB | 256 GB |
| Pythia-Chat-Base-7B-v0.16-int8 | 9 GB | N/A |

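As a sanity check on the fine-tuning column, a common rule of thumb for full fine-tuning with Adam in mixed precision is ~16 bytes of weight/gradient/optimizer state per parameter, plus headroom for activations. The ~32 bytes/parameter below is my own assumption, not OpenChatKit's published formula:

```python
def finetune_gb(params_billion: float, bytes_per_param: float = 32) -> float:
    """Very rough fine-tuning memory estimate in GB: fp16 weights + grads
    (~4 B/param), fp32 master weights and two Adam moments (~12 B/param),
    doubled here as headroom for activations and framework overhead."""
    return params_billion * bytes_per_param

print(f"20B model: ~{finetune_gb(20):.0f} GB")  # same ballpark as the table
```

The table's exact numbers won't fall out of this formula, but it explains why a 20B model needs hundreds of gigabytes of GPU memory to fine-tune.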
yxy123 commented 1 year ago

Got it, thanks for your support.