paperswithcode / galai

Model API for GALACTICA
Apache License 2.0
2.68k stars · 275 forks

AssertionError: Torch not compiled with CUDA enabled #5

Closed. Naugustogi closed this issue 1 year ago.

Naugustogi commented 1 year ago

Unless there's something special I'm missing, the normal quickstart install doesn't work.

ZQ-Dev8 commented 1 year ago

I had the same issue. The quickstart seems to install the CPU-only version of PyTorch by default, but you need the CUDA-enabled version. Use pip/conda to uninstall the version of PyTorch you have, then install the CUDA version using the instructions here.

Before downloading, double-check which version of CUDA you have installed so you pick the right torch build. You can do this by running nvcc --version from the command line.

Good luck!
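One quick way to tell which build you have, without reinstalling anything, is to inspect `torch.__version__` (CPU-only wheels carry a `+cpu` local-version suffix, CUDA wheels a `+cuXXX` one) or to call `torch.cuda.is_available()`. A small sketch of the suffix check; the version strings are illustrative examples, not pinned requirements:

```python
def is_cpu_build(version: str) -> bool:
    """Heuristic check: PyTorch CPU-only wheels carry a "+cpu" local
    version suffix (e.g. "1.13.0+cpu"); CUDA wheels use "+cu117" etc."""
    return version.partition("+")[2] == "cpu"

print(is_cpu_build("1.13.0+cpu"))    # True  -> reinstall a CUDA wheel
print(is_cpu_build("1.13.0+cu117"))  # False -> CUDA build already present
```

In practice you would pass `torch.__version__` to this check from the environment you plan to run galai in.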

ftencaten commented 1 year ago

I have the same issue on a MacBook Pro with an AMD graphics card. I don't think installing a CUDA-enabled version of PyTorch is an option in my case.

Naugustogi commented 1 year ago

> I had the same issue. The quickstart seems to install the CPU-only version of PyTorch by default, but you need the CUDA-enabled version. Use pip/conda to uninstall the version of PyTorch you have, then install the CUDA version using the instructions here.
>
> Before downloading, double-check which version of CUDA you have installed so you pick the right torch build. You can do this by running nvcc --version from the command line.
>
> Good luck!

CUDA 11.7 with a GPU is already installed. I could use an Anaconda environment, but I don't have much experience with that. It still doesn't work.

dionator commented 1 year ago

Has anyone found a workaround to this?

ZQ-Dev8 commented 1 year ago

> I had the same issue. The quickstart seems to install the CPU-only version of PyTorch by default, but you need the CUDA-enabled version. Use pip/conda to uninstall the version of PyTorch you have, then install the CUDA version using the instructions here. Before downloading, double-check which version of CUDA you have installed so you pick the right torch build. You can do this by running nvcc --version from the command line. Good luck!

> CUDA 11.7 with a GPU is already installed. I could use an Anaconda environment, but I don't have much experience with that. It still doesn't work.

Is the CPU-only version also installed? If so, try uninstalling it. Otherwise, it sounds like an environment issue, and I would make a new conda/venv environment. Both are relatively easy to set up; here's a good place to start.
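For the venv route, the standard library can create an isolated environment directly; the directory name below is illustrative:

```python
import venv
from pathlib import Path

# Create a fresh, isolated environment to rule out a polluted install.
# "galai-env" is just an illustrative directory name; with_pip=True would
# also bootstrap pip into the new environment.
env_dir = Path("galai-env")
venv.create(env_dir, with_pip=False)
print((env_dir / "pyvenv.cfg").exists())  # True
```

Equivalently, `python -m venv galai-env` from a shell, then activate it and install galai plus a CUDA-enabled torch wheel into the clean environment.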

Naugustogi commented 1 year ago

> Has anyone found a workaround to this?

Another attempt with the Hugging Face transformers library worked. It was maybe a bit complicated, and I also had to use a CPU version of a package.

mkardas commented 1 year ago

Hi @Naugustogi, can you check if you still experience the issues with galai version 1.1.0? You should be able to use the model on CPU with `load_model(..., num_gpus=0)`.

Naugustogi commented 1 year ago

> num_gpus=0)

doesn't work either:

AssertionError: Torch not compiled with CUDA enabled

mkardas commented 1 year ago

@Naugustogi any chance you can provide the full stack trace?

Naugustogi commented 1 year ago

> @Naugustogi any chance you can provide the full stack trace?

It happened after I started the program normally with inference:

```python
import galai as gal

model = gal.load_model(name = 'mini', num_gpus=0)
model.generate("Scaled dot product attention:\n\n\[")
```


I just use the CPU version.

```
Traceback (most recent call last):
  File "F:\galai-1.0.0\start.py", line 2, in <module>
    model = gal.load_model(name = 'mini', num_gpus=0)
  File "F:\galai-1.0.0\galai\__init__.py", line 40, in load_model
    model._load_checkpoint(checkpoint_path=get_checkpoint_path(name))
  File "F:\galai-1.0.0\galai\model.py", line 63, in _load_checkpoint
    load_checkpoint_and_dispatch(
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\big_modeling.py", line 366, in load_checkpoint_and_dispatch
    load_checkpoint_in_model(
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\utils\modeling.py", line 701, in load_checkpoint_in_model
    set_module_tensor_to_device(model, param_name, param_device, value=p
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\utils\modeling.py", line 124, in set_module_tensor_to_device
    new_value = value.to(device)
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\cuda\__init__.py", line 221, in _lazy_init
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
```

mkardas commented 1 year ago

Thanks @Naugustogi. The traceback shows galai 1.0.0. Can you try with 1.1.2?

Naugustogi commented 1 year ago

> Thanks @Naugustogi. The traceback shows galai 1.0.0. Can you try with 1.1.2?

I'm not sure where to get that. In this repo, it's just version 1.0.0 (3 weeks ago).

mkardas commented 1 year ago

@Naugustogi You can install it with pip or clone the main git branch (currently at 1.1.2; you can verify by inspecting the setup.py file in your installation).
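Another way to confirm which galai version is actually installed is importlib.metadata from the standard library; the helper name below is illustrative:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None

print(installed_version("galai"))  # e.g. "1.1.2", or None if not installed
```

This reads the metadata of the package actually importable in the current environment, which catches the case where an old 1.0.0 checkout shadows a newer pip install.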

Naugustogi commented 1 year ago

> @Naugustogi

Alright, 1.1.2 doesn't work either. It won't even show me any error; after starting, it returns the main folder.

mkardas commented 1 year ago

> it returns the main folder

What do you mean? If you are running it as a script, you need to wrap the last line in `print()`.
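This is standard Python behavior: only the interactive REPL echoes the value of a bare expression; in a script the value is silently discarded. A minimal sketch, with a hypothetical generate() standing in for model.generate(...):

```python
def generate() -> str:
    # Hypothetical stand-in for model.generate(...).
    return "Scaled dot product attention: ..."

generate()          # in a script: result silently discarded, nothing shown
print(generate())   # in a script: result explicitly printed
```

Running the galai snippet as a script therefore appears to do nothing unless the final `model.generate(...)` call is wrapped in `print()`.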

Naugustogi commented 1 year ago

> What do you mean? If you are running it as a script, you need to wrap the last line in `print()`.

OK, it worked.