h2oai / h2ogpt

Private chat with local GPT with document, images, video, etc. 100% private, Apache 2.0. Supports oLLaMa, Mixtral, llama.cpp, and more. Demo: https://gpt.h2o.ai/ https://gpt-docs.h2o.ai/

Segmentation fault on many models (h2ogpt ones do work), probably a CUDA version check in a ROCm environment #758

Closed: markg85 closed this issue 1 year ago

markg85 commented 1 year ago

Hi,

No clue if it's intentional or accidental, but I'm having issues running any model besides the h2ogpt ones.

Using Model psmathur/orca_mini_v3_7b
Prep: persist_directory=db_dir_UserData exists, user_path=/home/mark/GitProjects/h2ogpt/persdata passed, adding any changed or new documents
load INSTRUCTOR_Transformer
max_seq_length  512
0it [00:00, ?it/s]
0it [00:00, ?it/s]
Loaded 0 sources for potentially adding to UserData
Starting get_model: psmathur/orca_mini_v3_7b 
The model weights are not tied. Please use the `tie_weights` method before using the `infer_auto_device` function.
device_map: {'': 0}
The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
Traceback (most recent call last):
  File "/home/mark/GitProjects/h2ogpt/generate.py", line 16, in <module>
    entrypoint_main()
  File "/home/mark/GitProjects/h2ogpt/generate.py", line 12, in entrypoint_main
    H2O_Fire(main)
  File "/home/mark/GitProjects/h2ogpt/src/utils.py", line 60, in H2O_Fire
    fire.Fire(component=component, command=args)
  File "/home/mark/.conda/envs/h2ogpt/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/home/mark/.conda/envs/h2ogpt/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/home/mark/.conda/envs/h2ogpt/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/home/mark/GitProjects/h2ogpt/src/gen.py", line 1047, in main
    model0, tokenizer0, device = get_model(reward_type=False,
  File "/home/mark/GitProjects/h2ogpt/src/gen.py", line 1496, in get_model
    return get_hf_model(load_8bit=load_8bit,
  File "/home/mark/GitProjects/h2ogpt/src/gen.py", line 1659, in get_hf_model
    model = get_non_lora_model(base_model, model_loader, load_half, load_gptq,
  File "/home/mark/GitProjects/h2ogpt/src/gen.py", line 1264, in get_non_lora_model
    model = model_loader(
  File "/home/mark/.conda/envs/h2ogpt/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2685, in from_pretrained
    from .utils.bitsandbytes import get_keys_to_not_convert, replace_with_bnb_linear
  File "/home/mark/.conda/envs/h2ogpt/lib/python3.10/site-packages/transformers/utils/bitsandbytes.py", line 11, in <module>
    import bitsandbytes as bnb
  File "/home/mark/.conda/envs/h2ogpt/lib/python3.10/site-packages/bitsandbytes/__init__.py", line 6, in <module>
    from . import cuda_setup, utils, research
  File "/home/mark/.conda/envs/h2ogpt/lib/python3.10/site-packages/bitsandbytes/research/__init__.py", line 1, in <module>
    from . import nn
  File "/home/mark/.conda/envs/h2ogpt/lib/python3.10/site-packages/bitsandbytes/research/nn/__init__.py", line 1, in <module>
    from .modules import LinearFP8Mixed, LinearFP8Global
  File "/home/mark/.conda/envs/h2ogpt/lib/python3.10/site-packages/bitsandbytes/research/nn/modules.py", line 8, in <module>
    from bitsandbytes.optim import GlobalOptimManager
  File "/home/mark/.conda/envs/h2ogpt/lib/python3.10/site-packages/bitsandbytes/optim/__init__.py", line 6, in <module>
    from bitsandbytes.cextension import COMPILED_WITH_CUDA
  File "/home/mark/.conda/envs/h2ogpt/lib/python3.10/site-packages/bitsandbytes/cextension.py", line 13, in <module>
    setup.run_cuda_setup()
  File "/home/mark/.conda/envs/h2ogpt/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py", line 120, in run_cuda_setup
    binary_name, cudart_path, cc, cuda_version_string = evaluate_cuda_setup()
  File "/home/mark/.conda/envs/h2ogpt/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py", line 341, in evaluate_cuda_setup
    cuda_version_string = get_cuda_version()
  File "/home/mark/.conda/envs/h2ogpt/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py", line 311, in get_cuda_version
    major, minor = map(int, torch.version.cuda.split("."))
AttributeError: 'NoneType' object has no attribute 'split'

Just to be sure with regard to hardware and versions: I'm running PyTorch with ROCm (an AMD GPU). This does work just fine on the h2ogpt models! Output of some PyTorch specifics:

❯ python3 -c "import torch;print('CUDA(hip) is available',torch.cuda.is_available());print('cuda(hip)_device_num:',torch.cuda.device_count());print('Radeon device:',torch.cuda.get_device_name(torch.cuda.current_device()))"
CUDA(hip) is available True
cuda(hip)_device_num: 1
Radeon device: AMD Radeon RX 7900 XT

I'm guessing, based on the error, that bitsandbytes wants to look up a CUDA version that just isn't there in a ROCm environment.
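
That matches the last frame of the traceback: on a ROCm build of PyTorch, `torch.version.cuda` is `None` (the HIP version lives in `torch.version.hip` instead), so the naive `split(".")` in bitsandbytes' `get_cuda_version` blows up. A minimal sketch of what a guarded check could look like (an illustration, not the actual bitsandbytes code):

```python
import torch

def get_cuda_version_string():
    """Guarded version of the check that crashes in bitsandbytes'
    cuda_setup: torch.version.cuda is None on ROCm/HIP builds."""
    if torch.version.cuda is not None:
        # CUDA build of PyTorch: "11.8" -> "118"
        major, minor = map(int, torch.version.cuda.split("."))
        return f"{major}{minor}"
    if getattr(torch.version, "hip", None) is not None:
        # ROCm build: there is no CUDA runtime version to report
        return None
    return None

print(get_cuda_version_string())
```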

ryanchesler commented 1 year ago

I'm less familiar with the AMD and ROCm side of things. Do you have any way of checking GPU utilization while running the h2ogpt model? I'd like to double-check whether it is actually running on the GPU or just falling back to the CPU.

It seems to me that if it is actually working on the AMD GPU, then it is snagging on the bitsandbytes piece, which relies on CUDA in a way that isn't compatible with CUDA(hip)/ROCm.
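
For reference, the ROCm build of PyTorch exposes the HIP device through the regular `torch.cuda` API, so utilization can be sanity-checked from Python (or with `rocm-smi` on the host). A quick sketch:

```python
import torch

# On a ROCm build, torch.cuda.* reports the HIP device, so nonzero
# allocated memory after loading a model means it really is on the GPU.
print("device available:", torch.cuda.is_available())
print("device name:", torch.cuda.get_device_name(0))
print("allocated (MiB):", torch.cuda.memory_allocated() / 2**20)
print("reserved (MiB):", torch.cuda.memory_reserved() / 2**20)
```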

markg85 commented 1 year ago

Yeah, I resolved this. This bitsandbytes fork hacks in ROCm support and makes it work.

So closing my own issue :)
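
For anyone landing here later: the fix amounts to replacing the stock bitsandbytes wheel with a ROCm-enabled build, such as the fork mentioned above. A hypothetical smoke test after installing it, just to confirm the import no longer dies in `cuda_setup`:

```python
# Sketch: with a ROCm-enabled bitsandbytes installed in place of the stock
# wheel, this import should no longer raise the AttributeError from
# get_cuda_version. Exact behavior depends on the fork you install.
import bitsandbytes as bnb
print("bitsandbytes imported from:", bnb.__file__)
```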