Open · 12dc32d opened 1 week ago
Sorry for the late reply (continuing from #242); I typed up an answer, but the page refreshed and got rid of my response (facepalm). If you really want to use this codebase, I suggest finding some non-CUDA-dependent code and figuring out how to "cut" CUDA out of it. Here are a few resources that may help:
- `llama-recipes/recipes/quickstart`: may have conditional code that checks whether CUDA is available.
- `recipes/inference/mobile_inference/android_inference`: same idea; you may be able to find where the inference occurs, and maybe even use MLC-LLM.
- If you really want to go deep, look into the source code of llama.cpp, Ollama, or specifically LangChain's `HuggingFacePipeline`, and see what happens when you set both of these:

  ```python
  backend="openvino"
  model_kwargs={"device": "GPU"}
  ```
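The "check if CUDA is available, otherwise fall back" idea above can be sketched as a small helper. This is just an illustration: `pick_device` is a made-up name, and real code would call `torch.cuda.is_available()` (and query `openvino.Core().available_devices`) instead of taking boolean flags:

```python
def pick_device(cuda_available: bool, openvino_gpu_available: bool = False) -> str:
    """Pick an inference device string: CUDA if present, otherwise an
    OpenVINO-visible GPU (e.g. an Intel iGPU), otherwise plain CPU.

    In real code, cuda_available would come from torch.cuda.is_available()
    and openvino_gpu_available from openvino.Core().available_devices.
    """
    if cuda_available:
        return "cuda"
    if openvino_gpu_available:
        return "GPU"  # OpenVINO's device name for an Intel GPU
    return "cpu"
```

On a machine like the one described in this issue (Intel iGPU, no CUDA), this would resolve to `"GPU"` when OpenVINO sees the integrated GPU, and `"cpu"` otherwise.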
Unfortunately, even state-of-the-art models are very finicky and poorly documented compared to normal software projects. I've had to dig around in source code for hours for this kind of thing. Good luck!
Thank you! This change is difficult; my boss has set up a new desktop computer for work (not just for me, but for the company, and I am currently using it). Your suggestions and links are very helpful. I have read those references carefully and plan to spend my free time understanding the framework and putting that knowledge to use.
Describe the bug
Hello, my friends: I have just started learning how to develop large language models and am interning at a small company with only 11 people. I ran into difficulties after downloading the Llama 3 8B files. I am trying to test the Llama 3 model with a tablet, but my graphics card is an Intel integrated GPU and cannot use Intel Arc features (those require a discrete graphics card). After fixing the paths for `tokenizer_model` and the checkpoint, every run reports that the CUDA driver is needed, but the Intel graphics card does not support any version of CUDA. The error output is:

```
(.venv) PS D:\Llama3\llama3-main> python D:\Llama3\llama3-main\example_chat_completion.py --ckpt_dir D:\Llama3\llama3-main\ckpt_dir --tokenizer_path D:\Llama3\llama3-main\TOKENIZER_PATH\tokenizer.model
Traceback (most recent call last):
  File "D:\Llama3\llama3-main\example_chat_completion.py", line 89, in <module>
    fire.Fire(main)
  File "D:\Python_model\llama3-main\.venv\Lib\site-packages\fire\core.py", line 143, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Python_model\llama3-main\.venv\Lib\site-packages\fire\core.py", line 477, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
                                ^^^^^^^^^^^^^^^^^^^^
  File "D:\Python_model\llama3-main\.venv\Lib\site-packages\fire\core.py", line 693, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Llama3\llama3-main\example_chat_completion.py", line 36, in main
    generator = Llama.build(
                ^^^^^^^^^^^^
  File "D:\Llama3\llama3-main\llama\generation.py", line 83, in build
    torch.cuda.set_device(local_rank)
  File "D:\Python_model\llama3-main\.venv\Lib\site-packages\torch\cuda\__init__.py", line 399, in set_device
    torch._C._cuda_setDevice(device)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: module 'torch._C' has no attribute '_cuda_setDevice'
```
How can I modify the code in the llama3 repository, or make any adjustments on my computer? I'll be waiting around the clock for any reply.
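For what it's worth, the traceback shows `Llama.build` (in `llama/generation.py`) calling `torch.cuda.set_device(local_rank)` unconditionally, which fails on a CPU-only torch build. A hedged sketch of the guard such a patch would need follows; the helper name is made up, the flag would really come from `torch.cuda.is_available()`, and the rest of `build()` (e.g. `init_process_group` and tensor placement) would also need CPU-safe paths:

```python
def choose_backend_and_device(cuda_available: bool, local_rank: int = 0):
    """Make explicit the decision llama/generation.py's build() skips:
    it assumes CUDA, so torch.cuda.set_device() raises the AttributeError
    above on CPU-only installs. A guarded version would branch like this.

    In real code, cuda_available would be torch.cuda.is_available().
    """
    if cuda_available:
        # Multi-GPU torch.distributed runs normally use the "nccl" backend
        # and pin each rank to its own GPU via torch.cuda.set_device().
        return "nccl", f"cuda:{local_rank}"
    # CPU-only: use the "gloo" backend and never touch torch.cuda at all.
    return "gloo", "cpu"
```

The CPU path will be slow for an 8B model, but it avoids the `_cuda_setDevice` crash entirely.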