Open · 12dc32d opened 1 week ago
Sorry for the late reply (continuing from #242); I typed up an answer, but the page refreshed and got rid of my response (facepalm). If you really want to use this codebase, I suggest finding some non-CUDA-dependent code and figuring out how to "cut" CUDA out of it. Here are a few resources that may help:
- `llama-recipes/recipes/quickstart`: may have conditional code that checks whether CUDA is available.
- `recipes/inference/mobile_inference/android_inference`: same idea; you may be able to find where the inference occurs, and maybe even use MLC-LLM.
- If you really want to go deep, look into the source code of llama.cpp, Ollama, or specifically LangChain's `HuggingFacePipeline`, and see what happens when you set both of these:

  ```python
  backend="openvino"
  model_kwargs={"device": "GPU"}
  ```
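The "check if CUDA is available, otherwise fall back" idea above can be sketched as a small helper. This is just an illustration: `pick_device` is a made-up name, and real code would call `torch.cuda.is_available()` (and query `openvino.Core().available_devices`) instead of taking boolean flags:

```python
def pick_device(cuda_available: bool, openvino_gpu_available: bool = False) -> str:
    """Pick an inference device string: CUDA if present, otherwise an
    OpenVINO-visible GPU (e.g. an Intel iGPU), otherwise plain CPU.

    In real code, cuda_available would come from torch.cuda.is_available()
    and openvino_gpu_available from openvino.Core().available_devices.
    """
    if cuda_available:
        return "cuda"
    if openvino_gpu_available:
        return "GPU"  # OpenVINO's device name for an Intel GPU
    return "cpu"
```

On a machine like the one described in this issue (Intel iGPU, no CUDA), this would resolve to `"GPU"` when OpenVINO sees the integrated GPU, and `"cpu"` otherwise.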
Unfortunately, even state-of-the-art models are very finicky and poorly documented compared to normal software projects. I've had to dig around in source code for hours for this kind of thing. Good luck!
Thank you! This change is difficult; my boss has set up a new desktop computer for work (not just for me, but for the company, and I am currently using it). Your suggestions and links are very helpful. I have read those references carefully and plan to spend my free time understanding the framework and putting that knowledge to use.
Describe the bug
Hello, my friends: I have just started learning how to develop large language models and am interning at a small company with only 11 people. I ran into difficulties after downloading the Llama 3 8B files. I am trying to test the Llama 3 model with a tablet, but my graphics card is an Intel integrated GPU and cannot use Intel Arc features (those require a discrete graphics card). After fixing the paths for `tokenizer_model` and the checkpoint, every run reports that the CUDA driver is needed, but the Intel graphics card does not support any version of CUDA. The error output is:

```
(.venv) PS D:\Llama3\llama3-main> python D:\Llama3\llama3-main\example_chat_completion.py --ckpt_dir D:\Llama3\llama3-main\ckpt_dir --tokenizer_path D:\Llama3\llama3-main\TOKENIZER_PATH\tokenizer.model
Traceback (most recent call last):
  File "D:\Llama3\llama3-main\example_chat_completion.py", line 89, in <module>
    fire.Fire(main)
  File "D:\Python_model\llama3-main\.venv\Lib\site-packages\fire\core.py", line 143, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Python_model\llama3-main\.venv\Lib\site-packages\fire\core.py", line 477, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
                                ^^^^^^^^^^^^^^^^^^^^
  File "D:\Python_model\llama3-main\.venv\Lib\site-packages\fire\core.py", line 693, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Llama3\llama3-main\example_chat_completion.py", line 36, in main
    generator = Llama.build(
                ^^^^^^^^^^^^
  File "D:\Llama3\llama3-main\llama\generation.py", line 83, in build
    torch.cuda.set_device(local_rank)
  File "D:\Python_model\llama3-main\.venv\Lib\site-packages\torch\cuda\__init__.py", line 399, in set_device
    torch._C._cuda_setDevice(device)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: module 'torch._C' has no attribute '_cuda_setDevice'
```
How can I modify the code in the llama3 repository, or make any adjustments on my computer? I'll be waiting around the clock for any reply.
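For what it's worth, the traceback shows `Llama.build` (in `llama/generation.py`) calling `torch.cuda.set_device(local_rank)` unconditionally, which fails on a CPU-only torch build. A hedged sketch of the guard such a patch would need follows; the helper name is made up, the flag would really come from `torch.cuda.is_available()`, and the rest of `build()` (e.g. `init_process_group` and tensor placement) would also need CPU-safe paths:

```python
def choose_backend_and_device(cuda_available: bool, local_rank: int = 0):
    """Make explicit the decision llama/generation.py's build() skips:
    it assumes CUDA, so torch.cuda.set_device() raises the AttributeError
    above on CPU-only installs. A guarded version would branch like this.

    In real code, cuda_available would be torch.cuda.is_available().
    """
    if cuda_available:
        # Multi-GPU torch.distributed runs normally use the "nccl" backend
        # and pin each rank to its own GPU via torch.cuda.set_device().
        return "nccl", f"cuda:{local_rank}"
    # CPU-only: use the "gloo" backend and never touch torch.cuda at all.
    return "gloo", "cpu"
```

The CPU path will be slow for an 8B model, but it avoids the `_cuda_setDevice` crash entirely.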