SoftologyPro opened this issue 3 days ago

How fast is this supposed to generate the OBJ vertex points? I have it installed locally (Windows with a 24 GB 4090); the Gradio UI starts, but when I prompt it, vertex generation takes around 10 seconds per line/vertex.

Is this normal? Any tips to speed it up?

Thanks.
That's very slow, and I suspect it is not using your GPU. On my system (Apple MBP M2 Max with 96 GiB RAM), memory usage at a 4096-token context length is 15.16 GiB, which would fit entirely within your 24 GiB 4090.
I did install the appropriate GPU torch, and Task Manager shows the GPU, not the CPU, being used. Task Manager also shows dedicated GPU memory at 21.9/24.0 GB, so it is not maxed out there.
For the install I basically use these pip commands to install the requirements and gradio, and then swap the CPU torch out for the GPU build:
pip install -r requirements.txt
pip install gradio
pip uninstall -y torch
pip install --no-cache-dir --ignore-installed --force-reinstall --no-warn-conflicts torch==2.4.1+cu121 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
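To double-check that the swap actually took effect (and that generation is not silently falling back to the CPU), a quick sanity-check script can be run inside the venv; the file name is just illustrative:

check_gpu.py
import torch

# Version string should end in +cu121 after swapping in the GPU wheel
print(torch.__version__)
# CUDA version the wheel was built against (None means a CPU-only build)
print(torch.version.cuda)
# Must be True, otherwise generation runs on the CPU
print(torch.cuda.is_available())
if torch.cuda.is_available():
    # Should report the RTX 4090
    print(torch.cuda.get_device_name(0))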
For any other Windows users (or to help test this issue), here are an install.bat and a run.bat. Save them both to an empty directory, open a command prompt in that directory, run install.bat, then run run.bat to start it.
install.bat
@echo off
echo *** %time% *** Deleting LLaMa-Mesh directory if it exists
if exist LLaMa-Mesh\. rd /S /Q LLaMa-Mesh
echo *** %time% *** Cloning LLaMa-Mesh repository
git clone https://github.com/nv-tlabs/LLaMa-Mesh
cd LLaMa-Mesh
echo *** %time% *** Creating venv
python -m venv venv
echo *** %time% *** Activating venv
call venv\scripts\activate.bat
echo *** %time% *** Installing requirements
python -m pip install --upgrade pip
pip install -r requirements.txt
pip install gradio
echo *** %time% *** Installing GPU torch
pip uninstall -y torch
pip install --no-cache-dir --ignore-installed --force-reinstall --no-warn-conflicts torch==2.4.1+cu121 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
call venv\scripts\deactivate.bat
cd ..
echo *** %time% *** Finished LLaMa-Mesh install
echo.
echo Check the stats for any errors. Do not assume it worked.
pause
run.bat
@echo off
cd LLaMa-Mesh
call venv\scripts\activate.bat
python app.py
call venv\scripts\deactivate.bat
cd ..
After well over an hour of processing it did finish, but this was the result for "Create a 3D mesh of a ginger and white kitten dancing wearing a tutu":
Testing the first example prompt gives this error after clicking it:
Traceback (most recent call last):
File "D:\Tests\LLaMA-Mesh\LLaMa-Mesh\venv\lib\site-packages\gradio\queueing.py", line 624, in process_events
response = await route_utils.call_process_api(
File "D:\Tests\LLaMA-Mesh\LLaMa-Mesh\venv\lib\site-packages\gradio\route_utils.py", line 323, in call_process_api
output = await app.get_blocks().process_api(
File "D:\Tests\LLaMA-Mesh\LLaMa-Mesh\venv\lib\site-packages\gradio\blocks.py", line 2015, in process_api
result = await self.call_function(
File "D:\Tests\LLaMA-Mesh\LLaMa-Mesh\venv\lib\site-packages\gradio\blocks.py", line 1574, in call_function
prediction = await utils.async_iteration(iterator)
File "D:\Tests\LLaMA-Mesh\LLaMa-Mesh\venv\lib\site-packages\gradio\utils.py", line 710, in async_iteration
return await anext(iterator)
File "D:\Tests\LLaMA-Mesh\LLaMa-Mesh\venv\lib\site-packages\gradio\utils.py", line 815, in asyncgen_wrapper
response = await iterator.__anext__()
File "D:\Tests\LLaMA-Mesh\LLaMa-Mesh\venv\lib\site-packages\gradio\chat_interface.py", line 678, in _stream_fn
first_response = await async_iteration(generator)
File "D:\Tests\LLaMA-Mesh\LLaMa-Mesh\venv\lib\site-packages\gradio\utils.py", line 710, in async_iteration
return await anext(iterator)
File "D:\Tests\LLaMA-Mesh\LLaMa-Mesh\venv\lib\site-packages\gradio\utils.py", line 704, in __anext__
return await anyio.to_thread.run_sync(
File "D:\Tests\LLaMA-Mesh\LLaMa-Mesh\venv\lib\site-packages\anyio\to_thread.py", line 56, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
File "D:\Tests\LLaMA-Mesh\LLaMa-Mesh\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 2441, in run_sync_in_worker_thread
return await future
File "D:\Tests\LLaMA-Mesh\LLaMa-Mesh\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 943, in run
result = context.run(func, *args)
File "D:\Tests\LLaMA-Mesh\LLaMa-Mesh\venv\lib\site-packages\gradio\utils.py", line 687, in run_sync_iterator_async
return next(iterator)
File "D:\Tests\LLaMA-Mesh\LLaMa-Mesh\app.py", line 158, in chat_llama3_8b
for text in streamer:
File "D:\Tests\LLaMA-Mesh\LLaMa-Mesh\venv\lib\site-packages\transformers\generation\streamers.py", line 223, in __next__
value = self.text_queue.get(timeout=self.timeout)
File "D:\Python\lib\queue.py", line 179, in get
raise Empty
_queue.Empty
Is this because clicking the example does not put the prompt text into the "Type a message" field?
If I reload the UI and manually type the prompt "Create a 3D model of a wooden hammer" into the "Type a message" field, it then starts without error.
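As a side note on the exception itself: the _queue.Empty at the bottom of the traceback comes from Transformers' TextIteratorStreamer, which raises if no token arrives within its timeout, so an empty prompt (or a very slow first token) would surface exactly this way. If app.py constructs the streamer with a short timeout (I have not checked what value it uses), raising it is a cheap thing to try. A sketch, with argument values that are illustrative only:

# in app.py, wherever the streamer is created; the repo's exact arguments may differ
from transformers import TextIteratorStreamer

streamer = TextIteratorStreamer(
    tokenizer,                 # the tokenizer app.py already loads
    timeout=300.0,             # seconds to wait for the next token before raising Empty
    skip_prompt=True,          # do not echo the prompt back into the stream
    skip_special_tokens=True,  # decode kwarg forwarded to tokenizer.decode
)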
There are two sets of pre-written prompts, one above the entry box and one below. The ones above give me an error, but the ones below seem to work.
I only see the example buttons, and I clicked the first of those.
Here's what I see on my machine.
The buttons in the upper box ("Gradio ChatInterface") do not seem to work, but the buttons below ("Examples") do.
Anyway, you are on a Mac, and this has nothing to do with the issue I am trying to get an answer to. You should open your own issue.
Someone posted, then deleted, a suggestion to try flash-attn. I tried that; it is not any faster. Any other ideas? Thanks.
Are you using bf16? It's much faster than fp32.
How do I set that? I do not see either in app.py.
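For reference, with Transformers the dtype is normally chosen when the model is loaded rather than in a named setting you could grep app.py for. A minimal sketch of loading in bf16 (the model ID and variable names here are placeholders, not necessarily what app.py uses):

import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "model-id-that-app-py-loads",      # placeholder; substitute the actual model ID
    torch_dtype=torch.bfloat16,        # bf16 weights: half the memory of fp32, much faster on a 4090
    device_map="auto",                 # place the weights on the GPU
    # attn_implementation="flash_attention_2",  # flash-attn only helps if enabled here
)

If no torch_dtype is passed, from_pretrained loads the weights in fp32 by default; for an 8B model that is roughly 32 GB of weights alone, more than a 24 GB card holds, so any resulting CPU offload would also fit the slow generation described here.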