SoftologyPro opened this issue 3 days ago

How fast is this supposed to generate the OBJ vertex points? I have it installed locally (Windows with a 24 GB 4090); the Gradio UI starts, but when I prompt it, vertex generation takes around 10 seconds per line/vertex.

Is this normal? Any tips to speed it up?

Thanks.
That's very slow, and I suspect it is not using your GPU. On my system (Apple MBP M2 Max with 96 GiB RAM), memory usage at a 4096-token context length is 15.16 GiB, which would fit entirely within your 24 GiB 4090.
I did install the appropriate GPU torch, and Task Manager shows the GPU, not the CPU, being used. Task Manager also shows dedicated GPU memory at 21.9/24.0 GB, so it is not maxed out there.
For the install I basically use these pip commands to install the requirements and gradio, and then swap the CPU torch out for the GPU build:
pip install -r requirements.txt
pip install gradio
pip uninstall -y torch
pip install --no-cache-dir --ignore-installed --force-reinstall --no-warn-conflicts torch==2.4.1+cu121 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
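To double-check that the swap actually took effect (and that generation is not silently falling back to the CPU), a quick sanity-check script can be run inside the venv; the file name is just illustrative:

check_gpu.py
import torch

# Version string should end in +cu121 after swapping in the GPU wheel
print(torch.__version__)
# CUDA version the wheel was built against (None means a CPU-only build)
print(torch.version.cuda)
# Must be True, otherwise generation runs on the CPU
print(torch.cuda.is_available())
if torch.cuda.is_available():
    # Should report the RTX 4090
    print(torch.cuda.get_device_name(0))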
For any other Windows users (or to help test this issue), here are an install.bat and a run.bat. Save them both to an empty directory, open a command prompt in that directory, run install.bat, then run run.bat to start it.
install.bat
@echo off
echo *** %time% *** Deleting LLaMa-Mesh directory if it exists
if exist LLaMa-Mesh\. rd /S /Q LLaMa-Mesh
echo *** %time% *** Cloning LLaMa-Mesh repository
git clone https://github.com/nv-tlabs/LLaMa-Mesh
cd LLaMa-Mesh
echo *** %time% *** Creating venv
python -m venv venv
echo *** %time% *** Activating venv
call venv\scripts\activate.bat
echo *** %time% *** Installing requirements
python -m pip install --upgrade pip
pip install -r requirements.txt
pip install gradio
echo *** %time% *** Installing GPU torch
pip uninstall -y torch
pip install --no-cache-dir --ignore-installed --force-reinstall --no-warn-conflicts torch==2.4.1+cu121 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
call venv\scripts\deactivate.bat
cd ..
echo *** %time% *** Finished LLaMa-Mesh install
echo.
echo Check the stats for any errors. Do not assume it worked.
pause
run.bat
@echo off
cd LLaMa-Mesh
call venv\scripts\activate.bat
python app.py
call venv\scripts\deactivate.bat
cd ..
After well over an hour of processing it did finish, but this was the result for "Create a 3D mesh of a ginger and white kitten dancing wearing a tutu":
Testing the first example prompt gives this error after clicking it:
Traceback (most recent call last):
File "D:\Tests\LLaMA-Mesh\LLaMa-Mesh\venv\lib\site-packages\gradio\queueing.py", line 624, in process_events
response = await route_utils.call_process_api(
File "D:\Tests\LLaMA-Mesh\LLaMa-Mesh\venv\lib\site-packages\gradio\route_utils.py", line 323, in call_process_api
output = await app.get_blocks().process_api(
File "D:\Tests\LLaMA-Mesh\LLaMa-Mesh\venv\lib\site-packages\gradio\blocks.py", line 2015, in process_api
result = await self.call_function(
File "D:\Tests\LLaMA-Mesh\LLaMa-Mesh\venv\lib\site-packages\gradio\blocks.py", line 1574, in call_function
prediction = await utils.async_iteration(iterator)
File "D:\Tests\LLaMA-Mesh\LLaMa-Mesh\venv\lib\site-packages\gradio\utils.py", line 710, in async_iteration
return await anext(iterator)
File "D:\Tests\LLaMA-Mesh\LLaMa-Mesh\venv\lib\site-packages\gradio\utils.py", line 815, in asyncgen_wrapper
response = await iterator.__anext__()
File "D:\Tests\LLaMA-Mesh\LLaMa-Mesh\venv\lib\site-packages\gradio\chat_interface.py", line 678, in _stream_fn
first_response = await async_iteration(generator)
File "D:\Tests\LLaMA-Mesh\LLaMa-Mesh\venv\lib\site-packages\gradio\utils.py", line 710, in async_iteration
return await anext(iterator)
File "D:\Tests\LLaMA-Mesh\LLaMa-Mesh\venv\lib\site-packages\gradio\utils.py", line 704, in __anext__
return await anyio.to_thread.run_sync(
File "D:\Tests\LLaMA-Mesh\LLaMa-Mesh\venv\lib\site-packages\anyio\to_thread.py", line 56, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
File "D:\Tests\LLaMA-Mesh\LLaMa-Mesh\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 2441, in run_sync_in_worker_thread
return await future
File "D:\Tests\LLaMA-Mesh\LLaMa-Mesh\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 943, in run
result = context.run(func, *args)
File "D:\Tests\LLaMA-Mesh\LLaMa-Mesh\venv\lib\site-packages\gradio\utils.py", line 687, in run_sync_iterator_async
return next(iterator)
File "D:\Tests\LLaMA-Mesh\LLaMa-Mesh\app.py", line 158, in chat_llama3_8b
for text in streamer:
File "D:\Tests\LLaMA-Mesh\LLaMa-Mesh\venv\lib\site-packages\transformers\generation\streamers.py", line 223, in __next__
value = self.text_queue.get(timeout=self.timeout)
File "D:\Python\lib\queue.py", line 179, in get
raise Empty
_queue.Empty
Is this because clicking the example does not put the prompt text into the "Type a message" field?
If I reload the UI and manually type the prompt "Create a 3D model of a wooden hammer" into the "Type a message" field, it then starts without error.
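As a side note on the exception itself: the _queue.Empty at the bottom of the traceback comes from Transformers' TextIteratorStreamer, which raises if no token arrives within its timeout, so an empty prompt (or a very slow first token) would surface exactly this way. If app.py constructs the streamer with a short timeout (I have not checked what value it uses), raising it is a cheap thing to try. A sketch, with argument values that are illustrative only:

# in app.py, wherever the streamer is created; the repo's exact arguments may differ
from transformers import TextIteratorStreamer

streamer = TextIteratorStreamer(
    tokenizer,                 # the tokenizer app.py already loads
    timeout=300.0,             # seconds to wait for the next token before raising Empty
    skip_prompt=True,          # do not echo the prompt back into the stream
    skip_special_tokens=True,  # decode kwarg forwarded to tokenizer.decode
)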
There are two sets of pre-written prompts, one above the entry box and one below. The ones above give me an error, but the ones below seem to work.
I only see the example buttons, and I clicked the first of those.
Here's what I see on my machine.
The buttons in the upper box ("Gradio ChatInterface") do not seem to work, but the buttons below ("Examples") do.
Anyway, you are on a Mac, and this has nothing to do with the issue I am trying to get an answer to. You should open your own issue.
Someone posted, then deleted, a suggestion to try flash-attn. I tried that; it is not any faster. Any other ideas? Thanks.
Are you using bf16? It's much faster than fp32.
How do I set that? I do not see either in app.py.
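For reference, with Transformers the dtype is normally chosen when the model is loaded rather than in a named setting you could grep app.py for. A minimal sketch of loading in bf16 (the model ID and variable names here are placeholders, not necessarily what app.py uses):

import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "model-id-that-app-py-loads",      # placeholder; substitute the actual model ID
    torch_dtype=torch.bfloat16,        # bf16 weights: half the memory of fp32, much faster on a 4090
    device_map="auto",                 # place the weights on the GPU
    # attn_implementation="flash_attention_2",  # flash-attn only helps if enabled here
)

If no torch_dtype is passed, from_pretrained loads the weights in fp32 by default; for an 8B model that is roughly 32 GB of weights alone, more than a 24 GB card holds, so any resulting CPU offload would also fit the slow generation described here.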