davisengeler opened this issue 2 years ago
I got into a similar loop when trying to start the back-end using a 1050 Ti:
dalle-backend | You should probably UPCAST the model weights to float32 if this was not intended. See [`~FlaxPreTrainedModel.to_fp32`] for further information on how to do this.
dalle-backend | Traceback (most recent call last):
dalle-backend | File "app.py", line 60, in <module>
dalle-backend | dalle_model = DalleModel(args.model_version)
dalle-backend | File "/app/dalle_model.py", line 70, in __init__
dalle-backend | self.params = replicate(params)
dalle-backend | File "/usr/local/lib/python3.8/dist-packages/flax/jax_utils.py", line 56, in replicate
dalle-backend | return jax.device_put_replicated(tree, devices)
dalle-backend | File "/usr/local/lib/python3.8/dist-packages/jax/_src/api.py", line 2801, in device_put_replicated
dalle-backend | return tree_map(_device_put_replicated, x)
dalle-backend | File "/usr/local/lib/python3.8/dist-packages/jax/_src/tree_util.py", line 184, in tree_map
dalle-backend | return treedef.unflatten(f(*xs) for xs in zip(*all_leaves))
dalle-backend | File "/usr/local/lib/python3.8/dist-packages/jax/_src/tree_util.py", line 184, in <genexpr>
dalle-backend | return treedef.unflatten(f(*xs) for xs in zip(*all_leaves))
dalle-backend | File "/usr/local/lib/python3.8/dist-packages/jax/_src/api.py", line 2796, in _device_put_replicated
dalle-backend | buf, = dispatch.device_put(x, devices[0])
dalle-backend | File "/usr/local/lib/python3.8/dist-packages/jax/_src/dispatch.py", line 871, in device_put
dalle-backend | return device_put_handlers[type(x)](x, device)
dalle-backend | File "/usr/local/lib/python3.8/dist-packages/jax/_src/dispatch.py", line 901, in _device_put_device_array
dalle-backend | x = _copy_device_array_to_device(x, device)
dalle-backend | File "/usr/local/lib/python3.8/dist-packages/jax/_src/dispatch.py", line 924, in _copy_device_array_to_device
dalle-backend | moved_buf = backend.buffer_from_pyval(x.device_buffer.to_py(), device)
dalle-backend | jaxlib.xla_extension.XlaRuntimeError: RESOURCE_EXHAUSTED: Failed to allocate request for 384.00MiB (402653184B) on device ordinal 0
dalle-backend exited with code 1
Are there any required/recommended GPU specs for the Mega_full model? I couldn't find anything, so I'm not sure whether 12GB of VRAM (or less) is enough.
I'm currently running the mega_full model on Manjaro with a 3090 and wasn't able to get it running on a GPU with less memory. The model seems to use about 12GB of VRAM, so a card with "only" 12GB would indeed not be enough, especially since the model most likely won't be the only thing on your machine using VRAM unless you're running without a desktop environment.
Thanks @Aeriit. I assume it won’t be possible, but are you aware of any way to work around this since I’m “so close” to the VRAM requirements?
I solved my issue by setting a couple of environment variables for Python before starting the backend:
export XLA_PYTHON_CLIENT_PREALLOCATE=false
export XLA_PYTHON_CLIENT_ALLOCATOR=platform
python3 app.py --port 8080 --model_version Mega_full
This document explains why those environment variables are helpful. I'm now able to run the backend natively with the Mega_full model on my 12GB 3080 Ti and generate images in ~6 seconds each.
cc @Aeriit @realies
Edit: I eventually had another RESOURCE_EXHAUSTED crash on Mega_full, but was able to test it for a bit. @Aeriit is correct that 12GB is right at the cusp, but not quite enough for stability.
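For anyone who would rather keep this inside Python than in the shell, the same flags can also be set with os.environ, as long as that happens before JAX is first imported. A minimal sketch (not part of the repo, just illustrating the ordering requirement):

# set the XLA allocator flags before anything imports jax
import os
os.environ["XLA_PYTHON_CLIENT_PREALLOCATE"] = "false"   # don't preallocate most of the VRAM up front
os.environ["XLA_PYTHON_CLIENT_ALLOCATOR"] = "platform"  # allocate and free buffers on demand

import jax  # must come after the environment variables are set

print(jax.devices())  # the GPU should now start out nearly empty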
@davisengeler, unfortunately, that doesn't help on WSL for me.
I'm able to run Mega_Full on twin RTX 2080 Tis, each with 11GB of memory, and the NVIDIA libraries spread the loading of the model across both GPUs until they are full. Image processing is spread across both GPUs too. So if you have less than 12GB, adding another NVIDIA GPU that takes the total graphics memory above 12GB may work for you. I don't know whether the two GPUs have to be identical models or not.
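For what it's worth, here is a minimal sketch of the mechanism behind that, based on the replicate(params) call in the traceback above (the toy params dict just stands in for the real DALL-E weights):

import jax
import jax.numpy as jnp
from flax.jax_utils import replicate

print(jax.local_devices())  # with two cards installed, both GPUs should show up here

params = {"w": jnp.ones((4, 4))}  # toy stand-in for the model weights

# replicate() wraps jax.device_put_replicated(): the parameter tree is placed
# on every local device so pmap'd generation can run across all of them.
# This is the step that raises RESOURCE_EXHAUSTED when a device runs out of memory.
replicated = replicate(params)
print(replicated["w"].shape)  # (num_devices, 4, 4)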
Context
I've got the project running nicely with the mini and Mega models on an RTX 3080 Ti (12GB) on Ubuntu 22.04. Results take less than 4 seconds per image when using Mega.
Problem
Despite all other models working, I've not been able to start a Mega_full instance. I keep getting a RESOURCE_EXHAUSTED error after starting the server (full traceback below). It's possible this 12GB GPU simply isn't enough, but I figured I'd report it here for feedback before dismissing it. Thanks for any recommendations!
Full Traceback