geroldmeisinger opened 1 month ago
THUDM/cogvlm2-llama3-chat-19B-int4: size 12.5 GB, VRAM 14868 MiB
system: Linux, RTX 4060 Ti 16GB
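For reference, a minimal sketch for reading the same per-GPU figure that nvidia-smi reports, assuming the nvidia-ml-py package (import name `pynvml`) is installed:

```python
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # GPU 0
mem = pynvml.nvmlDeviceGetMemoryInfo(handle)   # values are in bytes
print(f"VRAM used: {mem.used / 1024**2:.0f} MiB / {mem.total / 1024**2:.0f} MiB")
pynvml.nvmlShutdown()
```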
VRAM usage differs depending on the system and the settings used.
different how? can we get a range, minimum requirements, or sensible categories? to me it would be important to know whether there is ANY chance to run the model, even if it is very slow.
Different OSes can change VRAM usage, different CUDA versions can change it, and different driver versions can as well. AMD cards will use a different amount of VRAM than NVIDIA ones, and the same goes for Intel GPUs.
If this were built to run on only one platform, one driver version, one CUDA version, etc., then it would be easy to track.
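One way to get a comparable number on your own setup is to track the allocator-level peak from inside the process. A minimal sketch, assuming PyTorch with CUDA available; note that nvidia-smi will typically report more, since it also counts the CUDA context itself:

```python
import torch

torch.cuda.reset_peak_memory_stats()

# stand-in workload; in practice, run the actual model inference here
x = torch.randn(4096, 4096, device="cuda")
y = x @ x

torch.cuda.synchronize()
print(f"peak allocated: {torch.cuda.max_memory_allocated() / 1024**2:.0f} MiB")
```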
which still leaves the question of the "minimum requirement"
thanks for the explanation!
hello and thank you for the great tool. it would be nice to know some (approximate) metadata about models before downloading, to avoid downloading huge files which in the end won't work (the downloads end up in ~/.cache/huggingface/hub).
system: Linux, RTX 3060 12GB
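The download size, at least, can be checked up front without fetching anything, by querying the Hub metadata. A minimal sketch, assuming the huggingface_hub package; the repo id is the one from this thread:

```python
from huggingface_hub import HfApi

info = HfApi().model_info("THUDM/cogvlm2-llama3-chat-19B-int4", files_metadata=True)
total = sum(f.size or 0 for f in info.siblings)  # bytes across all repo files
print(f"download size: {total / 1024**3:.1f} GiB")
```

This only covers the size on disk, not the VRAM needed at runtime, which is the harder part discussed above.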
btw I have to set
export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
(I also document them here: https://github.com/jhc13/taggui/discussions/169)
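For reference, the same setting can be applied from Python rather than the shell; it has to be in place before the first CUDA allocation, so the safe pattern is to set it before importing torch:

```python
import os

# must be set before torch initializes its CUDA caching allocator
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

import torch  # imported after setting the env var on purpose
```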