Closed · AlexanderZhk closed this 1 month ago
This could be solved fairly easily with something like the following (highlighted pseudocode) in llm_studio/app_utils/sections/experiment.py.
Thank you for reporting. When pushing the model to the Hugging Face Hub or downloading it from the UI, the weights are automatically sharded into smaller chunks (by default safetensors shards of, I believe, 5 GB each).
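For illustration, the grouping behavior described above can be sketched in plain Python. This is a hypothetical helper, not the actual LLM Studio code; the real sharding is done by the Transformers/safetensors serialization (e.g. `save_pretrained` with a maximum shard size):

```python
def plan_shards(param_sizes: dict[str, int],
                max_shard_bytes: int = 5 * 10**9) -> list[list[str]]:
    """Greedily group parameters into shards no larger than max_shard_bytes.

    param_sizes maps parameter name -> size in bytes. Loosely mirrors how
    checkpoint weights are split into ~5 GB safetensors files.
    """
    shards: list[list[str]] = []
    current: list[str] = []
    current_bytes = 0
    for name, size in param_sizes.items():
        # Start a new shard when adding this tensor would exceed the limit.
        if current and current_bytes + size > max_shard_bytes:
            shards.append(current)
            current, current_bytes = [], 0
        current.append(name)
        current_bytes += size
    if current:
        shards.append(current)
    return shards
```

With three 3 GB tensors and a 5 GB limit, each tensor lands in its own shard, since any two together would exceed the cap.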
Is this happening only when using the weights from an old experiment to continue training with "Use previous experiment weights"?
> Is this happening only when using the weights from an old experiment to continue training with "Use previous experiment weights"?
No, this is happening with a "freshly" trained model; I haven't tried the "Use previous experiment weights" option yet. The experiment config has `'pretrained_weights': ''`, which I suspect is what that option sets.
Happens as soon as I click "Download model".
I'm pretty sure it is because it tries to load the whole model onto one device, which is gpu[0]. Forcing this if statement to evaluate to True (it sets the device to CPU), either by forking the code or by running an experiment in the background, lets me download the model: https://github.com/h2oai/h2o-llmstudio/blob/d65fffb3ffa453c519670a9f19ff20335ee7f2dc/llm_studio/app_utils/sections/experiment.py#L1621-L1630
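The workaround described above amounts to choosing the `map_location` for loading based on whether the checkpoint fits on the GPU. A minimal sketch of that decision, using hypothetical names (the real check lives in the linked `experiment.py`; free GPU memory could be queried with `torch.cuda.mem_get_info`):

```python
def choose_map_location(ckpt_bytes: int, free_gpu_bytes: int,
                        gpu_index: int = 0) -> str:
    """Pick a map_location for torch.load: use the GPU only when the
    checkpoint fits into its free memory, otherwise fall back to CPU."""
    if ckpt_bytes < free_gpu_bytes:
        return f"cuda:{gpu_index}"
    return "cpu"
```

For the ~67 GB .pth file from this thread on a 40 GB card, this would select `"cpu"`, which matches the manual workaround of forcing the CPU branch.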
> Thank you for reporting. When pushing the model to the Hugging Face Hub or downloading it from the UI, the weights are automatically sharded into smaller chunks (by default safetensors shards of, I believe, 5 GB each).
Just double-checked: it "crashes" before reaching the sharding step.
Sorry, I can't fully follow. Are you using a local model or a model from Hugging Face to start your experiment?
And what is the next step? You can load a sharded model across multiple GPUs using DeepSpeed. Training entirely on CPU is too slow, so we will not be supporting this in H2O LLM Studio.
When you are pushing a finished model to HF, you can choose the device:
> Sorry, I can't fully follow. Are you using a local model or a model from Hugging Face to start your experiment?
Starting with a model from HF.
> And what is the next step? You can load a sharded model across multiple GPUs using DeepSpeed. Training entirely on CPU is too slow, so we will not be supporting this in H2O LLM Studio.
This happens after training the model. The issue occurs when downloading the model (exporting it), using this button:
> When you are pushing a finished model to HF, you can choose the device:
Is it possible to choose the device when downloading the model (not pushing to HF)?
We would then need to add a device selection there as well. Need to see how easily that's doable. For now, the workaround is to hardcode the device in the code.
There is now a setting that should solve this: https://github.com/h2oai/h2o-llmstudio/pull/795
I trained a 33b model with DeepSpeed on 40 GB cards. Based on the traceback, the model seems to be too large to fit into one GPU. Is it possible to fall back to the CPU in cases like this?
The .pth file is ~67 GB, so obviously it won't fit into the ~~CPU~~ GPU (edit: GPU, obviously).