Closed: abpani closed this issue 1 month ago.
Have you tried playing with different parameters of the device_map? You can read more about it and about customizing it here: https://huggingface.co/docs/transformers/big_models#accelerates-big-model-inference
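For context, a minimal sketch (the checkpoint and dtype are illustrative assumptions) of the different values device_map can take and how to inspect where each module ended up:

```python
# Sketch of loading with a device_map; checkpoint and dtype are assumptions.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3.1-8B-Instruct",
    torch_dtype=torch.bfloat16,
    # device_map can be "auto", "balanced", "balanced_low_0", "sequential",
    # or an explicit {module_name: device} dict.
    device_map="auto",
)
print(model.hf_device_map)  # per-module placement chosen by Accelerate
```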
cc @SunMarc: I'm trying to find a doc that dives into the different attributes device_map can accept, but I'm not finding any such documentation in the transformers docs.
Still the same issue. It shows different errors, e.g. tensors being loaded on different devices (cuda:0 and cuda:1). The device_map it ends up with is:
device_map = {'model.embed_tokens': 0, 'model.layers.0': 0, 'model.layers.1': 0, 'model.layers.2': 0, 'model.layers.3': 0, 'model.layers.4': 0, 'model.layers.5': 0, 'model.layers.6': 0, 'model.layers.7': 0, 'model.layers.8': 0, 'model.layers.9': 0, 'model.layers.10': 0, 'model.layers.11': 1, 'model.layers.12': 1, 'model.layers.13': 1, 'model.layers.14': 1, 'model.layers.15': 1, 'model.layers.16': 1, 'model.layers.17': 1, 'model.layers.18': 1, 'model.layers.19': 2, 'model.layers.20': 2, 'model.layers.21': 2, 'model.layers.22': 2, 'model.layers.23': 2, 'model.layers.24': 2, 'model.layers.25': 2, 'model.layers.26': 2, 'model.layers.27': 3, 'model.layers.28': 3, 'model.layers.29': 3, 'model.layers.30': 3, 'model.layers.31': 3, 'model.norm': 3, 'lm_head': 3}
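For reference, a sketch (model id taken from the issue; any edits to the map are placeholders) of how to read back the map that transformers produced and pass an edited copy explicitly:

```python
# Sketch, not a fix: load with the automatic map, copy it, then reload with an
# edited version of that same map.
import torch
from transformers import AutoModelForCausalLM

model_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

device_map = dict(model.hf_device_map)  # the dict pasted above came from here
# ... move entries between GPU indices here if you want a different split ...

del model
torch.cuda.empty_cache()
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map=device_map
)
```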
@LysandreJik You can find the details about the device map here: https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct/blob/main/model.safetensors.index.json
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
@LysandreJik I tried as you suggested; still the same issue in a multi-GPU environment.
Hey @abpani, the final allocation looks very strange indeed. Can you try with device_map = "sequential" and set max_memory? Also, what do you mean by "it shows different errors like loaded in different devices"? Could you share the traceback? Thanks!
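A sketch of that suggestion (the per-GPU and CPU limits are placeholder assumptions to be tuned to the actual cards):

```python
# device_map="sequential" combined with explicit max_memory budgets.
# The "18GiB" / "48GiB" values are assumptions; set them to fit your hardware.
import torch
from transformers import AutoModelForCausalLM

max_memory = {0: "18GiB", 1: "18GiB", 2: "18GiB", 3: "18GiB", "cpu": "48GiB"}

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3.1-8B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="sequential",   # fill GPU 0 up to its budget, then GPU 1, and so on
    max_memory=max_memory,
)
print(model.hf_device_map)
```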
@SunMarc The funny thing is that it does not happen with Mistral models; the split comes out balanced for those. But with Qwen, Phi, and Llama it is still the same issue.
> Hey @abpani, the final allocation looks very strange indeed. Can you try with device_map = "sequential" and set max_memory? Also, what do you mean by "it shows different errors like loaded in different devices"? Could you share the traceback? Thanks!
I don't have that currently, but the auto device_map should still work fine, since it works perfectly with all Mistral models.
Might just be the no_split module setting, or simply the sizes of the models.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Closing, as I believe you have the balanced option 🤗 Updating the no_split module is also possible. You can never split completely evenly, as the lm_head is a lot bigger as a single layer than, say, an MLP.
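For completeness, a sketch of the route mentioned here; the memory budgets are illustrative assumptions. The simplest variant is device_map="balanced" in from_pretrained; the longer variant below computes the map yourself so you can control which module classes are never split across devices:

```python
# Compute a device_map manually with a custom no-split list.
import torch
from accelerate import infer_auto_device_map, init_empty_weights
from transformers import AutoConfig, AutoModelForCausalLM

model_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"

config = AutoConfig.from_pretrained(model_id)
with init_empty_weights():
    empty_model = AutoModelForCausalLM.from_config(config)  # no weights allocated

device_map = infer_auto_device_map(
    empty_model,
    max_memory={0: "18GiB", 1: "18GiB", 2: "18GiB", 3: "18GiB"},
    no_split_module_classes=["LlamaDecoderLayer"],  # keep each decoder block whole on one GPU
    dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map=device_map
)
```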
System Info
python 3.10.10
torch 2.3.1
transformers 4.43.2
optimum 1.17.1
auto_gptq 0.7.1
bitsandbytes 0.43.2
accelerate 0.33.0
Llama 3.1 8B Instruct gets loaded like this, so I can't even go above a batch size of 1 while finetuning.
Who can help?
No response
Information
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
Reproduction
Expected behavior
I would like the model to be loaded evenly across GPUs so that I can finetune with a larger batch size.
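One possible workaround for this (the 10GiB cap below is a placeholder assumption): cap how much of each GPU the weights may occupy at load time, so activations and optimizer state during finetuning still have room and a larger batch size fits.

```python
# Sketch: reserve finetuning headroom by capping per-GPU weight memory at load time.
import torch
from transformers import AutoModelForCausalLM

n_gpus = torch.cuda.device_count()
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3.1-8B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="balanced",
    max_memory={i: "10GiB" for i in range(n_gpus)},  # tune the cap to your GPUs
)
print(model.hf_device_map)  # should now show a more even split with headroom left
```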