Open · ishaansharma opened this issue 1 year ago
WDYT @younesbelkada
Hi @ishaansharma Thanks a lot for the proposal! I personally would not advocate going down that route: the quantization schemes we support right now consist of post-training quantization, meaning the use case is always
1- load pre-trained weights from the Hub or locally
2- quantize the pre-trained weights
The API you propose is cool, but I am afraid it will not be used in practice, as from_config will load random weights into the model. Let me know if I misunderstood anything!
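(For reference, that post-training flow looks roughly like the sketch below; the model id is a placeholder and a bitsandbytes install is assumed.)

```python
# Minimal sketch of the two-step flow above: load pre-trained weights, quantize while loading.
# "my-org/my-model" is a placeholder Hub id, not a real checkpoint.
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(load_in_8bit=True)

model = AutoModelForCausalLM.from_pretrained(
    "my-org/my-model",               # 1- load pre-trained weights from the Hub or locally
    quantization_config=bnb_config,  # 2- quantize the pre-trained weights on the fly
    device_map="auto",
)
```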
I wanted this feature because it would be very useful for pre-training from scratch with any large-language-model architecture that has a huge number of parameters, which usually cannot be done on small machines, at a much lower computational cost.
To pre-train a model from scratch and build a language model for a completely new language, I don't think the randomly initialized weights from the config will cause any harm, as the weights will eventually get updated during training.
@younesbelkada, I just want pre-training a model for any language from scratch, using any LLM architecture, to be possible on any machine.
Let me know if this approach helps.
Warm regards.
Thanks for getting back to me @ishaansharma !
I wanted this feature because it would be very useful for pre-training from scratch with any large-language-model architecture that has a huge number of parameters, which usually cannot be done on small machines, at a much lower computational cost.
Since you cannot perform full fine-tuning when the model is quantized, I think this is technically not possible :/ This comment also applies to your thoughts here:
To pre-train a model from scratch and build a language model for a completely new language, I don't think the randomly initialized weights from the config will cause any harm, as the weights will eventually get updated during training.
I have a similar use case, but I want to load huge models efficiently, so I've been following this guide, which first loads an empty model from a config and then loads the state into the empty model. But I do not understand how we can add other parameters (like load_in_8bit) to this process: from_config does not support such kwargs, and neither does load_checkpoint_and_dispatch. So is that simply not possible in this kind of workflow? How else would one efficiently and quickly load a model in 8 bit? @younesbelkada
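(For context, the guide's flow looks roughly like this sketch; the model id and checkpoint path are placeholders. As noted above, neither from_config nor load_checkpoint_and_dispatch accepts quantization kwargs such as load_in_8bit.)

```python
# Rough sketch of the big-model loading flow from the guide: build an empty model
# from a config, then load and dispatch the checkpoint. Paths are placeholders.
from accelerate import init_empty_weights, load_checkpoint_and_dispatch
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("my-org/my-model")
with init_empty_weights():
    model = AutoModelForCausalLM.from_config(config)  # no memory allocated for weights yet

model = load_checkpoint_and_dispatch(
    model,
    checkpoint="path/to/checkpoint",  # single file or sharded checkpoint folder
    device_map="auto",
)
```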
Hey, I stumbled upon the same issue; would've liked to be able to supply a device_map to AutoModel.from_config. :)
cc @SunMarc
Hey @janEbert, what would be the use case for loading the model with from_config and device_map? A workaround is to save the model loaded with from_config, then use from_pretrained to load it again.
If you want to quantize the model loaded with from_config, please read the points that younes shared above. Thanks!
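(In code, that workaround is roughly the following; the directory name and model id are placeholders.)

```python
# Sketch of the save-then-reload workaround: from_config gives random weights,
# and from_pretrained supports device_map (and quantization_config) on the reload.
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("my-org/my-model")
model = AutoModelForCausalLM.from_config(config)      # randomly initialized weights
model.save_pretrained("randomly-initialized-model")   # placeholder directory

model = AutoModelForCausalLM.from_pretrained(
    "randomly-initialized-model",
    device_map="auto",
)
```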
The use case is to have the model properly distributed automatically. The workaround does work but is extremely hacky and ugly, if I'm completely honest. :sweat_smile: Cheers for the suggestion, though!
The use case is to have the model properly distributed automatically
We recommend using device_map for inference, but it might not be very useful on a model with random weights.
Nevertheless, the algorithm behind device_map requires us to have the loaded weights somewhere. When using from_config, we are initializing the weights from the model definition and not from a file stored on the Hub. If you can load the entire model on the CPU, then what you can do is use the dispatch_model function to have the model distributed across your GPUs.
from transformers import AutoConfig, AutoModelForCausalLM
from accelerate import dispatch_model, infer_auto_device_map

config = AutoConfig.from_pretrained("model")  # "model" is a placeholder checkpoint name
model = AutoModelForCausalLM.from_config(config)

# infer a device_map from the instantiated (randomly initialized) model
device_map = infer_auto_device_map(model, no_split_module_classes=model._no_split_modules)
dispatch_model(model, device_map=device_map)
LMK if this works for you! You can find more information on how device_map works here.
Thanks a lot for the infer_auto_device_map and dispatch_model functions! As you can tell, I would like to avoid loading the model on the CPU first so I'm not limited by RAM regarding the model size.
Sorry for not giving enough information in the first place. My use case is that I want to convert a model from custom code to HF "stdlib" code. The converted model is instantiated via from_config from a converted config, and then I load the converted state dict into it.
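(Concretely, that conversion flow is roughly the sketch below; the config path and state-dict file are placeholders.)

```python
# Rough sketch of the conversion flow: instantiate from the converted config,
# then load the converted state dict. Everything lives in CPU RAM first,
# which is exactly the limitation discussed next.
import torch
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("path/to/converted-config")
model = AutoModelForCausalLM.from_config(config)          # full model on CPU

state_dict = torch.load("converted_state_dict.pt", map_location="cpu")
model.load_state_dict(state_dict)
```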
However, since device_map is not supported with from_config, I am limited by the CPU RAM. Even your really nice suggestions don't help in that case; not even the first one, since I'd still have to be able to instantiate the model on a single node's CPU first. :/
Even your really nice suggestions don't help in that case; not even the first one (https://github.com/huggingface/transformers/issues/26901#issuecomment-2422621147), since I'd still have to be able to instantiate the model on a single node's CPU first. :/
How big is the model? The model should be sharded, so it should only take max_shard_size in terms of memory. I think that in save_pretrained, we set the shard size to 5GB. Also, if the model is in safetensors format, we should be able to load the model directly to the GPU without going through the CPU.
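(Roughly, that looks like the sketch below; "converted-model" is a placeholder directory and model is assumed to be the converted model from the earlier step.)

```python
# Sketch: save the converted model as sharded safetensors, then reload with a
# device_map so the shards are placed directly on the GPUs.
from transformers import AutoModelForCausalLM

model.save_pretrained(          # "model" is the converted model from the earlier sketch
    "converted-model",
    max_shard_size="5GB",       # mirrors the default shard size mentioned above
    safe_serialization=True,    # safetensors format
)

model = AutoModelForCausalLM.from_pretrained(
    "converted-model",
    device_map="auto",
)
```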
Would this work in the multi-node setting as well? Because the model is too big to fit on one node. Sorry that wasn't clear.
So this doesn't work in a multi-node setting. However, we are working on making transformers models compatible with the PP/TP methods from PyTorch, which do work multi-node!
Feature request
Add a quantization_config option to AutoModelForCausalLM.from_config. I am trying to pretrain a model from scratch and use bitsandbytes so that it can be trained on less computationally expensive machines. Below is my quantization config:
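(The exact config block from the original report is not preserved in this thread; a representative bitsandbytes setup would look something like the sketch below, which is illustrative only.)

```python
# Illustrative only: a typical bitsandbytes 4-bit config, not the reporter's exact values.
import torch
from transformers import BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
```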
When I attempted this with the config of a certain model taken via the from_pretrained function, it failed and raised the TypeError mentioned below.
The Error:
Motivation
I had tried a workaround: saving the model created from the loaded config details and then loading that saved model with the quantization config.
I believe this process could be fixed so that we can enable/add quantization while loading the model from the config itself.
Your contribution