Closed: zucchini-nlp closed this issue 1 month ago
My bad, I found a flag that disables fast loading (`_supports_param_buffer_assignment`).
Anyway, I would like to understand more about why fast init fails to keep the same dtype for the vision module, and whether we will be able to support these kinds of models, so I'm leaving the issue open :)
@zucchini-nlp does that model have the flag? If not, could you make a PR to do so as a quick fix?
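For reference, a minimal sketch of what such a flag-based quick fix looks like, assuming the flag is simply overridden at the class level (illustrative only, not necessarily the exact diff that later landed in #32091):

```python
from transformers import PreTrainedModel

class ChameleonPreTrainedModel(PreTrainedModel):
    # Opt this architecture out of the fast loading path from #31771:
    # weights are then copied into the instantiated modules (preserving
    # the module dtype) instead of being assigned straight from the
    # checkpoint tensors.
    _supports_param_buffer_assignment = False
```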
Indeed, the real issue is that that chunk gets initialized in float32, even with an explicit dtype.
Yes, I made a PR (https://github.com/huggingface/transformers/pull/32091) to fix it. I also found that other composite models like LLaVa aren't broken, so I have no idea what was wrong with Chameleon.
I found why it was defaulting to fp32 in the vision model and bf16 in the LM: the original weights were loaded and converted in those precisions, so fast init was loading them in the same dtype as the weights. Closing as resolved!
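To make the mechanism concrete, here is a toy illustration (my own sketch, not transformers internals verbatim) of why direct parameter assignment preserves the checkpoint dtype while the copy-based slow path keeps the module's dtype:

```python
import torch
import torch.nn as nn

layer = nn.Linear(4, 4).to(torch.bfloat16)  # module instantiated in bf16
ckpt_weight = torch.randn(4, 4)             # checkpoint tensor saved in fp32

# Slow path: in-place copy casts the checkpoint into the existing bf16 parameter.
layer.weight.data.copy_(ckpt_weight)
print(layer.weight.dtype)  # torch.bfloat16

# Fast path (param buffer assignment): the checkpoint tensor is adopted
# directly, so its dtype wins.
layer.weight = nn.Parameter(ckpt_weight)
print(layer.weight.dtype)  # torch.float32
```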
System Info
The PR on fast init (#31771) seems to have broken Chameleon loading. When I try to load the model on CPU with the same dtype as the weights (bf16), inference fails due to a dtype mismatch; it doesn't fail if I load on GPU with `device_map="cuda"`, though. Weights in the VQ module are now in fp32, while the LM module is in bf16. I can still make it work by not casting `pixel_values` to bf16, but that is not the expected behavior and it causes inconsistencies, because if I load in fp16 then I would also have to cast the inputs to fp16.

Who can help?
No response
Information
Tasks

- An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)

Reproduction
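A minimal reproduction sketch, based on the description above (the checkpoint id and the dtype-inspection code are assumptions, not the reporter's exact script):

```python
import torch
from transformers import ChameleonForConditionalGeneration

# Load on CPU in bf16, matching the dtype of the released weights.
model = ChameleonForConditionalGeneration.from_pretrained(
    "facebook/chameleon-7b",  # assumed checkpoint
    torch_dtype=torch.bfloat16,
)

# With the fast init from #31771, the VQ module ends up in fp32 while the
# language model stays in bf16, so inference hits a dtype mismatch.
for name, module in model.model.named_children():
    dtypes = {p.dtype for p in module.parameters()}
    if dtypes:
        print(f"{name}: {dtypes}")
```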
Expected behavior
Composite models like Chameleon should not break when loaded with the same dtype as their weights.
@muellerzr @ArthurZucker I haven't dived deep yet; I guess you will be faster at spotting the root cause.