Closed williford closed 2 months ago
cc @ydshieh who worked on a similar issue which was fixed by https://github.com/huggingface/transformers/pull/28122
Hi @williford
Could you share your system info with us? You can run the command transformers-cli env and copy-paste its output below.
For the reproduction I installed transformers with pip install git+https://github.com/huggingface/transformers:
transformers version: 4.43.0.dev0
@ydshieh If I'm understanding the code correctly, your change makes sure that model._initialize_weights is called. ResNetForImageClassification inherits from ResNetPreTrainedModel, which overrides _init_weights. However, ResNetPreTrainedModel._init_weights doesn't do anything when the module is a torch.nn.modules.linear.Linear.
When _fast_init is disabled, the Linear module initializes its weights via its reset_parameters method.
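To make the point above concrete, here is a minimal sketch of an _init_weights-style hook that also covers nn.Linear. It is not the transformers code: the function name init_weights, the Conv2d and normalization branches, the layer sizes, and the NaN fill used to mimic an uninitialized head are all assumptions for illustration.

```python
import torch
from torch import nn


def init_weights(module: nn.Module) -> None:
    """Hypothetical init hook; the branches are assumptions, not the library's code."""
    if isinstance(module, nn.Conv2d):
        nn.init.kaiming_normal_(module.weight, mode="fan_out", nonlinearity="relu")
    elif isinstance(module, nn.Linear):
        # Without this branch a Linear layer would be left untouched by the hook.
        # reset_parameters() is the default initializer nn.Linear runs on construction.
        module.reset_parameters()
    elif isinstance(module, (nn.BatchNorm2d, nn.GroupNorm)):
        nn.init.constant_(module.weight, 1.0)
        nn.init.constant_(module.bias, 0.0)


# Mimic a freshly resized classification head whose memory was never initialized.
head = nn.Linear(2048, 10)
with torch.no_grad():
    head.weight.fill_(float("nan"))
    head.bias.fill_(float("nan"))

# nn.Module.apply calls the hook on every submodule and on the module itself.
head.apply(init_weights)
print(torch.isfinite(head.weight).all().item(), torch.isfinite(head.bias).all().item())
```

If the nn.Linear branch is removed, the hook skips the head and the NaN values stay in place, which is roughly the situation described above.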
@williford Thank you for diving into this issue. Yes, you are correct! I opened a PR to fix it and it works now.
System Info
It seems that the changes in https://github.com/huggingface/transformers/pull/11471 broke fine-tuning of ResNet (when the number of classes is changed).
Most models seem to handle this by adding Linear to the module types checked in _init_weights, here: https://github.com/huggingface/transformers/blob/ae9dd02ee1a8627d26be32202202b8081e9855a4/src/transformers/models/resnet/modeling_resnet.py#L274
However, it seems like it would be better to handle it where the size mismatch is detected in modeling_utils.py: https://github.com/huggingface/transformers/blob/ae9dd02ee1a8627d26be32202202b8081e9855a4/src/transformers/modeling_utils.py#L4282
Who can help?
@amyeroberts
Reproduction
E.g.
Disabling the _fast_init fixes the issue:
Expected behavior
The statistics of the initialized weights should be similar with and without _fast_init. Importantly, the weights shouldn't contain NaNs, and their maximum absolute value shouldn't be 0 or extremely large (e.g. > 1e20).
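The expectation above could be checked with something like the following sketch. The helper name weights_look_sane and its max_abs_limit parameter are made up for this example; only the 1e20 threshold comes from the description above.

```python
import math
from typing import Iterable


def weights_look_sane(values: Iterable[float], max_abs_limit: float = 1e20) -> bool:
    """Return True if there are no NaNs and the largest magnitude is > 0 and <= max_abs_limit.

    An empty input is reported as not sane, since there is nothing to check.
    """
    largest = 0.0
    for value in values:
        if math.isnan(value):
            return False
        largest = max(largest, abs(value))
    # Infinite values are rejected here too, because inf exceeds any finite limit.
    return 0.0 < largest <= max_abs_limit


print(weights_look_sane([0.01, -0.02, 0.03]))
print(weights_look_sane([float("nan"), 0.1]))
```

With PyTorch, the values of a weight tensor can be passed in as a flat list, e.g. weights_look_sane(weight.flatten().tolist()), where weight would be the tensor of the newly sized classification layer.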