Closed: sebastian-weisshaar closed this issue 1 year ago
Instead of pinning an older version of Transformers, is there a different/more proper way to handle this? They probably block this in the new version for a reason, so it seems like a bit of a code smell to keep using `.to()` on quantized models. Do they not have any guidance on what to do?
We do not call `.to(device)` ourselves. Somewhere deep inside transformers it is called when the model is loaded. We tried to work around this with the `device_map` argument, but that did not stop transformers from calling `.to(device)`.
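For context, the check that newer transformers versions added can be illustrated with a simplified, self-contained sketch. This is a hypothetical stand-in class, not the actual transformers code: the real model sets flags such as `is_loaded_in_8bit`, but the guard behaves roughly like this.

```python
# Hypothetical sketch of the guard newer transformers versions apply:
# NOT the real library code, just an illustration of why .to(device)
# raises on a quantized model instead of moving it.

class FakeQuantizedModel:
    """Stand-in for a transformers PreTrainedModel (hypothetical)."""

    def __init__(self, quantized: bool):
        # The real model tracks quantization via flags like is_loaded_in_8bit.
        self.is_quantized = quantized
        self.device = "cpu"

    def to(self, device: str):
        # Quantized weights are tied to the device/dtype they were
        # quantized for, so moving them afterwards is rejected.
        if self.is_quantized:
            raise ValueError(
                "`.to` is not supported for quantized models. "
                "Set the device with `device_map` at load time instead."
            )
        self.device = device
        return self


model = FakeQuantizedModel(quantized=True)
try:
    model.to("cuda:0")
except ValueError as err:
    print("raised:", err)
```

This is why any code path that calls `.to(device)` after loading, even indirectly inside transformers itself, fails once quantization is enabled.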
We pin an older version of accelerate. With the new version of transformers we cannot send the model to a device when quantization is enabled (https://github.com/huggingface/transformers/blob/66954ea25e342fd451c26ec1c295da0b8692086b/src/transformers/modeling_utils.py#L1897). To solve this we had to pin a specific accelerate version instead of installing from the main branch on GitHub.
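A pin like this would live in the project's requirements file. The exact version below is hypothetical (the thread does not state which version was pinned); the point is only the shape of the change, from a Git `main` install to a fixed release:

```text
# requirements.txt -- hypothetical version number, not from the thread
# before: accelerate @ git+https://github.com/huggingface/accelerate.git@main
accelerate==0.20.3
```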
This PR includes: https://github.com/jina-ai/jerboa/pull/102.
THIS CODE DOES NOT RUN ON APPLE SILICON