Closed: sebastian-weisshaar closed this issue 1 year ago
Instead of pinning an older version of Transformers, is there a different/more proper way to handle this? They probably block this in the new version for a reason, so it seems like a bit of a code smell to keep using `.to()` on quantized models. Do they not have any guidance on what to do?
We do not call `.to(device)` ourselves. Somewhere deep inside transformers it is called when the model is loaded. We tried to work around this with the `device_map` argument, but that did not stop transformers from calling `.to(device)`.
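For context, the check that newer transformers versions added can be illustrated with a simplified, self-contained sketch. This is a hypothetical stand-in class, not the actual transformers code: the real model sets flags such as `is_loaded_in_8bit`, but the guard behaves roughly like this.

```python
# Hypothetical sketch of the guard newer transformers versions apply:
# NOT the real library code, just an illustration of why .to(device)
# raises on a quantized model instead of moving it.

class FakeQuantizedModel:
    """Stand-in for a transformers PreTrainedModel (hypothetical)."""

    def __init__(self, quantized: bool):
        # The real model tracks quantization via flags like is_loaded_in_8bit.
        self.is_quantized = quantized
        self.device = "cpu"

    def to(self, device: str):
        # Quantized weights are tied to the device/dtype they were
        # quantized for, so moving them afterwards is rejected.
        if self.is_quantized:
            raise ValueError(
                "`.to` is not supported for quantized models. "
                "Set the device with `device_map` at load time instead."
            )
        self.device = device
        return self


model = FakeQuantizedModel(quantized=True)
try:
    model.to("cuda:0")
except ValueError as err:
    print("raised:", err)
```

This is why any code path that calls `.to(device)` after loading, even indirectly inside transformers itself, fails once quantization is enabled.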
We pin an older version of accelerate. With the new version of transformers we cannot send the model to a device when quantization is enabled (https://github.com/huggingface/transformers/blob/66954ea25e342fd451c26ec1c295da0b8692086b/src/transformers/modeling_utils.py#L1897). To solve this we had to pin a specific accelerate version instead of installing from the main branch on GitHub.
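A pin like this would live in the project's requirements file. The exact version below is hypothetical (the thread does not state which version was pinned); the point is only the shape of the change, from a Git `main` install to a fixed release:

```text
# requirements.txt -- hypothetical version number, not from the thread
# before: accelerate @ git+https://github.com/huggingface/accelerate.git@main
accelerate==0.20.3
```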
This PR includes: https://github.com/jina-ai/jerboa/pull/102.
THIS CODE DOES NOT RUN ON APPLE SILICON