System Info
Google Colab
transformers version: 4.46.3
Using distributed or parallel set-up in script?: no
Who can help?
@Rocketknight1
Information
[ ] The official example scripts
[X] My own modified scripts
Tasks
[ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
[x] My own task or dataset (give details below)
Reproduction
Issue
Models that require trust_remote_code=True can't be fully saved & loaded with save_pretrained() + from_pretrained().
In offline mode on a new machine, calling from_pretrained() on the saved local dir doesn't locate all required files and tries to reach out to the HF Hub for the remote code part.
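For clarity, here is a minimal sketch of the flow I mean (the local path is illustrative, not the exact one from the notebooks):

```python
from transformers import AutoModel

# Online machine: download the model (its custom code lives on the Hub) and save it locally.
model = AutoModel.from_pretrained("jinaai/jina-embeddings-v3", trust_remote_code=True)
model.save_pretrained("./jina-embeddings-v3-local")

# Offline machine: loading from the local dir still tries to fetch the
# custom modeling code from the Hub instead of finding it in the saved dir.
model = AutoModel.from_pretrained(
    "./jina-embeddings-v3-local",
    trust_remote_code=True,
    local_files_only=True,
)
```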
I stumbled on this in a Kaggle Notebook environment during a competition.
Some Kaggle competitions require submitting code in Kaggle Notebooks, which are run later on private data and don't allow internet access.
Practically, this means you must prepare all models in advance and upload them as dependencies to the submission notebook.
So having transformers try to reach the HF Hub (when the model is already pre-downloaded) is not an option, and it effectively disqualifies this group of models from use.
How to reproduce
Colab | Kaggle
Tested with the popular jinaai/jina-embeddings-v3. Includes a step-by-step reproduction + results.
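I haven't inlined the notebooks here; as a rough sketch, offline mode can be simulated with the offline env vars before loading (same illustrative local path as above):

```python
import os

# Simulate the no-internet Kaggle environment before importing transformers.
os.environ["HF_HUB_OFFLINE"] = "1"
os.environ["TRANSFORMERS_OFFLINE"] = "1"

from transformers import AutoModel

# Per the behavior described above, this raises instead of loading:
# the remote-code module isn't found in the local dir, and the Hub
# request it falls back to is blocked in offline mode.
model = AutoModel.from_pretrained("./jina-embeddings-v3-local", trust_remote_code=True)
```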
Additional context
sentence-transformers (code snippet included)
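The actual snippet is in the notebooks; a minimal sketch of the sentence-transformers variant, assuming the same save/load flow, would look like:

```python
from sentence_transformers import SentenceTransformer

# Online: download the trust_remote_code model and save it locally.
model = SentenceTransformer("jinaai/jina-embeddings-v3", trust_remote_code=True)
model.save("./jina-embeddings-v3-st-local")

# Offline: loading from the local path hits the same missing-remote-code lookup.
model = SentenceTransformer("./jina-embeddings-v3-st-local", trust_remote_code=True)
```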
Expected behavior
The model is fully saved in a local dir with save_pretrained() and can be fully loaded from a local path with from_pretrained() in offline mode.