Open Zappandy opened 2 days ago
Thanks, this is a good observation. So what I'm hearing is that we need some way to pass `local_files_only` to the code path(s) that load the transformers, right? But probably also to this line, which doesn't have any config at all:
model = AutoModel.from_pretrained(config.bert_model).to(config.device)
Yes. I don't know how feasible it would be to pass transformers-specific options through the stanza pipeline config dictionary that the user defines. That may be too much, but at least for an offline mode, `local_files_only` should be passed to every `from_pretrained` call, as long as the user has set a cache directory where the models and tokenizers are stored.
An alternative is to pass the local path directly to the `from_pretrained` methods, but this is less portable.
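To make the suggestion concrete, here is a minimal sketch of forwarding one set of offline options to every `from_pretrained` call. The helper names `pretrained_kwargs` and `load_bert` are hypothetical and not part of stanza; `local_files_only` and `cache_dir` are existing `from_pretrained` keyword arguments in transformers. How stanza would actually expose such options is an open question in this thread.

```python
from typing import Any, Dict, Optional


def pretrained_kwargs(local_files_only: bool = False,
                      cache_dir: Optional[str] = None) -> Dict[str, Any]:
    """Build the keyword arguments to forward to from_pretrained (hypothetical helper)."""
    kwargs: Dict[str, Any] = {"local_files_only": local_files_only}
    if cache_dir is not None:
        kwargs["cache_dir"] = cache_dir
    return kwargs


def load_bert(model_name: str, device: str = "cpu", **offline_opts: Any):
    """Load a transformer and its tokenizer with the same offline options (hypothetical helper)."""
    # Imported lazily so pretrained_kwargs can be used without transformers installed.
    from transformers import AutoModel, AutoTokenizer

    kwargs = pretrained_kwargs(**offline_opts)
    tokenizer = AutoTokenizer.from_pretrained(model_name, **kwargs)
    model = AutoModel.from_pretrained(model_name, **kwargs).to(device)
    return model, tokenizer
```

With something like this, the line quoted above could become `load_bert(config.bert_model, config.device, local_files_only=True)`, so the tokenizer and the model are guaranteed to receive the same flag.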
I'm currently attempting to run a pipeline I had built on my local machine with stanza on an HPC with no access to the huggingface hub or the stanza server. To bypass this, I downloaded all of the models I needed and set `download_method` to `None`. While this seemed to work with most processors in English, the coreference processor bypassed the local files and kept trying to download the google/electra-large model.
After setting environment variables such as `HF_HUB_CACHE` to the corresponding path where the HF cache has been stored on the HPC and `HF_HUB_OFFLINE='1'`, the huggingface pretrained method from the bert.py script in the models coref directory kept attempting to download files. I found out that to avoid any downloads, the parameter `local_files_only` in the `from_pretrained` method must be set to True (I tested this locally with no internet connection).
Unless I'm missing something, with the current setup I don't see how I can pass this parameter to the `from_pretrained` methods in the bert.py script without explicitly doing so in the script, as the config object used there is not the same stanza config dictionary I defined. It seems to me that the config object read in the script is fetched from the model .pt file using the `torch.load` method, which of course means the config won't contain the `local_files_only` parameter.
Am I missing something or is this an expected functionality?
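For anyone trying to reproduce this, the setup described above looks roughly like the sketch below. The function names and the cache path are hypothetical, and the `tokenize,coref` processor list is an assumption about the pipeline in question. `HF_HUB_CACHE` and `HF_HUB_OFFLINE` are the environment variables named in the report, and `download_method=None` is the stanza setting mentioned there. Per the report, this configuration alone did not stop the coref processor from attempting downloads.

```python
import os


def configure_hf_offline(cache_path: str) -> None:
    """Point the Hugging Face hub at a local cache and request offline mode."""
    os.environ["HF_HUB_CACHE"] = cache_path
    os.environ["HF_HUB_OFFLINE"] = "1"


def build_pipeline(cache_path: str, lang: str = "en",
                   processors: str = "tokenize,coref"):
    """Build a stanza pipeline with model downloads disabled (sketch)."""
    configure_hf_offline(cache_path)
    # Import only after the environment is set, on the assumption that the
    # Hugging Face libraries read these variables when they are first imported.
    import stanza

    return stanza.Pipeline(lang, processors=processors, download_method=None)
```

Calling `build_pipeline("/path/to/hf_cache")` at the very start of the job script, before anything else imports transformers, keeps the ordering explicit.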