kermitt2 / delft

a Deep Learning Framework for Text https://delft.readthedocs.io/
Apache License 2.0

Load transformer config and tokenizer from disk when n>1 for nfold training #125

Closed kermitt2 closed 2 years ago

kermitt2 commented 2 years ago

When performing an nfold training for text classification or sequence labeling, we currently reload the transformer configuration and the tokenizer via AutoModel and the Hugging Face Hub n times, once for each fold model. To limit access to the Hugging Face Hub (which is not very reliable), we should only make an online access the first time, for n=1, and then load the transformer configuration and the transformer tokenizer from file, because both have been saved when building the model for n=1.
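A minimal sketch of the proposed logic, not the actual delft implementation: the helper names (`should_load_from_disk`, `load_config_and_tokenizer`) and the `local_dir` layout are hypothetical; only the `transformers` calls (`AutoConfig`/`AutoTokenizer` `from_pretrained` and `save_pretrained`) are the real Hugging Face API.

```python
from pathlib import Path


def should_load_from_disk(fold_index: int, local_dir: str) -> bool:
    # A saved copy is usable for any fold after the first,
    # once fold 1 has written config.json to local_dir.
    return fold_index > 0 and (Path(local_dir) / "config.json").exists()


def load_config_and_tokenizer(model_name: str, local_dir: str, fold_index: int):
    # Hypothetical helper: go online only for the first fold,
    # then reuse the artifacts saved under local_dir for folds 2..n.
    from transformers import AutoConfig, AutoTokenizer  # lazy import

    if should_load_from_disk(fold_index, local_dir):
        source = local_dir      # offline: reuse what the first fold saved
    else:
        source = model_name     # online: first fold fetches from the Hub
    config = AutoConfig.from_pretrained(source)
    tokenizer = AutoTokenizer.from_pretrained(source)
    if source == model_name:
        # Persist so the remaining n-1 folds never touch the Hub.
        config.save_pretrained(local_dir)
        tokenizer.save_pretrained(local_dir)
    return config, tokenizer
```

The key point is that `save_pretrained` writes `config.json` and the tokenizer files to disk during the first fold, so every subsequent fold can pass the local directory to `from_pretrained` instead of the Hub model name.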