sergsb opened this issue 9 months ago
I found out that the problem is with `trust_remote_code`, which must also be set when loading the tokenizer.
See also https://github.com/ludwig-ai/ludwig/pull/3632.
Hi @sergsb,
Thanks for sharing your experience.
The Ludwig team is focused on building first-class support for models that are natively supported by HF (Hugging Face). As I understand it, supporting models that require `trust_remote_code=True` is tenable, but it carries other risks (for example, executing arbitrary code downloaded from the model repository) that need to be thought through.
CC: @arnavgarg1
Hi @justinxzhao,
Thanks for the answer. Maybe one option would be to introduce a global config parameter, `trust_remote_code`, and pass it through to both HF models and tokenizers?
@sergsb that seems reasonable to me. I think that's what @arnavgarg1 was going for in https://github.com/ludwig-ai/ludwig/pull/3632, specifically here.
I want to use this model as an encoder. As you can see from the description, the model can be loaded like this:
I tried to load it using:
This results in the following error:
RuntimeError: Caught exception during model preprocessing: Tokenizer class MolformerTokenizer does not exist or is not currently imported.
This is not surprising, because `MolformerTokenizer` is not a class that ships with transformers; this model loads its tokenizer through `AutoTokenizer` instead. However, the documentation says that "If a text feature's encoder specifies a huggingface model, then the tokenizer for that model will be used automatically." How can I load the tokenizer for this model?
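To see why the error appears, here is a simplified, hypothetical model of tokenizer-class resolution. It is not transformers' actual implementation, and the real checks and error messages vary by version. It assumes the repo's tokenizer config names a `tokenizer_class` and may carry an `auto_map` entry pointing at custom code; the built-in class set and the module path used below are placeholders. A class that transformers does not ship can only be resolved through that remote code, which is allowed only when `trust_remote_code` is True.

```python
from typing import Any, Dict, Set, Tuple

# Placeholder subset of tokenizer classes bundled with the library.
BUILTIN_TOKENIZERS: Set[str] = {"BertTokenizer", "BertTokenizerFast", "RobertaTokenizer"}


def resolve_tokenizer(
    tokenizer_config: Dict[str, Any], trust_remote_code: bool = False
) -> Tuple[str, Any]:
    """Simplified sketch: decide where a tokenizer class would come from."""
    class_name = tokenizer_config.get("tokenizer_class")
    # 1. Classes shipped with the library resolve without remote code.
    if class_name in BUILTIN_TOKENIZERS:
        return ("builtin", class_name)
    # 2. Custom classes come from the repo's code, referenced via auto_map.
    remote_ref = tokenizer_config.get("auto_map", {}).get("AutoTokenizer")
    if remote_ref is not None and trust_remote_code:
        return ("remote", remote_ref)
    # 3. Otherwise the class cannot be found, as in the error above.
    raise ValueError(
        f"Tokenizer class {class_name} does not exist or is not currently imported."
    )
```

Outside Ludwig, the corresponding real call is `AutoTokenizer.from_pretrained(name, trust_remote_code=True)`; the question in this thread is how to have Ludwig pass that flag when it builds the tokenizer automatically.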