huggingface / setfit

Efficient few-shot learning with Sentence Transformers
https://hf.co/docs/setfit
Apache License 2.0

trust_remote_code not passed in properly #487

Open bwkchee opened 9 months ago

bwkchee commented 9 months ago

The trust_remote_code=True flag doesn't seem to be passed through properly by SetFitModel.from_pretrained. The following code results in an error:

from setfit import SetFitModel

model_id = './nomic-embed-text-v1'                                
model = SetFitModel.from_pretrained(model_id, labels=['negative','positive'], trust_remote_code=True)

Results in:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[5], line 5
      3 #config = AutoConfig.from_pretrained(model_id,trust_remote_code=True)
      4 model_id = './nomic-embed-text-v1'                                
----> 5 model = SetFitModel.from_pretrained(model_id, labels=['negative','positive'], trust_remote_code=True)

File ~/Library/Python/3.11/lib/python/site-packages/huggingface_hub/utils/_validators.py:118, in validate_hf_hub_args.<locals>._inner_fn(*args, **kwargs)
    115 if check_use_auth_token:
    116     kwargs = smoothly_deprecate_use_auth_token(fn_name=fn.__name__, has_token=has_token, kwargs=kwargs)
--> 118 return fn(*args, **kwargs)

File ~/Library/Python/3.11/lib/python/site-packages/huggingface_hub/hub_mixin.py:157, in ModelHubMixin.from_pretrained(cls, pretrained_model_name_or_path, force_download, resume_download, proxies, token, cache_dir, local_files_only, revision, **model_kwargs)
    154         config = json.load(f)
    155     model_kwargs.update({"config": config})
--> 157 return cls._from_pretrained(
    158     model_id=str(model_id),
    159     revision=revision,
    160     cache_dir=cache_dir,
    161     force_download=force_download,
    162     proxies=proxies,
    163     resume_download=resume_download,
    164     local_files_only=local_files_only,
    165     token=token,
    166     **model_kwargs,
...
    624         " set the option `trust_remote_code=True` to remove this error."
    625     )
    627 return trust_remote_code

ValueError: Loading ./nomic-embed-text-v1 requires you to execute the configuration file in that repo on your local machine. Make sure you have read the code there to avoid malicious use, then set the option `trust_remote_code=True` to remove this error.

However, if I use the Hugging Face AutoModel class, it loads the model correctly:

from transformers import AutoModel

model_id = './nomic-embed-text-v1'   
AutoModel.from_pretrained(model_id, trust_remote_code=True)

<All keys matched successfully>
NomicBertModel(
  (embeddings): NomicBertEmbeddings(
    (word_embeddings): Embedding(30528, 768)
    (token_type_embeddings): Embedding(2, 768)
  )
  (emb_drop): Dropout(p=0.0, inplace=False)
  (emb_ln): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
  (encoder): NomicBertEncoder(
    (layers): ModuleList(
      (0-11): 12 x NomicBertBlock(
        (attn): NomicBertAttention(
          (rotary_emb): NomicBertRotaryEmbedding()
          (Wqkv): Linear(in_features=768, out_features=2304, bias=False)
          (out_proj): Linear(in_features=768, out_features=768, bias=False)
          (drop): Dropout(p=0.0, inplace=False)
        )
        (mlp): NomciBertGatedMLP(
          (fc11): Linear(in_features=768, out_features=3072, bias=False)
          (fc12): Linear(in_features=768, out_features=3072, bias=False)
          (fc2): Linear(in_features=3072, out_features=768, bias=False)
        )
        (dropout1): Dropout(p=0.0, inplace=False)
        (norm1): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
        (norm2): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
        (dropout2): Dropout(p=0.0, inplace=False)
      )
    )
  )
)
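
As a possible workaround until the flag is forwarded, the embedding body could be loaded directly with sentence-transformers (which accepts trust_remote_code itself, assuming sentence-transformers >= 2.3.0) and then wrapped in a SetFitModel by hand. This is only a sketch: the LogisticRegression head mirrors SetFit's default, and passing labels to the SetFitModel constructor assumes a setfit version (>= 1.0.0) whose SetFitModel dataclass carries a labels field.

from sklearn.linear_model import LogisticRegression
from sentence_transformers import SentenceTransformer
from setfit import SetFitModel

model_id = './nomic-embed-text-v1'

# Load the transformer body directly so trust_remote_code reaches transformers
body = SentenceTransformer(model_id, trust_remote_code=True)

# Wrap it manually with a default-style logistic regression head
model = SetFitModel(
    model_body=body,
    model_head=LogisticRegression(),
    labels=['negative', 'positive'],
)

With this, SetFitModel.from_pretrained is bypassed entirely, so the trust_remote_code check inside transformers is satisfied at the point where the custom NomicBertModel code is actually loaded.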