huggingface / setfit

Efficient few-shot learning with Sentence Transformers
https://hf.co/docs/setfit
Apache License 2.0
2.25k stars · 223 forks

HFValidationError when loading the model #94

Open jrivd opened 2 years ago

jrivd commented 2 years ago

Hi there! I am trying to load a model I have stored at Google Drive for inferencing:

# Load SetFit model
tuned_model = SetFitModel.from_pretrained("/content/drive/My Drive/models/tuned-model")
# Run inference
tuned_model(["i didnt feel humiliated", "i feel romantic too", "im grabbing a minute to post i feel greedy wrong"])

But I get the following error:

HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '/content/drive/My Drive/models/tuned-model'. Use `repo_type` argument if needed.

It works, however, when I load it from the same script in which I saved it:

# Save trained model to disk
trainer.model.save_pretrained("/content/drive/My Drive/models/tuned-model")

What could be the problem? Can't I just save/load a pretrained model to Google Drive?

Many thanks in advance for your support and terrific work.

lewtun commented 2 years ago

Hey @jrivd can you provide a code snippet / Colab that reproduces the error? This will help debug what exactly is going on :)

jrivd commented 2 years ago

Hi @lewtun, thanks for your response. You can see the error in action here: https://colab.research.google.com/drive/10t9QmQEe7BHIQQ8XUmw1B37vPFRfORDK?usp=sharing

Many thanks!

pdhall99 commented 2 years ago

I get the same error. When I try to load a locally saved model:

from setfit import SetFitModel

model = SetFitModel.from_pretrained("/path/to/model-directory", local_files_only=True)

I get

HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '/path/to/model-directory'. Use `repo_type` argument if needed.
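For context, huggingface_hub only accepts `name` or `namespace/name` as repo ids, so any filesystem path that slips past the local-directory check gets rejected by the validator. A rough stdlib-only stand-in for that check (an approximation for illustration, not the library's actual code):

```python
import re

# Rough approximation of a Hub repo id: "name" or "namespace/name".
# A filesystem path with extra slashes can never match this shape.
REPO_ID_RE = re.compile(r"^[\w.-]+(/[\w.-]+)?$")

def looks_like_repo_id(s):
    return bool(REPO_ID_RE.match(s))

print(looks_like_repo_id("sentence-transformers/all-MiniLM-L6-v2"))  # True
print(looks_like_repo_id("/path/to/model-directory"))                # False
```

So if `os.path.isdir(model_id)` is skipped (or returns False for a mounted path), the path falls through to repo-id validation and raises HFValidationError.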

I think this could be solved by changing these lines from

        if os.path.isdir(model_id) and MODEL_HEAD_NAME in os.listdir(model_id):
            model_head_file = os.path.join(model_id, MODEL_HEAD_NAME)

to something like

        if os.path.isdir(model_id):
            if MODEL_HEAD_NAME in os.listdir(model_id):
                model_head_file = os.path.join(model_id, MODEL_HEAD_NAME)
            else:
                model_head_file = None

Mouhanedg56 commented 2 years ago

I saved a trained model to a local path. I don't see anything wrong when loading the model using from_pretrained with the correct path.

This error appears when you try to load a model from a nonexistent local path containing more than one slash `/` with local_files_only=True.

@pdhall99's suggestion looks like it fixes the issue and follows the same logic as ModelHubMixin.from_pretrained.

jrivd commented 2 years ago

Many thanks for your comments, @pdhall99 and @Mouhanedg56! I'll go over them carefully.

sfernandes-gim commented 2 years ago

Hey @pdhall99, was this issue finally resolved? I am trying to load the ST models offline but still get the 'repo_name' or 'namespace/repo_name' error when using the full path. I can only load models offline in my environment.

When the model files are located in the same directory, I am still unable to load them, as below:

model = SetFitModel.from_pretrained("./all-MiniLM-L6-v2", model_head_file=None, local_files_only=True)

I tried multiple arguments but keep getting the error: Repo id must use alphanumeric chars or '-', '_', '.', '--' and '..' are forbidden, '-' and '.' cannot start or end the name. Thanks!

pdhall99 commented 2 years ago

Hi @sfernandes-gim, this change is not yet merged.

lewtun commented 2 years ago

Hi folks, we've just released a new version that include fixes to some of the above issues. For those still having troubles, could you please comment below with a code snippet for debugging? Thanks!

sfernandes-gim commented 2 years ago

Hi @lewtun, trying to load from my local directory always gives the error below. I also tried with the local_files_only=True option. Please advise.

model = SetFitModel.from_pretrained("./all-MiniLM-L6-v2")

Error: HFValidationError: Repo id must use alphanumeric chars or '-', '_', '.', '--' and '..' are forbidden, '-' and '.' cannot start or end the name, max length is 96: './all-MiniLM-L6-v2'.

Tried below:

model = SetFitModel.from_pretrained("./Output/all-MiniLM-L6-v2", local_files_only=True, use_differentiable_head=True, head_params={"out_features": num_classes})

Error: HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': './Output/all-MiniLM-L6-v2'. Use `repo_type` argument if needed.

sfernandes-gim commented 2 years ago

Thanks @lewtun this works fine with the latest release

kryptec commented 1 year ago

I recently ran into this problem, and after reading the above comments resolved it by giving the absolute path to the folder instead of the relative one.

I'm not sure if this is a Hugging Face issue or the cluster environment I'm working on, but I thought I would mention it here in case it helps anyone.
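A concrete sketch of that workaround (the path below is a placeholder for wherever your model directory lives):

```python
import os

# Hypothetical relative path to a locally saved model directory
relative_path = "./all-MiniLM-L6-v2"

# Resolve it to an absolute path before handing it to from_pretrained;
# on some setups the relative form trips the repo-id validator.
absolute_path = os.path.abspath(relative_path)
print(absolute_path)

# model = SetFitModel.from_pretrained(absolute_path)
```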

lamtrinh259 commented 1 year ago

@kryptec thank you so much, I tried your method and gave it the absolute path, and apparently, it works now!

LazerJesus commented 1 year ago

I still get this error:

peftmodelpath = "/notebooks/eva/model.bin"

model = PeftModelForCausalLM.from_pretrained(
    model, 
    peftmodelpath, 
    cache_dir=peftmodelpath, 
    local_files_only=True, 
    model_head_file=None
)

HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '/notebooks/eva/model.bin'. Use `repo_type` argument if needed.

ayseozgun commented 1 year ago

Hello,

I am trying to read the HF model directly from S3 on SageMaker Studio. I am also getting the same HFValidationError. My code is below:

from transformers import T5Tokenizer, T5ForConditionalGeneration

# Specify the S3 URL to your model and tokenizer
model_url = "s3://bucketname/model/"

# Load the model and tokenizer from S3
tokenizer = T5Tokenizer.from_pretrained(model_url)
model = T5ForConditionalGeneration.from_pretrained(model_url)

# Now you can use the model and tokenizer for inference
input_text = "translate English to German: How old are you?"
input_ids = tokenizer(input_text, return_tensors="pt", truncation=True, padding=True, max_length=512)
input_ids.to("cuda")

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))

I am able to see the model by running the code below on SageMaker, so I am sure the path is correct.

s3 = boto3.client('s3')

# List all objects in the model folder from S3
s3_resource = boto3.resource('s3')
my_bucket = s3_resource.Bucket(bucket_name)
for object_summary in my_bucket.objects.filter(Prefix=''):
    file_path = object_summary.key
    file_name = os.path.basename(file_path)
    if file_name:
        print(file_name)

Can you please help me? Thanks :)
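For what it's worth, from_pretrained cannot read s3:// URLs directly; it only understands Hub repo ids and local paths. A sketch of the usual workaround is to download the objects to a local directory first and load from there (bucket name and prefix below are placeholders; the boto3 part is commented out since it needs AWS credentials):

```python
import os
import tempfile
from urllib.parse import urlparse

def split_s3_url(s3_url):
    """Split an s3:// URL into (bucket, key prefix)."""
    parsed = urlparse(s3_url)
    return parsed.netloc, parsed.path.lstrip("/")

bucket, prefix = split_s3_url("s3://bucketname/model/")
print(bucket, prefix)

# Hypothetical download step (requires boto3 and credentials):
# import boto3
# s3 = boto3.resource("s3")
# local_dir = tempfile.mkdtemp()
# for obj in s3.Bucket(bucket).objects.filter(Prefix=prefix):
#     target = os.path.join(local_dir, os.path.relpath(obj.key, prefix))
#     os.makedirs(os.path.dirname(target), exist_ok=True)
#     s3.Bucket(bucket).download_file(obj.key, target)
# model = T5ForConditionalGeneration.from_pretrained(local_dir)  # plain local path
```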

bojanbabic commented 11 months ago

In Colab this happens if you mount Drive: for some reason the mounted path is not recognized. Instead, try keeping the model in /content. This should solve the missing-path issue.
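A sketch of that workaround: copy the saved model off the mounted Drive into the local filesystem, then load from the local copy. In Colab the source would be something like "/content/drive/My Drive/models/tuned-model" and the destination "/content/tuned-model"; the demo below uses placeholder temp directories so it runs anywhere:

```python
import os
import shutil
import tempfile

def copy_model_local(mounted_dir, local_dir):
    """Copy a saved model directory off a mounted drive onto the local filesystem."""
    if os.path.exists(local_dir):
        shutil.rmtree(local_dir)
    shutil.copytree(mounted_dir, local_dir)
    return local_dir

# Stand-in for the mounted Drive directory, with a placeholder head file
src = tempfile.mkdtemp()
open(os.path.join(src, "model_head.pkl"), "w").close()

dst = copy_model_local(src, os.path.join(tempfile.mkdtemp(), "tuned-model"))
# model = SetFitModel.from_pretrained(dst)  # then load from the local copy
```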

u1vi commented 9 months ago

I had the same issue on a local machine (not Colab). I was using a mounted drive (Pure Storage).

Creating a symlink from the saving directory to the working directory (where training runs) solved the issue for me.

ln -s existing_source_file optional_symbolic_link
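The same idea in Python, using placeholder temp directories in place of the mounted-storage and working directories:

```python
import os
import tempfile

# Stand-ins: storage_dir would be the model directory on mounted storage,
# link_path a name inside your working directory.
storage_dir = tempfile.mkdtemp()
link_path = os.path.join(tempfile.mkdtemp(), "tuned-model")

# Symlink the mounted directory into the working directory so
# from_pretrained sees a plain local path.
os.symlink(storage_dir, link_path)
# model = SetFitModel.from_pretrained(link_path)
```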

Rakin061 commented 7 months ago

In the Colab this happens if you mount drive. For some reason mounted path is not recognized. Instead, try having model in /content. This should solve issue of missing path.

Not working even when loading the model from /content. Have you solved this issue on your side? @bojanbabic

zheyaf commented 5 months ago

I download the S3 files to a tempfile directory; the temporary folder is not recognized either.

reouvenzana commented 5 months ago

I encountered the same error when working on my local machine. I solved it by using an absolute path to my model directory.

gleb-t commented 3 weeks ago

Ran into the same issue on 1.1.0 when providing an absolute path. Downgrading to 0.7.0 helped resolve the issue.