ievapudz / TemStaPro

TemStaPro - a program for protein thermostability prediction using sequence representations from a protein language model.
MIT License

Error occurred, when I use local downloaded model. #3

Closed sanekun closed 9 months ago

sanekun commented 1 year ago

When I use a local model downloaded from Hugging Face, two errors occur. I solved both of them.

1. In prottrans_model.py, load_model_and_tokenizer always gets the tokenizer from pt_server_path, even when the model is loaded from the local directory.

I changed the code as shown below.

    if(os.path.isfile(f"{pt_dir}/pytorch_model.bin")):
        model = get_pretrained_model(pt_dir)
        tokenizer = get_tokenizer(pt_dir)
    else:
        if(not os.path.exists(f"{pt_dir}/")):
            os.system(f"mkdir -p {pt_dir}/")
        model = get_pretrained_model(pt_server_path)
        save_pretrained_model(model, pt_dir)

        tokenizer = get_tokenizer(pt_server_path)

    return (model, tokenizer)
2. In prottrans_model.py, get_tokenizer

The new version of T5Tokenizer expects the path to the model directory rather than to a file. Maybe that is because tokenizer_config.json is stored there.

I changed the code as shown below.

    if(os.path.exists(model_path+'/pytorch_model.bin') and 
        os.path.exists(model_path+'/config.json')):
        tokenizer = T5Tokenizer.from_pretrained(model_path)

        # tokenizer = T5Tokenizer.from_pretrained(model_path+'/pytorch_model.bin',
        #     config=model_path+'/config.json', do_lower_case=False)
    else:
        tokenizer = T5Tokenizer.from_pretrained(model_path, do_lower_case=False)

    return tokenizer
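
The two changes above both come down to deciding where each `from_pretrained` call should read from. A minimal sketch of that decision, separate from the TemStaPro code, might look like the following. The helper names (`has_model_files`, `has_tokenizer_files`, `resolve_sources`) are hypothetical, and the check for `spiece.model` assumes the tokenizer vocabulary is stored under the file name that T5Tokenizer uses for its SentencePiece model.

```python
import os


def has_model_files(pt_dir):
    """True if pt_dir holds the weights and config files the snippets above check for."""
    return all(
        os.path.isfile(os.path.join(pt_dir, name))
        for name in ("pytorch_model.bin", "config.json")
    )


def has_tokenizer_files(pt_dir):
    """True if pt_dir holds a SentencePiece vocabulary (assumed name: spiece.model)."""
    return os.path.isfile(os.path.join(pt_dir, "spiece.model"))


def resolve_sources(pt_dir, pt_server_path):
    """Return (model_source, tokenizer_source) for the two from_pretrained calls.

    Each source falls back to the server path on its own, so a directory that
    contains only saved weights does not break tokenizer loading.
    """
    model_source = pt_dir if has_model_files(pt_dir) else pt_server_path
    tokenizer_source = pt_dir if has_tokenizer_files(pt_dir) else pt_server_path
    return model_source, tokenizer_source
```

The returned directory or server name could then be passed as the first argument to `T5Tokenizer.from_pretrained(..., do_lower_case=False)` and to the model's own `from_pretrained` call. Checking the tokenizer files separately is a design choice of this sketch, not something the program is known to do.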

I used the conda environment created from environment_CPU.yml.

ievapudz commented 1 year ago

Hello, thank you for pointing this out and for the suggestions!

Unfortunately, I could not get your code snippet to run without errors. However, I found another solution that worked for me, and the resulting version should also do what you requested.

Please check out the latest version of the program (commit 07e1464 or later).