ikergarcia1996 / Easy-Translate

Easy-Translate is a script for translating large text files with a SINGLE COMMAND. Easy-Translate is designed to be as easy as possible for beginners and as seamless and customizable as possible for advanced users.
Apache License 2.0

OSError: It looks like the config file at 'models/pytorch_model.bin' is not a valid JSON file #8

Closed vertikalm closed 1 year ago

vertikalm commented 1 year ago

Hello. Tested with Debian 11/12, CUDA 11.7/11.8, different models, different precisions, with and without accel, etc. Other projects based on torch and transformers work well on the same machine.

I get these errors when running the script:

```
python3 translate.py --sentences_path sample_text/en.txt --output_path sample_text/en2es.translation.m2m100_1.2B.txt --source_lang en --target_lang es --model_name models/pytorch_model.bin
Loading model from models/pytorch_model.bin
Traceback (most recent call last):
  File "Easy-Translate/.env/lib/python3.11/site-packages/transformers/configuration_utils.py", line 702, in _get_config_dict
    config_dict = cls._dict_from_json_file(resolved_config_file)
  File "Easy-Translate/.env/lib/python3.11/site-packages/transformers/configuration_utils.py", line 793, in _dict_from_json_file
    text = reader.read()
  File "", line 322, in decode
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 64: invalid start byte

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "Easy-Translate/translate.py", line 443, in <module>
    main(
  File "Easy-Translate/translate.py", line 115, in main
    model, tokenizer = load_model_for_inference(
  File "Easy-Translate/model.py", line 75, in load_model_for_inference
    config = AutoConfig.from_pretrained(
  File "Easy-Translate/.env/lib/python3.11/site-packages/transformers/models/auto/configuration_auto.py", line 983, in from_pretrained
    config_dict, unused_kwargs = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "Easy-Translate/.env/lib/python3.11/site-packages/transformers/configuration_utils.py", line 617, in get_config_dict
    config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "Easy-Translate/.env/lib/python3.11/site-packages/transformers/configuration_utils.py", line 705, in _get_config_dict
    raise EnvironmentError(
OSError: It looks like the config file at 'models/pytorch_model.bin' is not a valid JSON file.
```

ikergarcia1996 commented 1 year ago

Hi @vertikalm!

--model_name should be either a Hugging Face Hub model ID or the path to a local model directory, not a pytorch_model.bin file. The specified directory must contain the model weights, the tokenizer, and the config.json file, all stored in the Hugging Face format (saved using model.save_pretrained(), tokenizer.save_pretrained() and config.save_pretrained()).
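As a minimal sketch of the check the loader effectively performs: `transformers` resolves a `config.json` inside the directory you pass and parses it as JSON, which is why pointing `--model_name` at a raw `pytorch_model.bin` fails with a JSON/UTF-8 error. The helper below (a hypothetical name, not part of Easy-Translate) validates a path before using it; the commented lines at the end show the standard `save_pretrained()` workflow for producing such a directory, assuming `transformers` is installed.

```python
import json
import os


def looks_like_model_dir(path: str) -> bool:
    """Heuristic check: does `path` look like a directory saved with
    save_pretrained()? transformers expects a parseable config.json
    inside the directory, not a bare pytorch_model.bin weights file."""
    if not os.path.isdir(path):
        return False  # a file path (e.g. pytorch_model.bin) is not enough
    config_path = os.path.join(path, "config.json")
    if not os.path.isfile(config_path):
        return False
    try:
        with open(config_path, encoding="utf-8") as f:
            json.load(f)  # must be valid UTF-8 JSON, or loading will fail
    except (UnicodeDecodeError, json.JSONDecodeError):
        return False
    return True


# To produce such a directory from a Hub model (requires `transformers`
# and a network connection, so it is left commented out here):
#   from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
#   model = AutoModelForSeq2SeqLM.from_pretrained("facebook/m2m100_1.2B")
#   tokenizer = AutoTokenizer.from_pretrained("facebook/m2m100_1.2B")
#   model.save_pretrained("models/m2m100_1.2B")
#   tokenizer.save_pretrained("models/m2m100_1.2B")
# and then pass the directory:  --model_name models/m2m100_1.2B
```

`save_pretrained()` writes `config.json` alongside the weights automatically, which is why saving through the Hugging Face API (rather than downloading a lone `.bin` file) produces a directory the script can load.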

vertikalm commented 1 year ago

Thank you. A retired Pascal/Assembler dinosaur here, with much to learn. I think that until now I had only downloaded already-quantized or pre-packaged models. I understand now. Thanks for sharing this project. I don't know how to live either, I'm improvising =:-)