tosiyuki / LLaVA-JP

LLaVA-JP is a Japanese VLM trained with the LLaVA method.
Apache License 2.0

How to convert model-00001-of-00002.safetensors to pytorch_model.bin, tf_model.h5, or model.ckpt.index #1

Open wavelet2008 opened 10 months ago

wavelet2008 commented 10 months ago

```
$ sh scripts/finetune_llm_jp_1.3b_bf.sh llm-jp/llm-jp-1.3b-v1.0

OSError: Error no file named pytorch_model.bin, tf_model.h5, model.ckpt.index or flax_model.msgpack found in directory llm-jp/llm-jp-1.3b-v1.0.

ubuntu@ubuntu-X10SRA:~/mml/LLaVAjp$ ls -lt llm-jp/llm-jp-1.3b-v1.0/
-rw-rw-r-- 1 ubuntu ubuntu 4997886200 1月 17 14:46 model-00001-of-00002.safetensors
-rw-rw-r-- 1 ubuntu ubuntu       4551 1月 17 14:34 README.md
-rw-rw-r-- 1 ubuntu ubuntu       1519 1月 17 14:34 gitattributes
-rw-rw-r-- 1 ubuntu ubuntu       1916 1月 17 14:34 tokenizer_config.json
-rw-rw-r-- 1 ubuntu ubuntu    3243009 1月 17 14:34 tokenizer.json
-rw-rw-r-- 1 ubuntu ubuntu       1200 1月 17 14:34 config.json
-rw-rw-r-- 1 ubuntu ubuntu        276 1月 17 14:34 special_tokens_map.json
-rw-rw-r-- 1 ubuntu ubuntu      71079 1月 17 14:34 model.safetensors.index.json
-rw-rw-r-- 1 ubuntu ubuntu 1316062920 1月 17 14:29 model-00002-of-00002.safetensors
```

I downloaded the files from https://huggingface.co/toshi456/llava-jp-1.3b-v1.0/tree/main.

What should I do to convert the model-00001-of-00002.safetensors shards to pytorch_model.bin, tf_model.h5, or model.ckpt.index?
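
One common way to produce a pytorch_model.bin from .safetensors shards is to load the checkpoint with transformers and re-save it without safe serialization. A minimal sketch, assuming a transformers install (with the safetensors package) recent enough to read sharded .safetensors checkpoints; the output directory name is hypothetical:

```python
# Sketch: convert sharded .safetensors weights into a single pytorch_model.bin.
import torch
from transformers import AutoModelForCausalLM

src = "llm-jp/llm-jp-1.3b-v1.0"       # local directory from the ls output above
dst = "llm-jp/llm-jp-1.3b-v1.0-bin"   # hypothetical output directory

# from_pretrained resolves model.safetensors.index.json and loads both shards
model = AutoModelForCausalLM.from_pretrained(src, torch_dtype=torch.bfloat16)

# safe_serialization=False writes pytorch_model.bin instead of .safetensors
model.save_pretrained(dst, safe_serialization=False)
```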

tosiyuki commented 10 months ago

Can you tell me which version of Hugging Face Transformers you are using?

wavelet2008 commented 10 months ago

> Can you tell me which version of Hugging Face Transformers you are using?

```
(minigpt4) ubuntu@ubuntu-X10SRA:~/mml/LLaVAjp$ python demo_llava.py
Loading checkpoint shards: 100%|██████████| 2/2 [00:01<00:00, 1.07it/s]
toshi456/llm-jp-1.3b-v1.0
Traceback (most recent call last):
  File "/home/ubuntu/mml/LLaVAjp/demo_llava.py", line 34, in <module>
    tokenizer = transformers.AutoTokenizer.from_pretrained(
  File "/home/ubuntu/anaconda3/envs/minigpt4/lib/python3.9/site-packages/transformers/models/auto/tokenization_auto.py", line 702, in from_pretrained
    return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
  File "/home/ubuntu/anaconda3/envs/minigpt4/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 1841, in from_pretrained
    return cls._from_pretrained(
  File "/home/ubuntu/anaconda3/envs/minigpt4/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 2004, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "/home/ubuntu/anaconda3/envs/minigpt4/lib/python3.9/site-packages/transformers/tokenization_utils_fast.py", line 111, in __init__
    fast_tokenizer = TokenizerFast.from_file(fast_tokenizer_file)
Exception: data did not match any variant of untagged enum PyDecoderWrapper at line 177 column 3

(minigpt4) ubuntu@ubuntu-X10SRA:~/mml/LLaVAjp$ pip show transformers
Name: transformers
Version: 4.31.0
Summary: State-of-the-art Machine Learning for JAX, PyTorch and TensorFlow
Home-page: https://github.com/huggingface/transformers
Author: The Hugging Face team (past and future) with the help of all our contributors (https://github.com/huggingface/transformers/graphs/contributors)
Author-email: transformers@huggingface.co
License: Apache 2.0 License
Location: /home/ubuntu/anaconda3/envs/minigpt4/lib/python3.9/site-packages
Requires: filelock, huggingface-hub, numpy, packaging, pyyaml, regex, requests, safetensors, tokenizers, tqdm
Required-by: llava, peft, sentence-transformers
```
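
For what it's worth, the `data did not match any variant of untagged enum PyDecoderWrapper` exception usually points at the tokenizers package rather than transformers itself: older tokenizers releases cannot parse the newer tokenizer.json format. A quick version-check sketch, assuming that is the cause here:

```python
# Sketch: verify the installed versions before retrying the demo.
import tokenizers
import transformers

print("transformers:", transformers.__version__)
print("tokenizers:", tokenizers.__version__)

# If tokenizers is old, upgrading both packages typically fixes the parse error:
#   pip install -U transformers tokenizers
```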

tosiyuki commented 9 months ago

Sorry, I cannot reproduce the error in my environment. Could you tell me more about your setup: library versions, OS, and Python version?
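
One way to gather those details in a single run (transformers also ships a `transformers-cli env` command that prints a similar report):

```python
# Sketch: print the environment details requested above.
import platform
import sys

import tokenizers
import torch
import transformers

print("OS:", platform.platform())
print("Python:", sys.version.split()[0])
print("torch:", torch.__version__)
print("transformers:", transformers.__version__)
print("tokenizers:", tokenizers.__version__)
```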