tosiyuki / LLaVA-JP

LLaVA-JP is a Japanese VLM trained with the LLaVA method.
Apache License 2.0

How can I convert model-00001-of-00002.safetensors to pytorch_model.bin, tf_model.h5, or model.ckpt.index? #1

Open · wavelet2008 opened 8 months ago

wavelet2008 commented 8 months ago

$ sh scripts/finetune_llm_jp_1.3b_bf.sh llm-jp/llm-jp-1.3b-v1.0

OSError: Error no file named pytorch_model.bin, tf_model.h5, model.ckpt.index or flax_model.msgpack found in directory llm-jp/llm-jp-1.3b-v1.0.

ubuntu@ubuntu-X10SRA:~/mml/LLaVAjp$ ls -lt llm-jp/llm-jp-1.3b-v1.0/
-rw-rw-r-- 1 ubuntu ubuntu 4997886200 1月 17 14:46 model-00001-of-00002.safetensors
-rw-rw-r-- 1 ubuntu ubuntu       4551 1月 17 14:34 README.md
-rw-rw-r-- 1 ubuntu ubuntu       1519 1月 17 14:34 gitattributes
-rw-rw-r-- 1 ubuntu ubuntu       1916 1月 17 14:34 tokenizer_config.json
-rw-rw-r-- 1 ubuntu ubuntu    3243009 1月 17 14:34 tokenizer.json
-rw-rw-r-- 1 ubuntu ubuntu       1200 1月 17 14:34 config.json
-rw-rw-r-- 1 ubuntu ubuntu        276 1月 17 14:34 special_tokens_map.json
-rw-rw-r-- 1 ubuntu ubuntu      71079 1月 17 14:34 model.safetensors.index.json
-rw-rw-r-- 1 ubuntu ubuntu 1316062920 1月 17 14:29 model-00002-of-00002.safetensors

The files were downloaded from https://huggingface.co/toshi456/llava-jp-1.3b-v1.0/tree/main.

What should I do to convert model-00001-of-00002.safetensors to pytorch_model.bin, tf_model.h5, or model.ckpt.index?
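For reference, a recent transformers should load .safetensors shards directly, so no manual conversion should be needed. If a script really requires a pytorch_model.bin, one approach is to load the model and re-save it without safetensors serialization. A minimal sketch, assuming the directory listed above loads with AutoModelForCausalLM and that torch and safetensors are installed:

from transformers import AutoModelForCausalLM

# Load the sharded .safetensors checkpoint from the local directory
# (reading .safetensors files requires the safetensors package).
model = AutoModelForCausalLM.from_pretrained("llm-jp/llm-jp-1.3b-v1.0")

# Re-save as a classic PyTorch checkpoint: safe_serialization=False writes
# pytorch_model.bin (or sharded pytorch_model-*.bin files for large models),
# i.e. one of the file names the OSError above is asking for.
model.save_pretrained("llm-jp/llm-jp-1.3b-v1.0", safe_serialization=False)

After this, scripts/finetune_llm_jp_1.3b_bf.sh should be able to find a pytorch_model.bin in the same directory.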

tosiyuki commented 8 months ago

Can you tell me what version of Hugging Face transformers you are using?

wavelet2008 commented 8 months ago

> Can you tell me what version of Hugging Face transformers you are using?

(minigpt4) ubuntu@ubuntu-X10SRA:~/mml/LLaVAjp$ python demo_llava.py
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:01<00:00, 1.07it/s]
toshi456/llm-jp-1.3b-v1.0
Traceback (most recent call last):
  File "/home/ubuntu/mml/LLaVAjp/demo_llava.py", line 34, in <module>
    tokenizer = transformers.AutoTokenizer.from_pretrained(
  File "/home/ubuntu/anaconda3/envs/minigpt4/lib/python3.9/site-packages/transformers/models/auto/tokenization_auto.py", line 702, in from_pretrained
    return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
  File "/home/ubuntu/anaconda3/envs/minigpt4/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 1841, in from_pretrained
    return cls._from_pretrained(
  File "/home/ubuntu/anaconda3/envs/minigpt4/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 2004, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "/home/ubuntu/anaconda3/envs/minigpt4/lib/python3.9/site-packages/transformers/tokenization_utils_fast.py", line 111, in __init__
    fast_tokenizer = TokenizerFast.from_file(fast_tokenizer_file)
Exception: data did not match any variant of untagged enum PyDecoderWrapper at line 177 column 3

(minigpt4) ubuntu@ubuntu-X10SRA:~/mml/LLaVAjp$ pip show transformers
Name: transformers
Version: 4.31.0
Summary: State-of-the-art Machine Learning for JAX, PyTorch and TensorFlow
Home-page: https://github.com/huggingface/transformers
Author: The Hugging Face team (past and future) with the help of all our contributors (https://github.com/huggingface/transformers/graphs/contributors)
Author-email: transformers@huggingface.co
License: Apache 2.0 License
Location: /home/ubuntu/anaconda3/envs/minigpt4/lib/python3.9/site-packages
Requires: filelock, huggingface-hub, numpy, packaging, pyyaml, regex, requests, safetensors, tokenizers, tqdm
Required-by: llava, peft, sentence-transformers
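For what it's worth, the final exception ("data did not match any variant of untagged enum PyDecoderWrapper") typically means the installed tokenizers library is too old to parse a tokenizer.json written by a newer version, rather than a problem with the model weights. If that is the cause here, upgrading usually resolves it:

$ pip install -U transformers tokenizers
$ python -c "import transformers, tokenizers; print(transformers.__version__, tokenizers.__version__)"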

tosiyuki commented 8 months ago

Sorry, I cannot reproduce the error in my environment. Could you tell me more about your environment: which library versions, OS, and Python version you are using?
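For reference, transformers ships a helper that prints exactly this information (platform, Python version, and library versions) in a copy-pasteable form:

$ transformers-cli env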