luogen1996 / LaVIN

[NeurIPS 2023] Official implementations of "Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models"

Regarding the tokenizer.model in /data/weights dir #36

Open wanboyang opened 11 months ago

wanboyang commented 11 months ago

Thanks for this amazing work. I have a question about tokenizer.model in the /data/weights dir: the tokenizer.model is not provided in this project.

LaVIN/
  |-- lavin
  |-- scripts
  |-- train.py
  |-- eval.py
  ......
data/
  |-- problem.json
  |-- pid_splits.json
  |-- captions.json
  |-- all_data.json
  |-- images
      |-- train2014      # MSCOCO 2014
      |-- val2014        # MSCOCO 2014
      |-- train          # ScienceQA train image
      |-- val            # ScienceQA val image
      |-- test           # ScienceQA test image
  |-- weights
      |-- tokenizer.model
          |--7B
              |-- params.json
              |-- consolidated.00.pth
          |--13B
              |-- params.json
              |-- consolidated.00.pth
              |-- consolidated.01.pth
          |--vicuna_7B
          |--vicuna_13B
              |-- config.json
              |-- generation_config.json
              |-- pytorch_model.bin.index.json
              |-- special_tokens_map.json
              |-- tokenizer_config.json
              |-- tokenizer.model
              |-- pytorch_model-00001-of-00003.bin
              |-- pytorch_model-00002-of-00003.bin
              |-- pytorch_model-00003-of-00003.bin
          ......
dillonalaird commented 9 months ago

You can download the tokenizer.model from one of the provided links. For example, this is the one used for LLaMA-7B: https://huggingface.co/nyanko7/LLaMA-7B/tree/main

That file tree is not correct: tokenizer.model is a file, not a folder. All of the folders shown beneath it (7B, 13B, etc.) should sit directly under the weights/ folder, as siblings of tokenizer.model.
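A quick sanity check for the corrected layout can be sketched like this. This is not part of LaVIN; the path list is an assumption based on the layout discussed in this thread (tokenizer.model as a file under weights/, with the model folders as its siblings):

```python
# Hedged sketch: verify the corrected data/weights layout described above.
# The EXPECTED paths are assumptions from this thread, not an official LaVIN list.
import os

EXPECTED = [
    "tokenizer.model",           # a file directly under weights/, not a folder
    "7B/params.json",
    "7B/consolidated.00.pth",
]

def missing_weights(root):
    """Return the expected weight files that are absent under `root`."""
    return [p for p in EXPECTED if not os.path.isfile(os.path.join(root, p))]

if __name__ == "__main__":
    for p in missing_weights("data/weights"):
        print("missing:", p)
```

Running it from the repository root prints any expected file that is missing, which makes the "tokenizer.model is a folder" mistake obvious immediately.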