project-baize / baize-chatbot

Let ChatGPT teach your own chatbot in hours with a single GPU!
https://arxiv.org/abs/2304.01196
GNU General Public License v3.0

How to collect data on other topics? #19

Open yfq512 opened 1 year ago

yfq512 commented 1 year ago

How do I collect data on other topics, such as "Chinese history"? How should I modify the code? Please explain in detail. Thanks!

JetRunner commented 1 year ago

Change the following lines to add your own data: https://github.com/project-baize/baize-chatbot/blob/6790946f638d60fcaf397574189124f15792f35a/collect.py#L17-L41
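As a minimal sketch, assume your own seeds sit in a plain-text file with one question or topic per line. The contents of collect.py lines 17-41 are not reproduced here, so the helper `load_seed_file` and the file name `my_topic_seeds.txt` are hypothetical. The idea is to replace the dataset-loading code in those lines with something that produces a plain Python list of seed strings:

```python
from pathlib import Path


def load_seed_file(path):
    """Read one seed per line, skipping blank lines and '#' comments.

    Duplicates are dropped, keeping the order of first occurrence.
    """
    seeds = []
    seen = set()
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        text = line.strip()
        if not text or text.startswith("#"):
            continue  # ignore empty lines and comment lines
        if text not in seen:
            seen.add(text)
            seeds.append(text)
    return seeds
```

Usage would be along the lines of `seeds = load_seed_file("my_topic_seeds.txt")`, with that list then taking the place of whatever list the script iterates over when starting each self-chat.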

yfq512 commented 1 year ago

@JetRunner Thanks for your reply! I trained a model following your instructions,

python finetune.py 7b 16 0.0002 quora

and got the following files:

checkpoints/
└── 7b
    ├── adapter_config.json
    ├── adapter_model.bin
    ├── checkpoint-200
    │   ├── optimizer.pt
    │   ├── pytorch_model.bin
    │   ├── rng_state.pth
    │   ├── scaler.pt
    │   ├── scheduler.pt
    │   ├── trainer_state.json
    │   └── training_args.bin
    ├── checkpoint-400
     ……

Now, how do I load the local model when I run demo/app.py?

guoday commented 1 year ago

You need to run cp checkpoints/7b/checkpoint-200/pytorch_model.bin checkpoints/7b/adapter_model.bin and set the lora_model path to ../checkpoints/7b.
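The copy step can also be scripted. Here is a minimal Python sketch of guoday's suggestion. `promote_checkpoint` is a hypothetical helper. Picking the highest-numbered `checkpoint-N` when no step is given is an assumption; the suggestion itself uses checkpoint-200. The sketch also assumes, as the suggestion implies, that the checkpoint's pytorch_model.bin holds the LoRA weights in the form the demo expects.

```python
import re
import shutil
from pathlib import Path


def promote_checkpoint(run_dir, step=None):
    """Copy <run_dir>/checkpoint-<step>/pytorch_model.bin to <run_dir>/adapter_model.bin.

    If step is None, the highest-numbered checkpoint-N directory is used.
    Returns the path of the written adapter file.
    """
    run_dir = Path(run_dir)
    if step is None:
        # Collect N from every subdirectory named exactly "checkpoint-N".
        steps = [
            int(m.group(1))
            for p in run_dir.iterdir()
            if p.is_dir() and (m := re.fullmatch(r"checkpoint-(\d+)", p.name))
        ]
        if not steps:
            raise FileNotFoundError(f"no checkpoint-N directories in {run_dir}")
        step = max(steps)
    src = run_dir / f"checkpoint-{step}" / "pytorch_model.bin"
    dst = run_dir / "adapter_model.bin"
    shutil.copyfile(src, dst)  # overwrites any existing adapter_model.bin
    return dst
```

For example, `promote_checkpoint("checkpoints/7b", step=200)` does the same as the cp command, and `promote_checkpoint("checkpoints/7b")` takes the latest checkpoint instead.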

yfq512 commented 1 year ago

@guoday Thanks for your reply! So I should change

python app.py decapoda-research/llama-7b-hf project-baize/baize-lora-7B

to

python app.py decapoda-research/llama-7b-hf ../checkpoints/7b/

Is that right?

guoday commented 1 year ago

yes.
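Before launching, it may help to check that the local directory looks like a LoRA adapter folder. This minimal sketch makes two assumptions. First, demo/app.py takes the base model and the adapter path as its two positional arguments, as in the commands in this thread. Second, the adapter is loaded with PEFT, which reads adapter_config.json and adapter_model.bin from that folder; newer PEFT releases may save adapter_model.safetensors instead. `app_command` and `REQUIRED_ADAPTER_FILES` are hypothetical names.

```python
from pathlib import Path

# Files an (older, .bin-format) PEFT LoRA adapter directory is expected to contain.
REQUIRED_ADAPTER_FILES = ("adapter_config.json", "adapter_model.bin")


def app_command(base_model, adapter_dir):
    """Return the argv for demo/app.py after checking the adapter directory."""
    adapter = Path(adapter_dir)
    missing = [name for name in REQUIRED_ADAPTER_FILES if not (adapter / name).is_file()]
    if missing:
        raise FileNotFoundError(f"{adapter} is missing: {', '.join(missing)}")
    # Pass the path through unchanged so "../checkpoints/7b/" keeps its form.
    return ["python", "app.py", base_model, str(adapter_dir)]
```

From the demo directory, `subprocess.run(app_command("decapoda-research/llama-7b-hf", "../checkpoints/7b/"), check=True)` would then run the same command as above, failing early with a clear message if the copy step was skipped.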

yfq512 commented 1 year ago

@guoday @JetRunner If I want to collect data on another topic, such as "中国历史" (Chinese history), do I need to write the question list in the code myself? It's too hard to create tens of thousands of questions by hand 😭.

https://github.com/project-baize/baize-chatbot/blob/6790946f638d60fcaf397574189124f15792f35a/collect.py#L57-L58

guoday commented 1 year ago

You don't have to collect questions. Entities such as "唐朝" (Tang dynasty) or "李白" (Li Bai) also work well.
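Following that suggestion, the seed list can be built from a list of entities instead of hand-written questions. This is a minimal sketch. `entity_seeds` is a hypothetical helper. The optional templates (for example "介绍一下{}") are an assumption for getting more than one seed per entity; the reply above only says that bare entities work, and the default template passes each entity through unchanged.

```python
from itertools import product


def entity_seeds(entities, templates=("{}",)):
    """Expand entities into seed prompts, one per (entity, template) pair.

    With the default template "{}" each entity is used as-is. Blank entries
    and duplicate seeds are dropped, keeping first-seen order.
    """
    seeds = []
    seen = set()
    # product() yields every template for the first entity, then the next entity, etc.
    for entity, template in product(entities, templates):
        name = entity.strip()
        if not name:
            continue
        seed = template.format(name)
        if seed not in seen:
            seen.add(seed)
            seeds.append(seed)
    return seeds
```

For example, `entity_seeds(["唐朝", "李白"], templates=("{}", "介绍一下{}"))` gives four seeds: each entity on its own and each wrapped in the template.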