About using the LLM instructIE

66246764 commented 7 months ago

Describe the question

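Hello. I am a beginner, so my questions may be rather basic; I would appreciate your guidance. I am trying to follow the README guide for instructIE under llm in your project. At the place shown in the image below there are section 3.2 [base models] and section 4 [fine-tuned models]. Should I pick one from each and download both?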

66246764 commented 7 months ago

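Sorry, the image does not seem to upload.

🐐 3.2 Models: the following are some of the base models supported by the code in this repository: [llama, alpaca, vicuna, zhixi, falcon, baichuan, chatglm, qwen, moss, openba]

🌰 4. LoRA fine-tuning: below are some models that have already been thoroughly trained on information extraction instruction data:

zjunlp/llama2-13b-iepile-lora (base model: LLaMA2-13B-Chat)
zjunlp/baichuan2-13b-iepile-lora (base model: BaiChuan2-13B-Chat)
zjunlp/knowlm-ie-v2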

guihonghao commented 7 months ago

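If your target scenario is Chinese, I recommend zjunlp/baichuan2-13b-iepile-lora; if it is English, I recommend zjunlp/llama2-13b-iepile-lora. You also need to download the corresponding base model, BaiChuan2-13B-Chat or LLaMA2-13B-Chat.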

66246764 commented 7 months ago

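Thank you! I previously watched your team's video on knowLM, which mentioned weight restoration: combining (LLaMA-13B) + (zhixi-13B-Diff) and then adding zhixi-13B-lora. Is a similar weight restoration needed here, that is, do I need to find these three files and perform the corresponding operations? Also, regarding the base models you mentioned, "BaiChuan2-13B-Chat or LLaMA2-13B-Chat": is there a difference between the "-chat" and "-base" base models?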

guihonghao commented 7 months ago

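No weight restoration is needed! Download the base model directly from the official pages, https://huggingface.co/baichuan-inc or https://huggingface.co/meta-llama, together with the LoRA weights. Then just set the parameters --model_name_or_path models/Baichuan2-13B-Chat and --checkpoint_dir lora/baichuan2-13b-iepile-lora.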

guihonghao commented 7 months ago

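-chat has been instruction-tuned; -base has not. -chat requires a specific template, set via the --template parameter: baichuan2 for Baichuan2 and llama2 for LLaMA2. I recommend using the -chat version.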
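The pairing described in the replies above can be summarized as a small lookup. This is only a sketch: the flag names and values come from this thread, while the `ModelChoice` class, the `cli_args` helper, and the local `models/` and `lora/` directory layout are hypothetical and would need to match wherever the weights were actually downloaded.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ModelChoice:
    base_model: str  # local path of the downloaded -Chat base model (assumed layout)
    lora: str        # local path of the downloaded LoRA weights (assumed layout)
    template: str    # value for --template, as stated in the thread


# Language -> recommended combination, following the answers in this thread.
CHOICES: Dict[str, ModelChoice] = {
    "zh": ModelChoice("models/Baichuan2-13B-Chat",
                      "lora/baichuan2-13b-iepile-lora",
                      "baichuan2"),
    "en": ModelChoice("models/llama2-13B-Chat",
                      "lora/llama2-13b-iepile-lora",
                      "llama2"),
}


def cli_args(language: str) -> List[str]:
    """Return the three flags that must agree with each other for a language."""
    choice = CHOICES[language]
    return [
        "--model_name_or_path", choice.base_model,
        "--checkpoint_dir", choice.lora,
        "--template", choice.template,
    ]


if __name__ == "__main__":
    print(" ".join(cli_args("zh")))
```

Keeping the three values in one record makes it harder to mix, say, the Baichuan2 LoRA weights with the llama2 template.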

66246764 commented 7 months ago

Understood. Thank you very much for your answer!

66246764 commented 7 months ago

Hello, one more question. If I want to apply these two models to vertical-domain data, do I simply configure the environment, download the models, and then follow the parameter settings in section 4.9? Is it enough to replace train.json, dev.json, and the other files under data with my vertical-domain data in the required format?

output_dir='lora/llama2-13b-chat-v1-continue'
mkdir -p ${output_dir}
CUDA_VISIBLE_DEVICES="0,1,2,3" torchrun --nproc_per_node=4 --master_port=1287 src/finetune.py \
    --do_train --do_eval \
    --overwrite_output_dir \
    --model_name_or_path 'models/llama2-13B-Chat' \
    --checkpoint_dir 'lora/llama2-13b-iepile-lora' \
    --stage 'sft' \
    --model_name 'llama' \
    --template 'llama2' \
    --train_file 'data/train.json' \
    --valid_file 'data/dev.json' \
    --output_dir=${output_dir} \
    --per_device_train_batch_size 2 \
    --per_device_eval_batch_size 2 \
    --gradient_accumulation_steps 4 \
    --preprocessing_num_workers 16 \
    --num_train_epochs 10 \
    --learning_rate 5e-5 \
    --max_grad_norm 0.5 \
    --optim "adamw_torch" \
    --max_source_length 400 \
    --cutoff_len 700 \
    --max_target_length 300 \
    --evaluation_strategy "epoch" \
    --save_strategy "epoch" \
    --save_total_limit 10 \
    --lora_r 64 \
    --lora_alpha 64 \
    --lora_dropout 0.05 \
    --bf16 \
    --bits 4

66246764 commented 7 months ago

Also, is it enough to run this script to process the data? python ie2instruction/convert_func.py

guihonghao commented 7 months ago

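Yes. First convert your data into the unified format used in the sample.json files and specify a schema file; see the files under each task directory for the exact form. Then convert it into the train.json and dev.json files with the command:

python ie2instruction/convert_func.py \
    --src_path data/RE/sample.json \
    --tgt_path data/RE/train.json \
    --schema_path data/RE/schema.json \
    --language zh \
    --task RE \
    --split_num 4 \
    --random_sort \
    --split train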
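For intuition about the --split_num and --random_sort flags in the command above, here is a simplified illustration. It rests on an assumption that is not stated in this thread: that --split_num caps how many schema labels go into one instruction and that --random_sort shuffles the label order first. The `split_schema` function is hypothetical and is not code from convert_func.py, which may do more than this.

```python
import random
from typing import List, Optional


def split_schema(labels: List[str],
                 split_num: int = 4,
                 random_sort: bool = False,
                 seed: Optional[int] = None) -> List[List[str]]:
    """Chunk schema labels into groups of at most `split_num` labels.

    Assumed behaviour only: one group would correspond to one instruction.
    """
    if split_num < 1:
        raise ValueError("split_num must be at least 1")
    labels = list(labels)  # copy so the caller's list is not modified
    if random_sort:
        random.Random(seed).shuffle(labels)
    return [labels[i:i + split_num] for i in range(0, len(labels), split_num)]


if __name__ == "__main__":
    # Hypothetical relation labels for a vertical-domain schema.
    relations = ["contains", "part_of", "produced_by",
                 "located_in", "used_by", "equipped_with"]
    for group in split_schema(relations, split_num=4):
        print(group)
```

Under this reading, a schema with six relation types and a split size of 4 would yield two instructions per input text, one with four labels and one with two.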

66246764 commented 7 months ago

Thank you for your guidance!

66246764 commented 6 months ago

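Hello. With your help, I successfully ran a second round of training on top of the baichuan2-13b-iepile-lora model using vertical-domain data, and the model gained some NER and RE capability on the domain data. Can this model, having learned knowledge-graph triples, directly do open question answering? For example, suppose the model has learned a triple such as {"tank", "contains", "weapon system"}. Can I then ask the model directly in question-answer form, "What does a tank contain?", and have it return "weapon system"? Or, as described at https://huggingface.co/zjunlp/baichuan2-13b-iepile-lora, can the model only be used by prompting it with the fixed templates?

instruction_mapper = {
    'NERzh': "你是专门进行实体抽取的专家。请从input中抽取出符合schema定义的实体，不存在的实体类型返回空列表。请按照JSON字符串的格式回答。",
    'REzh': "你是专门进行关系抽取的专家。请从input中抽取出符合schema定义的关系三元组，不存在的关系返回空列表。请按照JSON字符串的格式回答。",
}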

guihonghao commented 6 months ago

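An extraction task is given a text and extracts the relevant spans from it; it does not generate anything outside that text. Open question answering generates content beyond the given text (the question). The current extraction system therefore most likely cannot do open question answering. To support open QA, training would need to include general-purpose corpora as well as open-QA data.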
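To make the fixed-template usage concrete, the following sketch assembles an extraction prompt from the instruction_mapper quoted in the question. The instruction strings are copied from this thread and themselves refer to an "input" and a "schema"; the exact JSON layout, the `build_prompt` helper, and the example schema and sentence are assumptions for illustration and should be checked against the model card.

```python
import json
from typing import List

# Instruction strings as quoted in this thread (kept in Chinese because the
# model was trained on these exact prompts).
instruction_mapper = {
    'NERzh': "你是专门进行实体抽取的专家。请从input中抽取出符合schema定义的实体，不存在的实体类型返回空列表。请按照JSON字符串的格式回答。",
    'REzh': "你是专门进行关系抽取的专家。请从input中抽取出符合schema定义的关系三元组，不存在的关系返回空列表。请按照JSON字符串的格式回答。",
}


def build_prompt(task: str, language: str, schema: List[str], text: str) -> str:
    """Serialize instruction, schema and input text into one JSON string.

    The key names "instruction", "schema" and "input" are an assumed layout.
    """
    payload = {
        "instruction": instruction_mapper[task + language],
        "schema": schema,
        "input": text,
    }
    # ensure_ascii=False keeps the Chinese characters readable in the prompt.
    return json.dumps(payload, ensure_ascii=False)


if __name__ == "__main__":
    # Hypothetical relation label and sentence for a vertical domain.
    print(build_prompt("RE", "zh", ["包含"], "坦克包含武器系统。"))
```

With a prompt of this shape, the answer can only be drawn from the sentence passed as input. A bare question such as "坦克包含什么？" supplies no text that contains the answer, which is why extraction alone does not give open QA.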

66246764 commented 6 months ago

Understood, thank you!

zjunlp / DeepKE

About using the LLM instructIE #427