poteminr / instruct-ner

Instruct LLMs for flat and nested NER. Fine-tuning Llama and Mistral models for instruction named entity recognition. (Instruction NER)
Apache License 2.0
74 stars 8 forks

finetuning #7

Closed naohri closed 7 months ago

naohri commented 7 months ago

Can you help me with NER fine-tuning?

poteminr commented 7 months ago

Hi!

You should create an instruction dataset (follow the steps in the README).

Example for a flat NER dataset:

https://github.com/poteminr/instruct-ner/tree/main/instruction_ner/utils/rudrec

I think this is a more difficult step than the others.

Provide more information about your problem :)

poteminr commented 7 months ago

You should set "is_adapter": false in your config. Then pass the config as a training argument: --config_file path_to_config
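A minimal sketch of that config change, assuming the training config is a plain JSON file. The helper name `disable_adapter` and the file paths are hypothetical; only the `is_adapter` key and the `--config_file` argument come from this thread.

```python
import json
from pathlib import Path


def disable_adapter(src: str, dst: str) -> dict:
    """Copy a JSON training config to dst, forcing "is_adapter" to false."""
    config = json.loads(Path(src).read_text(encoding="utf-8"))
    config["is_adapter"] = False  # written to the file as JSON `false`
    Path(dst).write_text(
        json.dumps(config, indent=4, ensure_ascii=False), encoding="utf-8"
    )
    return config
```

The file written to `dst` is the one you would then pass as `--config_file`.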

poteminr commented 7 months ago

Example. You should use the Mistral config from my repo:

!python /content/medner/instruction_ner/train_instruct.py --config_file /content/medner/instruction_ner/configs/llama2_7b_lora.json --rudrec_path /content/medner/instruction_ner/data/rudrec/rudrec_annotated.json --model_type llama --max_instances 100

poteminr commented 7 months ago

Your train_instructions should be a list of dicts, each shaped like this one:

{
    'instruction': 'Ты решаешь задачу NER. Извлеки из текста слова, относящиеся к каждой из следующих сущностей: Drugname, Drugclass, DI, ADR, Finding.',
    'input': 'Это старый-добрый Римантадин, только в сиропе.\n',
    'output': 'Drugname: Римантадин\nDrugclass: \nDrugform: сиропе\nDI: \nADR: \nFinding: \n',
    'source': '### Задание: Ты решаешь задачу NER. Извлеки из текста слова, относящиеся к каждой из следующих сущностей: Drugname, Drugclass, DI, ADR, Finding.\n### Вход: Это старый-добрый Римантадин, только в сиропе.\n### Ответ: ',
    'raw_entities': {'Drugname': ['Римантадин'], 'Drugclass': [], 'Drugform': ['сиропе'], 'DI': [], 'ADR': [], 'Finding': []},
    'id': '1_2555494.tsv'
}
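As a rough sketch, a record of this shape could be assembled from a text and its entities as follows. The helper `build_record` is hypothetical and not part of the repository. The prompt markers are copied from the example above (they mean "Task", "Input", and "Answer"), and joining several entities of one label with ", " is an assumption, since the example only shows single entities.

```python
def build_record(text, entities, labels, instruction, record_id):
    """Assemble one instruction-NER record in the shape shown in the thread."""
    # Every label gets a list, empty when nothing was annotated for it.
    raw_entities = {label: list(entities.get(label, [])) for label in labels}
    # One "Label: values" line per label; the ", " separator is an assumption.
    output = "".join(
        f"{label}: {', '.join(raw_entities[label])}\n" for label in labels
    )
    # Prompt template copied from the 'source' field of the example record.
    source = f"### Задание: {instruction}\n### Вход: {text.strip()}\n### Ответ: "
    return {
        "instruction": instruction,
        "input": text,
        "output": output,
        "source": source,
        "raw_entities": raw_entities,
        "id": record_id,
    }
```

A list of such dicts is what `train_instructions` would hold under these assumptions.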

poteminr commented 7 months ago

I used an NVIDIA A100, which supports bf16. If your GPU does not, try switching from

"fp16": false, "bf16": true,

to

"fp16": true, "bf16": false,

in the Mistral config.
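A small sketch of that switch, assuming the config has already been loaded into a dict. The helper `set_precision` is hypothetical; only the `fp16` and `bf16` keys come from this thread.

```python
def set_precision(config: dict, bf16_supported: bool) -> dict:
    """Return a copy of the config with exactly one of bf16/fp16 enabled."""
    updated = dict(config)  # leave the caller's dict untouched
    updated["bf16"] = bf16_supported
    updated["fp16"] = not bf16_supported
    return updated
```

With PyTorch installed, `torch.cuda.is_bf16_supported()` is one way to obtain the `bf16_supported` flag for the current GPU.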

poteminr commented 7 months ago

Feel free to reopen the issue.