xlang-ai / instructor-embedding

[ACL 2023] One Embedder, Any Task: Instruction-Finetuned Text Embeddings
Apache License 2.0

KeyError: 'task_id' #21

Closed sherlcok314159 closed 1 year ago

sherlcok314159 commented 1 year ago

Hi! I am trying to train instructor-embedding and have run into the error shown in the title. The full traceback is:

Traceback (most recent call last):
  File "train.py", line 577, in <module>
    main()
  File "train.py", line 450, in main
    print(f'one batch in task {old_train_examples_raw[idx1]["task_id"]} is skipped')
KeyError: 'task_id'

I have downloaded the data and placed it directly in the cache_dir. Here is the script I am running:

# train the model
model_name=hkunlp/instructor-base
sentence_model_name=sentence-transformers/gtr-t5-base
output_dir=outputs
data_dir=medi-data

python train.py \
    --model_name_or_path=${sentence_model_name} \
    --output_dir=${output_dir} \
    --cache_dir=${data_dir} \
    --max_source_length=512 \
    --num_train_epochs=10 \
    --save_steps=500 \
    --cl_temperature=0.01 \
    --warmup_ratio=0.1 \
    --learning_rate=2e-5 \
    --overwrite_output_dir

Harry-hash commented 1 year ago

Hi, thanks a lot for your interest in INSTRUCTOR!

We fixed the typo in train.py. Please try again!
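
The thread does not show the actual change to train.py. For anyone hitting a similar KeyError on an older checkout, here is a minimal sketch of a defensive version of the log line from the traceback. The helper name `skipped_batch_message` and the sample records are hypothetical and not part of the repository; the real fields in the MEDI data may differ.

```python
def skipped_batch_message(example: dict, key: str = "task_id") -> str:
    """Build the 'batch skipped' log line without raising KeyError.

    Hypothetical helper for illustration; not taken from train.py.
    """
    if key in example:
        return f"one batch in task {example[key]} is skipped"
    # Fall back to listing the fields that do exist, which makes a
    # misnamed key easy to spot in the training log.
    available = ", ".join(sorted(example))
    return f"one batch is skipped (no '{key}' field; available: {available})"


# Made-up records, only to exercise both branches.
print(skipped_batch_message({"task_id": 3, "query": "q"}))
print(skipped_batch_message({"task_name": "demo", "query": "q"}))
```

Printing the available keys of one training example in this way is a quick check of which field name the downloaded data actually uses.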

sherlcok314159 commented 1 year ago

Thanks for your quick reply. It works.