After we trained the Longformer with the knowledge graph (keptlongformer), we further fine-tuned keptlongformer on the MIMIC-III 50 dataset.
To fine-tune and evaluate on MIMIC-III 50 (2 A100 GPUs):
CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node 2 --master_port 57666 run_coder.py \
--ddp_find_unused_parameters False \
--disable_tqdm True \
--version mimic3-50 --model_name_or_path whaleloops/keptlongformer \
--do_train --do_eval --max_seq_length 8192 \
--per_device_train_batch_size 1 --per_device_eval_batch_size 2 \
--learning_rate 1.5e-5 --weight_decay 1e-3 --adam_epsilon 1e-7 --num_train_epochs 8 \
--evaluation_strategy epoch --save_strategy epoch \
--logging_first_step --global_attention_strides 1 \
--output_dir ./saved_models/longformer-original-clinical-prompt2alpha
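Once fine-tuning finishes, evaluation alone can presumably be run by dropping --do_train and pointing --model_name_or_path at the saved checkpoint. The single-GPU invocation below is only a sketch along those lines: the checkpoint path is illustrative (in practice it may be a checkpoint-* subdirectory under the output dir above), and the remaining flags are carried over from the fine-tuning command.
CUDA_VISIBLE_DEVICES=0 python run_coder.py \
--disable_tqdm True \
--version mimic3-50 \
--model_name_or_path ./saved_models/longformer-original-clinical-prompt2alpha \
--do_eval --max_seq_length 8192 \
--per_device_eval_batch_size 2 \
--global_attention_strides 1 \
--output_dir ./saved_models/eval-mimic3-50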
I see. After reading your comment and re-reading the paper, I realized I had missed that: the published keptlongformer is not trained for MIMIC-III 50, so I need to fine-tune it first before evaluating it.
Thank you for the response!
Hi, I tried to compare the micro-F1 results between the model published on HF and the one downloaded from GDrive.
I ran the eval step on the MIMIC-III 50 dataset with the same parameters as in the README. The result from the GDrive model is:
f1_micro:0.728503038114521
Then I changed the model path to the HF model, and the result dropped considerably:
f1_micro:0.2161833132150996
Are there any settings I need to change to get a fair comparison between the two models?
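For context, the comparison amounts to two eval-only runs that differ only in --model_name_or_path, roughly along these lines (a single-GPU sketch adapted from the README fine-tuning command; the GDrive checkpoint path and output dirs are illustrative, and the exact flags may differ from what I actually ran):
# GDrive checkpoint (local path, illustrative)
CUDA_VISIBLE_DEVICES=0 python run_coder.py \
--version mimic3-50 --model_name_or_path ./path/to/gdrive/checkpoint \
--do_eval --max_seq_length 8192 --per_device_eval_batch_size 2 \
--global_attention_strides 1 \
--output_dir ./eval_gdrive
# HF hub model
CUDA_VISIBLE_DEVICES=0 python run_coder.py \
--version mimic3-50 --model_name_or_path whaleloops/keptlongformer \
--do_eval --max_seq_length 8192 --per_device_eval_batch_size 2 \
--global_attention_strides 1 \
--output_dir ./eval_hf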