Closed: Ri0S closed this issue 3 years ago
Hi @muggin,
Thank you for releasing the code.
I'm trying to train the model and reproduce the results. As far as I understand from the paper, you used 8 GPUs with a batch size of 12 per GPU, along with learning_rate = 2e-5 and num_train_epochs = 10.0. Is that correct?
So should the command below train a model that reproduces the results?
python -m torch.distributed.launch \
--nproc_per_node 8 $CODE_PATH/run.py \
--task_name $TASK_NAME \
--do_train \
--do_eval \
--do_lower_case \
--train_from_scratch \
--data_dir $DATA_PATH \
--model_type bert \
--model_name_or_path $MODEL_NAME \
--max_seq_length 512 \
--per_gpu_train_batch_size 12 \
--learning_rate 2e-5 \
--num_train_epochs 10.0 \
--evaluate_during_training \
--eval_all_checkpoints \
--overwrite_cache \
--output_dir $OUTPUT_PATH/$NAME_EXECUTION/
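For reference, a quick sanity check of the effective batch size implied by the command above (assuming the default gradient_accumulation_steps of 1, since the flag is not set):

```python
# Effective batch size for distributed training: torch.distributed.launch
# starts one process per GPU, each consuming its own per-GPU batch.
n_gpus = 8                        # --nproc_per_node 8
per_gpu_train_batch_size = 12     # --per_gpu_train_batch_size 12
gradient_accumulation_steps = 1   # assumption: default, not set in the command

effective_batch_size = n_gpus * per_gpu_train_batch_size * gradient_accumulation_steps
print(effective_batch_size)  # 96
```

If the paper reports a different effective batch size, gradient_accumulation_steps may need adjusting accordingly.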
If I want to evaluate summaries generated by my model on the CNN/DailyMail dataset, can I directly use the FactCC model from the README.md? Should I set the label of my summaries to 'CORRECT'?
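In case it helps clarify the question, this is a minimal sketch of how I would prepare my model's summaries for evaluation. The field names ("id", "text", "claim", "label") and the placeholder "CORRECT" label are my assumptions about the expected input layout; please correct me if the repository expects a different format:

```python
import json

# Hypothetical example: source article plus one model-generated summary
# sentence, labeled "CORRECT" as a placeholder since the true label is
# unknown (the classifier's prediction is what we want).
examples = [
    {
        "id": "example-0",
        "text": "The quick brown fox jumped over the lazy dog near the river.",
        "claim": "A fox jumped over a dog.",  # model-generated summary sentence
        "label": "CORRECT",                   # placeholder label
    },
]

# Write one JSON object per line (JSONL).
with open("data-dev.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```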
Hello, thank you for raising this issue! Could you please provide more details?