morningmoni / CiteSum

Dataset, models, and code for paper "CiteSum: Citation Text-guided Scientific Extreme Summarization and Low-resource Domain Adaptation", EMNLP 2022

Inquiry about 128-shot learning #2

Richard-LZ-Zhang opened this issue 1 year ago

Richard-LZ-Zhang commented 1 year ago

Hi, my name is Richard Zhang, a researcher in the Engineering Department at the University of Cambridge. Your method of curating the CiteSum dataset for pretraining and then achieving SoTA with few-shot learning is truly amazing! I have two questions. First, you did 128-shot learning with CITES on the SciTLDR dataset. How did you fit 128 examples into the context window (1024 tokens, I believe) of BART? Second, when you pre-train on CiteSum, did you train for just one epoch? Any comment is appreciated!

morningmoni commented 1 year ago

Hi Richard, thanks for your interest.

1. It is not in-context learning but conventional seq2seq fine-tuning.
2. You can follow the training script for the training details.
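
A minimal sketch of what "conventional seq2seq" 128-shot fine-tuning means in practice (not the authors' script): each of the 128 examples is a separate (paper, summary) pair passed through BART's 1024-token encoder on its own, so nothing has to be packed into a single context window. The `allenai/scitldr` dataset layout is as published on the Hugging Face Hub; the base checkpoint name is a placeholder assumption.

```python
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Placeholder checkpoint; in practice you would load the CiteSum-pretrained model here.
checkpoint = "facebook/bart-large"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

# "128-shot" means fine-tuning on 128 labeled pairs, not stuffing 128 demos into one prompt.
scitldr = load_dataset("allenai/scitldr", "Abstract")
train_128 = scitldr["train"].shuffle(seed=42).select(range(128))

def preprocess(batch):
    # SciTLDR stores each source as a list of sentences and each target as a
    # list of reference TLDRs; join the sentences and keep the first TLDR.
    inputs = [" ".join(sentences) for sentences in batch["source"]]
    targets = [tldrs[0] for tldrs in batch["target"]]
    enc = tokenizer(inputs, max_length=1024, truncation=True)
    enc["labels"] = tokenizer(text_target=targets, max_length=64, truncation=True)["input_ids"]
    return enc

train_tok = train_128.map(preprocess, batched=True, remove_columns=train_128.column_names)
```
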
Richard-LZ-Zhang commented 1 year ago

Thank you for your response.

1. By few-shot learning, do you mean fine-tuning on a few examples?
2. I could only see `per_device_train_batch_size=8` and `max_steps=2e5`; I cannot work out how many epochs you trained for. Could you provide more information?

morningmoni commented 1 year ago

1. Yes.
2. It uses early stopping on the validation set.
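
For concreteness, a hedged continuation of the sketch above showing how `max_steps=2e5` and early stopping on the validation set fit together with Hugging Face's `Seq2SeqTrainer`. The batch size and step cap are the values mentioned in this thread; the evaluation interval, patience, and `val_tok` validation split are illustrative assumptions rather than the repository's exact configuration.

```python
from transformers import (
    DataCollatorForSeq2Seq,
    EarlyStoppingCallback,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

args = Seq2SeqTrainingArguments(
    output_dir="bart-scitldr-128shot",
    per_device_train_batch_size=8,     # value visible in the training script
    max_steps=200_000,                 # max_steps=2e5 is an upper bound, not a fixed epoch count
    evaluation_strategy="steps",
    eval_steps=500,                    # assumed evaluation interval
    save_strategy="steps",
    save_steps=500,
    load_best_model_at_end=True,       # required for EarlyStoppingCallback
    metric_for_best_model="eval_loss",
    greater_is_better=False,
    predict_with_generate=True,
)

trainer = Seq2SeqTrainer(
    model=model,                       # model, tokenizer, train_tok from the sketch above
    args=args,
    train_dataset=train_tok,
    eval_dataset=val_tok,              # hypothetical tokenized validation split
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
    tokenizer=tokenizer,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=5)],  # assumed patience
)
trainer.train()  # stops once validation loss stops improving, typically long before 2e5 steps
```

Because `max_steps` is set, training is step-based rather than epoch-based, which is why the script does not expose an epoch count; in practice the early-stopping callback usually ends training well before the step cap.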