morningmoni / CiteSum

Dataset, models, and code for paper "CiteSum: Citation Text-guided Scientific Extreme Summarization and Low-resource Domain Adaptation", EMNLP 2022

Inquiry about 128-shot learning #2

Richard-LZ-Zhang opened this issue 1 year ago

Richard-LZ-Zhang commented 1 year ago

Hi, my name is Richard Zhang, a researcher in the Engineering Department at the University of Cambridge. Your method of curating the CiteSum dataset for pretraining and then achieving SoTA with few-shot learning is truly amazing! I have two questions. First, you did 128-shot learning with CITES on the SciTLDR dataset. How did you fit 128 examples into the context window (1024 tokens, I believe) of BART? Second, when you pre-train on CiteSum, did you train for just one epoch? Any comment is appreciated!

morningmoni commented 1 year ago

Hi Richard, thanks for your interest.

1. It is not in-context learning but conventional seq2seq fine-tuning.
2. You can follow the training script for the training details.
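
A minimal sketch of what "conventional seq2seq" 128-shot fine-tuning means in practice (not the authors' script): each of the 128 examples is a separate (paper, summary) pair passed through BART's 1024-token encoder on its own, so nothing has to be packed into a single context window. The `allenai/scitldr` dataset layout is as published on the Hugging Face Hub; the base checkpoint name is a placeholder assumption.

```python
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Placeholder checkpoint; in practice you would load the CiteSum-pretrained model here.
checkpoint = "facebook/bart-large"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

# "128-shot" means fine-tuning on 128 labeled pairs, not stuffing 128 demos into one prompt.
scitldr = load_dataset("allenai/scitldr", "Abstract")
train_128 = scitldr["train"].shuffle(seed=42).select(range(128))

def preprocess(batch):
    # SciTLDR stores each source as a list of sentences and each target as a
    # list of reference TLDRs; join the sentences and keep the first TLDR.
    inputs = [" ".join(sentences) for sentences in batch["source"]]
    targets = [tldrs[0] for tldrs in batch["target"]]
    enc = tokenizer(inputs, max_length=1024, truncation=True)
    enc["labels"] = tokenizer(text_target=targets, max_length=64, truncation=True)["input_ids"]
    return enc

train_tok = train_128.map(preprocess, batched=True, remove_columns=train_128.column_names)
```
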
Richard-LZ-Zhang commented 1 year ago

Thank you for your response.

1. By few-shot learning, do you mean fine-tuning on a few examples?
2. I could only see `per_device_train_batch_size=8` and `max_steps=2e5`; I cannot work out how many epochs you trained for. Could you provide more information?

morningmoni commented 1 year ago

1. Yes.
2. It uses early stopping on the validation set.
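
For concreteness, a hedged continuation of the sketch above showing how `max_steps=2e5` and early stopping on the validation set fit together with Hugging Face's `Seq2SeqTrainer`. The batch size and step cap are the values mentioned in this thread; the evaluation interval, patience, and `val_tok` validation split are illustrative assumptions rather than the repository's exact configuration.

```python
from transformers import (
    DataCollatorForSeq2Seq,
    EarlyStoppingCallback,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

args = Seq2SeqTrainingArguments(
    output_dir="bart-scitldr-128shot",
    per_device_train_batch_size=8,     # value visible in the training script
    max_steps=200_000,                 # max_steps=2e5 is an upper bound, not a fixed epoch count
    evaluation_strategy="steps",
    eval_steps=500,                    # assumed evaluation interval
    save_strategy="steps",
    save_steps=500,
    load_best_model_at_end=True,       # required for EarlyStoppingCallback
    metric_for_best_model="eval_loss",
    greater_is_better=False,
    predict_with_generate=True,
)

trainer = Seq2SeqTrainer(
    model=model,                       # model, tokenizer, train_tok from the sketch above
    args=args,
    train_dataset=train_tok,
    eval_dataset=val_tok,              # hypothetical tokenized validation split
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
    tokenizer=tokenizer,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=5)],  # assumed patience
)
trainer.train()  # stops once validation loss stops improving, typically long before 2e5 steps
```

Because `max_steps` is set, training is step-based rather than epoch-based, which is why the script does not expose an epoch count; in practice the early-stopping callback usually ends training well before the step cap.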