Open binchen4110 opened 8 months ago
Hi, the training data is first ordered temporally, i.e. from the earliest to the latest timestamp. We then select the first 5%, 10%, ..., 75%, etc. In the setting presented in our paper, the partition is done before both TLR and FIT. Note that for each partition, we build prompts with TLR in the TLR phase and then uniformly sample 1024 prompts from this x% of the original training set for FIT training in GenTKG. Other baseline models are trained on x% of the original training set.
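For readers who want to reproduce the split, here is a minimal sketch of the procedure described above. It is not the GenTKG code: the function names `temporal_partition` and `sample_prompts`, the `(subject, relation, object, timestamp)` tuple layout, and the fixed seed are all assumptions made for illustration, and the TLR prompt construction itself is left out.

```python
import random


def temporal_partition(quadruples, fraction):
    """Return the earliest `fraction` of the training facts.

    Assumes each fact is a (subject, relation, object, timestamp) tuple;
    this layout is hypothetical, not taken from the GenTKG repository.
    """
    # Order from the earliest to the latest timestamp, then keep the first x%.
    ordered = sorted(quadruples, key=lambda q: q[3])
    cutoff = int(len(ordered) * fraction)
    return ordered[:cutoff]


def sample_prompts(prompts, k=1024, seed=0):
    """Uniformly sample up to `k` prompts without replacement."""
    rng = random.Random(seed)  # fixed seed is an assumption, for repeatability
    if len(prompts) <= k:
        return list(prompts)
    return rng.sample(prompts, k)


# Usage: partition first, then build prompts on that partition only,
# then sample the prompts used for fine-tuning.
facts = [("s", "r", "o", t) for t in range(2000)]
subset = temporal_partition(facts, 0.75)
prompts = [f"{s} {r} {o} @ {t}" for s, r, o, t in subset]  # stand-in for TLR
fit_prompts = sample_prompts(prompts, k=1024)
```

The point of the ordering is that the partition happens once, up front, so both the prompt construction and the sampling only ever see the earliest x% of the facts.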
In Section 4.3, "In-domain Generalization", how do you design the various partitions of the training data? Is this step done in the TLR phase, the FIT phase, or both?