Open jowagner opened 2 years ago
As in issue #85, we should investigate how much of the noise comes from randomness in the selection (and ordering) of training data during continued pre-training.
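One possible way to quantify this, as a rough sketch only: repeat continued pre-training with several data seeds (controlling selection and ordering), and for each data seed do several runs that differ only in the other sources of randomness. The score variance can then be split into a part between data seeds and a part within them. The function name `split_variance`, the input layout, and the numbers below are hypothetical placeholders, not part of this project's code or results.

```python
from statistics import mean, pvariance


def split_variance(scores):
    """Split score variance into between- and within-data-seed parts.

    `scores` (hypothetical layout) maps each data seed to the list of
    evaluation scores from runs that share that data seed but differ in
    all other random seeds. Assumes the same number of runs per data seed.
    """
    group_means = [mean(runs) for runs in scores.values()]
    # Variance attributable to the choice/ordering of training data.
    between = pvariance(group_means)
    # Average variance that remains when the data seed is held fixed.
    within = mean(pvariance(runs) for runs in scores.values())
    return between, within


if __name__ == "__main__":
    # Placeholder numbers for illustration only, not measured results.
    toy_scores = {0: [1.0, 3.0], 1: [5.0, 7.0]}
    between, within = split_variance(toy_scores)
    print(f"between data seeds: {between}, within data seeds: {within}")
```

With equal numbers of runs per data seed, the two parts sum to the population variance of all scores, so the ratio `between / (between + within)` gives a simple estimate of the share of noise due to data selection and ordering.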