remove duplicate

nlpxucan / WizardLM

LLMs built upon Evol-Instruct: WizardLM, WizardCoder, WizardMath


Open Ski-ing opened 1 year ago

Ski-ing commented 1 year ago

Was there a strict deduplication step to remove overlap between the training data and the HumanEval test set before training?

PoseidomWong commented 1 year ago

I guess they didn't

ChiYeungLaw commented 1 year ago

We have checked the SFT training set. The HumanEval test set has not leaked into it.
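
The thread does not say how this check was performed. For readers who want to run a similar check on their own data, one common approach is to look for shared word n-grams between training samples and test problems. The sketch below is only an illustration of that idea, not the WizardLM authors' method; the function names `ngrams` and `find_overlaps` and the default window of 10 tokens are assumptions made for this example.

```python
def ngrams(text, n):
    """Return the set of lowercase, whitespace-tokenized n-grams in `text`."""
    tokens = text.lower().split()
    return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def find_overlaps(train_samples, test_samples, n=10):
    """Return (train_index, test_index) pairs that share at least one n-gram.

    Hypothetical helper: `n` is an assumed window size, not a value
    taken from the thread.
    """
    test_grams = [ngrams(t, n) for t in test_samples]
    hits = []
    for i, sample in enumerate(train_samples):
        grams = ngrams(sample, n)
        for j, tg in enumerate(test_grams):
            if grams & tg:  # any shared n-gram flags a possible leak
                hits.append((i, j))
    return hits
```

Flagged pairs would still need manual review, since exact n-gram matching misses paraphrased or reformatted problems and can also flag common boilerplate.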

PoseidomWong commented 1 year ago

I would like to ask if there are any plans to open source the training data?