Open chintu619 opened 1 year ago
@chintu619 Thanks for your kind feedback.
Yes, we also found this issue in this version of the training data: Evol-Instruct sometimes outputs repetitive content that has already appeared in the instruction, and in a few cases WizardLM behaves like a chatterbox that asks and answers its own questions...
We are focusing on improving Evol-Instruct and hope to solve the issue in the next version of WizardLM. After that, we also plan to release the Evol-Instruct code, and we welcome your suggestions about the failed evolutions.
Does the 300,000 dataset already exist? If so, can it be released?
Firstly, thanks for publicly releasing the instruction dataset.
While looking through the dataset, I've noticed several examples where the output is already present in the instruction with barely any changes. Example (index 47821 from the dataset):
Instruction:
Output:
Indices of some similar samples:
17421, 17926, 24717, 29166, 41705, 61826, 62587
Is this due to a bug during the creation of the dataset? Was the WizardLM model trained with such examples?
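For anyone who wants to look for more samples like the ones listed above, here is a rough sketch of one way to flag them. It is not how the dataset was built or checked; it assumes the samples are loaded as a list of dicts with `instruction` and `output` keys, and the function names and the 0.9 threshold are arbitrary choices made for illustration.

```python
from difflib import SequenceMatcher


def output_repeats_instruction(instruction, output, threshold=0.9):
    """Return True when most of the output text already appears in the instruction.

    The score is the fraction of output characters covered by matching
    blocks shared with the instruction (1.0 means a verbatim copy).
    The 0.9 threshold is an assumed default, not a tuned value.
    """
    if not output.strip():
        return False
    matcher = SequenceMatcher(None, instruction, output, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(output) >= threshold


def find_suspect_indices(samples, threshold=0.9):
    """Return indices of samples whose output is largely copied from the instruction.

    Assumes each sample is a dict with "instruction" and "output" keys.
    """
    return [
        index
        for index, sample in enumerate(samples)
        if output_repeats_instruction(
            sample["instruction"], sample["output"], threshold
        )
    ]
```

Character-level matching with `SequenceMatcher` can be slow on long samples, so for the full dataset it may be worth pre-filtering (for example, only checking samples whose output is shorter than the instruction) before running this.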