Closed HuangFuSL closed 1 month ago
Hi there! I am particularly interested in the synthetic data used for experimental evaluations mentioned in the paper, which seems to be absent from the repository. Could you please provide some details on the synthetic data generation procedure (e.g. code, prompt, or LLM response template)? Additionally, if possible, it would be helpful to have some example records of the generated data for better understanding and potential replication of your experiments.
Thank you so much for raising this question! We plan to release more details on the synthetic data generation procedure, including the code and prompts, in early November this year — stay tuned!
In the meantime, could you clarify what you mean by "example records of the generated data"? This would greatly assist us in preparing our code for the potential replication of our experiments.
I mean a few GPT-4 conversation examples (i.e. a case study): even if there's no code provided, I believe such cases may help to understand the procedure.
By the way, the dataset in src/data seems to be pre-processed. I wonder which is the original dataset, GPT4books or MSSD?
Thank you very much for your valuable suggestions. We will release the code, and hopefully that will clear up the confusion.
The dataset in src/data is the GPT4books data.
I see, and thanks for your reply.