Synthetic data example - GitHub Issues

HuangFuSL commented 1 month ago

Hi there! I am particularly interested in the synthetic data used for the experimental evaluations mentioned in the paper, which seems to be absent from the repository. Could you please provide some details on the synthetic data generation procedure (e.g. code, prompt, or LLM response template)? Additionally, if possible, it would be helpful to have some example records of the generated data for better understanding and potential replication of your experiments.

margotyjx commented 1 month ago

> Hi there! I am particularly interested in the synthetic data used for the experimental evaluations mentioned in the paper, which seems to be absent from the repository. Could you please provide some details on the synthetic data generation procedure (e.g. code, prompt, or LLM response template)? Additionally, if possible, it would be helpful to have some example records of the generated data for better understanding and potential replication of your experiments.

Thank you so much for raising this question! We plan to release more details on the synthetic data generation procedure, including the code and prompts, in early November this year — stay tuned!

In the meantime, could you clarify what you mean by "example records of the generated data"? This would greatly assist us in preparing our code for the potential replication of our experiments.

HuangFuSL commented 1 month ago

> In the meantime, could you clarify what you mean by "example records of the generated data"? This would greatly assist us in preparing our code for the potential replication of our experiments.

I mean a few GPT-4 conversation examples (i.e. a case study): even if there's no code provided, I believe such cases may help to understand the procedure.

By the way, the dataset in src/data seems to be pre-processed. I wonder which is the original dataset, GPT4books or MSSD?

margotyjx commented 1 month ago

> I mean a few GPT-4 conversation examples (i.e. a case study): even if there's no code provided, I believe such cases may help to understand the procedure.
>
> By the way, the dataset in src/data seems to be pre-processed. I wonder which is the original dataset, GPT4books or MSSD?

Thank you very much for your valuable suggestions. We will release the code, and hopefully that will clear up the confusion.

The dataset in src/data is the GPT4books data.

HuangFuSL commented 1 month ago

I see, and thanks for your reply.

margotyjx / CSRec_repo

Synthetic data example #1