shrimai / Focused-Attention-Improves-Document-Grounded-Generation


about wizard of wikipedia dataset #2

Open julielin123 opened 3 years ago

julielin123 commented 3 years ago

Hi, I am interested in the experimental results of the DoHA model on the WoW dataset, and I want to run the model on WoW. Is there a trained model available for WoW? Or do you use a universal pre-trained model for the WoW task? If so, which one? I'm eager to hear from you, thanks!

shrimai commented 3 years ago

You would have to train your own model on the WoW dataset using the DoHA training script provided in the code.

Reveyer commented 3 years ago

@shrimai Hi, I am also interested in WoW. Would you be willing to release the code for processing the WoW data?

julielin123 commented 3 years ago

Do you use all the candidate knowledge sentences as the d_i of each sample on the WoW dataset? And how do you use the label?

shrimai commented 3 years ago

Yes, I concatenate the sentences of all the retrieved passages (approximately 7 passages per sample) as the d_i of each sample. I don't make any use of the label.

@Reveyer if you make a TSV of chat_context, retrieved_passages, and target_response, then the processing for the model is already provided at https://github.com/shrimai/Focused-Attention-Improves-Document-Grounded-Generation/blob/2eb7951af35a09e3d652ba6918a0d2fd4e6f8a14/doha.py#L233
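
For reference, a minimal sketch of that preprocessing step. This is not code from the repo: it assumes the standard WoW JSON layout (each episode has a `dialog` list, and each wizard turn carries `retrieved_passages` as a list of `{title: [sentences]}` dicts), and the file paths are placeholders.

```python
# Hypothetical WoW -> TSV preprocessing sketch (not part of this repo).
# Produces chat_context <TAB> retrieved_passages <TAB> target_response rows,
# assuming the standard WoW JSON episode format described above.
import csv
import json

def wow_to_tsv(wow_json_path: str, tsv_path: str) -> None:
    with open(wow_json_path) as f:
        episodes = json.load(f)

    with open(tsv_path, "w", newline="") as out:
        writer = csv.writer(out, delimiter="\t")
        for episode in episodes:
            history = []
            for turn in episode["dialog"]:
                # Emit one training sample per wizard turn; skip
                # wizard-first turns that have no chat context yet.
                if "Wizard" in turn["speaker"] and history:
                    # Concatenate the sentences of all retrieved passages
                    # (~7 per sample) into a single document string d_i.
                    sentences = []
                    for passage in turn.get("retrieved_passages", []):
                        for sents in passage.values():
                            sentences.extend(sents)
                    document = " ".join(sentences)
                    context = " ".join(history)
                    writer.writerow([context, document, turn["text"]])
                history.append(turn["text"])

wow_to_tsv("train.json", "wow_train.tsv")
```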

cathyxl commented 2 years ago

Hello Shrimai, I'm also interested in the processing of the WoW dataset. I found that the concatenated retrieved passages can exceed the max source length of 900. How did you solve this problem?