plageon / HtmlRAG

HtmlRAG: HTML is Better Than Plain Text for Modeling Retrieval Results in RAG Systems
MIT License

ELI5 dataset #1

Closed (ChantalMP closed 1 week ago)

ChantalMP commented 2 weeks ago

Hi :)

I wanted to ask if you have access to the supporting documents in the ELI5 dataset and if you could provide some information on where to find them.

Thanks and best regards,

Chantal

plageon commented 2 weeks ago

We will release the retrieved HTML from Bing soon.

plageon commented 1 week ago

The training and test data are now available as the Hugging Face datasets HtmlRAG-train and HtmlRAG-test. By the way, the full ELI5 dataset can be accessed at FlashRAG_datasets. I hope this resolves your concern. I'll close this issue for now.