yineza7 / Summarization-of-a-stack-of-papers-using-LLMs-

Summarizing a stack of papers involves systematically reviewing and condensing the key information from each paper to provide a concise overview. A two-stage approach might be feasible for producing the summary of a stack of papers.
MIT License

finding more datasets #2

Open yineza7 opened 8 months ago

yineza7 commented 8 months ago

Sticking to a particular genre (e.g., news articles)

Obasjoe commented 8 months ago

Attached to this comment is a list of datasets that I have found so far. The majority of them are news articles, but there are also outliers like Reddit comments and reviews from retail stores. Also, I will attach the files across 2 different comments, since the datasets so far are too large for a single attachment.

Attachments: archive-1.zip, reddit-comments.zip, reviews.zip, walmart reviews.zip, walmart_reviews_2.zip

Obasjoe commented 8 months ago

BBC-news-summary.zip

yineza7 commented 7 months ago

@Obasjoe Could you add the link to the location of the data for easier direct access?

yineza7 commented 7 months ago

ColinThomas1 commented 7 months ago

https://www.kaggle.com/datasets/jpmiller/layoutlm Medical dataset. The page says it can be used for text summarization. 30 GB download.

ColinThomas1 commented 7 months ago

https://paperswithcode.com/dataset/massivetext Another potential dataset. It looks like it could be terabytes of data, so be careful when downloading.

ColinThomas1 commented 7 months ago

https://www.kaggle.com/datasets/thedevastator/pubmed-article-summarization-dataset Not sure if the previous two were summarization datasets, but this one should be. It is based on medical data.

yineza7 commented 7 months ago

Research articles: https://paperswithcode.com/dataset/arxiv-summarization-dataset