Open thomwolf opened 3 years ago
Pinging @mariamabarham as well
nlp has MR! It's called rotten_tomatoes.
nlp also has ag_news, a popular news classification dataset.
I'd also like to see:
Thanks @jxmorris12 for pointing this out.
In glue we only have SST-2 maybe we can add separately SST-1.
This is the homepage for the Amazon dataset: https://www.kaggle.com/datafiniti/consumer-reviews-of-amazon-products
Is there an easy way to download kaggle datasets programmatically? If so, I can add this one!
Hi @jxmorris12 for now I think our dl_manager does not download from Kaggle.
@thomwolf, @lhoestq
Pretty sure the quora dataset is the same one I implemented here: https://github.com/huggingface/nlp/pull/366
Great list. Any idea if Amazon Reviews has been added?
Apologies if it's already been included (it would be great to know where). If not, it's one of the better medium/large NLP datasets for semi-supervised learning, albeit a bit out of date.
Thanks!!
cc @sshleifer
On the Amazon Reviews dataset, the original UCSD website has noted these are now updated to include product reviews through 2018 -- actually quite recent compared to many other datasets. Almost certainly the largest NLP dataset out there with labels! https://jmcauley.ucsd.edu/data/amazon/
Any chance someone has time to onboard this dataset in a HF way?
cc @sshleifer
@albertvillanova How up to date is this issue? I see that some of these datasets are now on huggingface but have not been checked off the list.
We are missing a few datasets for text classification, which is an important field.
Namely, it would be really nice to add:
386
1315
1934
791
1389
410
450
471
1116
424
366
All these datasets are cited in https://arxiv.org/abs/2004.03705