uci-ml-repo / ucimlrepo-feedback


Currently Missing Datasets #40

Open ap0nia opened 1 year ago

ap0nia commented 1 year ago

A list of dataset files we believe are missing. Will be updated as they're reported / found. Feel free to comment to report additional ones.

rkost commented 1 year ago

Dataset 148 currently only links to a zip archive with no data but one empty folder called Graphics. The downloaded archive is only 116 bytes in size.
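For anyone who wants to check a downloaded archive for this symptom without opening it by hand, here is a minimal sketch using only Python's standard library. The function names and the `ignore_prefixes` parameter are hypothetical and not part of any UCI tooling; the sketch assumes that an archive is "missing its data" when it holds nothing except directory entries, zero-byte files, or files under a Graphics/ folder.

```python
import zipfile


def data_members(zip_source, ignore_prefixes=("Graphics/",)):
    """Return names of archive members that look like real data files.

    zip_source may be a path or a file-like object. Directory entries,
    zero-byte files, and anything under the ignored prefixes are skipped.
    The default prefix is an assumption based on the reports in this thread.
    """
    with zipfile.ZipFile(zip_source) as zf:
        return [
            info.filename
            for info in zf.infolist()
            if not info.is_dir()
            and info.file_size > 0
            and not info.filename.startswith(tuple(ignore_prefixes))
        ]


def looks_empty(zip_source, ignore_prefixes=("Graphics/",)):
    """True when the archive contains no data files by the rule above."""
    return not data_members(zip_source, ignore_prefixes)
```

Calling `looks_empty("statlog+shuttle.zip")` on a local copy of the download would flag an archive like the one described here. Pass `ignore_prefixes=()` to count image files as data too.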

ap0nia commented 1 year ago

Thank you for informing us. This should be rectified now. I can find 4 files in the downloaded zip file. Please let us know if this isn't the case for you, thanks!

Link to the dataset page: https://archive.ics.uci.edu/dataset/148/statlog+shuttle

rkost commented 1 year ago

Hey, thanks so much for the fast response! Yes, all 4 files are there.

I wonder if it is intentional that the training data is still compressed after unzipping the downloaded archive while the test data is not? One can get the original data by running uncompress shuttle.trn.Z on any Unix system; not sure about Windows users.

Edit: Ah, just saw that the index file also lists the training data as a compressed file, so disregard then :)

markellekelly commented 1 year ago

Same issue with Census Income (#20)—the zip only contains a "Graphics" folder

ptruong0 commented 1 year ago

Hi Markelle, the abstract of the Census Income dataset says that it is the same as the Adult dataset. We can either copy the Adult files to the Census Income dataset, or remove Census Income altogether. How should we handle this?

markellekelly commented 1 year ago

Since this dataset is well-known under both names, let's have the data available under both for now (i.e., go ahead and copy the Adult files)—we can discuss combining the two later. thanks!

maxxu05 commented 1 year ago

Dataset 341 is also missing: https://archive.ics.uci.edu/dataset/341/smartphone+based+recognition+of+human+activities+and+postural+transitions

ptruong0 commented 1 year ago

@maxxu05 Fixed, thanks for letting us know.

jundsp commented 1 year ago

There is missing data from Dataset 301 "Parkinson Speech Dataset with Multiple Types of Sound Recordings": https://archive.ics.uci.edu/dataset/301/parkinson+speech+dataset+with+multiple+types+of+sound+recordings

It used to include a .rar file that contained the audio files (~20 MB), but now it only includes a couple of text files. For example, this snapshot from 2015 shows the full dataset: https://web.archive.org/web/20150208025709/http://archive.ics.uci.edu/ml/machine-learning-databases/00301/

Wamadahama commented 1 year ago

Dataset 28 - Japanese Credit Screening at https://archive.ics.uci.edu/dataset/28/japanese+credit+screening appears to be missing the dataset; the download contains only an empty Graphics folder.

AhmedGHDev commented 1 year ago

Dataset 84 [Prodigy] currently only links to a zip archive with no data but one empty folder called Graphics.

AhmedGHDev commented 1 year ago

Dataset 157 [Dodgers Loop Sensor] currently only links to a zip archive with no data, just one folder called Graphics containing two images (the images for the dataset).

Just to mention that the file https://archive.ics.uci.edu/static/public/156/calit2+building+people+counts.zip contains 6 files. I think two of them belong to the [Dodgers Loop Sensor] dataset, which are:

AhmedGHDev commented 1 year ago

Dataset 75 [Musk (Version 2)] currently only links to a zip archive with no data but one empty folder called Graphics.

Just to mention that the file https://archive.ics.uci.edu/static/public/74/musk+version+1.zip contains 7 files. I think three of them belong to the [Musk (Version 2)] dataset, which are:

AhmedGHDev commented 1 year ago

Dataset 91 [Soybean (Small)] currently only links to a zip archive with no data but one empty folder called Graphics.

Just to mention that the file https://archive.ics.uci.edu/static/public/90/soybean+large.zip contains 12 files. I think two of them belong to the [Soybean (Small)] dataset, which are:

AhmedGHDev commented 1 year ago

Dataset 96 [SPECTF Heart] currently only links to a zip archive with no data but one empty folder called Graphics.

Just to mention that the file https://archive.ics.uci.edu/static/public/95/spect+heart.zip contains 8 files. I think two of them belong to the [SPECTF Heart] dataset, which are:

AhmedGHDev commented 1 year ago

Missing on your side:

https://archive.ics.uci.edu/static/public/143/statlog+australian+credit+approval.zip

https://archive.ics.uci.edu/static/public/145/statlog+heart.zip

https://archive.ics.uci.edu/static/public/146/statlog+landsat+satellite.zip

https://archive.ics.uci.edu/static/public/149/statlog+vehicle+silhouettes.zip

https://archive.ics.uci.edu/static/public/100/teaching+assistant+evaluation.zip

https://archive.ics.uci.edu/static/public/150/connectionist+bench+nettalk+corpus.zip

https://archive.ics.uci.edu/static/public/152/connectionist+bench+vowel+recognition+deterding+data.zip

https://archive.ics.uci.edu/static/public/154/protein+data.zip

https://archive.ics.uci.edu/static/public/155/cloud.zip

AhmedGHDev commented 1 year ago

Another question, please: the website currently contains 657 datasets, but the dataset IDs go up to 892. Are there private datasets?

ptruong0 commented 1 year ago

When datasets are donated, they have to be approved by admins. There are currently 657 approved datasets, and 892 datasets in total including pending & rejected datasets.

mirfan83 commented 1 year ago

Hello, dataset 613, Smartphone Dataset for Anomaly Detection in Crowds, is also missing.

Thanks.

rlongjohn commented 1 year ago

Also missing: Connectionist Bench (Sonar, Mines vs. Rocks)

markellekelly commented 11 months ago

We used to have the PIMA Indians dataset (many other websites, e.g., Kaggle, attribute it to us); not sure what happened to it.

ptruong0 commented 7 months ago

@markellekelly The owners of the PIMA dataset replaced the files with a note.txt that says "Thank you for your interest in the Pima Indians Diabetes dataset. The dataset is no longer available due to permission restrictions."

evelina-crypto commented 1 month ago

I also cannot access my dataset and get "DatasetNotFoundError: Error reading data csv file for "Cirrhosis Patient Survival Prediction" dataset (id=878)."