mlpack / benchmarks

Machine Learning Benchmark Scripts
Download wine and wine_qual dataset #121

Closed zoq closed 6 years ago

zoq commented 6 years ago

Fix for https://github.com/mlpack/benchmarks/issues/118.

kno10 commented 6 years ago

I was considering suggesting the swap, but if someone deletes the resulting file and expects the scripts to re-download it, this will not work. That is why I suggested using just wine.csv without a wildcard.
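To illustrate the concern, here is a minimal Python sketch, not the actual check from the patch. It assumes the existence test is a glob such as `wine*.csv`; the pattern and both function names are hypothetical. If wine.csv is deleted while another extracted file still matches the wildcard, the wildcard check reports that nothing needs downloading, whereas the exact-name check notices the missing file.

```python
import glob
import os
import tempfile


def needs_download_wildcard(directory, pattern="wine*.csv"):
    # Hypothetical wildcard check: download only when nothing matches.
    return not glob.glob(os.path.join(directory, pattern))


def needs_download_exact(directory, name="wine.csv"):
    # Exact-name check: download whenever this specific file is missing.
    return not os.path.isfile(os.path.join(directory, name))


with tempfile.TemporaryDirectory() as d:
    # Simulate a directory where wine.csv was deleted but a sibling remains.
    open(os.path.join(d, "wine_centroids.csv"), "w").close()
    wildcard_result = needs_download_wildcard(d)
    exact_result = needs_download_exact(d)
    print(wildcard_result, exact_result)
```

In this simulated directory the wildcard check is satisfied by the leftover file, so only the exact-name check would trigger a re-download.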

zoq commented 6 years ago

Good point, will remove the wildcard.

rcurtin commented 6 years ago

@kno10: thanks for pointing out the issue in #118. I was thinking about this earlier; the only problem is that wine.tar.gz unpacks multiple files:

$ tar -tvf wine.tar.gz
-rw-r--r-- marcus/staff  15151 2017-10-10 13:24 wine.arff
-rw-r--r-- marcus/staff  10765 2017-10-10 13:24 wine.csv
-rw-r--r-- marcus/staff    741 2017-10-10 13:24 wine_centroids.csv

Checking for wine.csv is fine, I guess, but there are other files also. To do this "correctly" we'd have to re-adapt the script, but I'm not sure if it's worth the effort. I'm fine with this patch here, I just wanted to point out the possible extra complexity.

zoq commented 6 years ago

An easy solution would be to list every file and remove the asterisk entirely, but I'm not sure this is necessary.
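A rough sketch of what listing every file could look like, written in Python for illustration only. The file names come from the tar listing above; the `missing_files` helper and the idea of re-downloading whenever the list is non-empty are assumptions, not the benchmark scripts' actual logic.

```python
import os
import tempfile

# File names taken from the `tar -tvf wine.tar.gz` listing in this thread.
EXPECTED_FILES = ("wine.arff", "wine.csv", "wine_centroids.csv")


def missing_files(directory, expected=EXPECTED_FILES):
    # Hypothetical helper: return the expected files that are not present.
    return [name for name in expected
            if not os.path.isfile(os.path.join(directory, name))]


with tempfile.TemporaryDirectory() as d:
    # Simulate a partial extraction: only two of the three files exist.
    for name in ("wine.arff", "wine_centroids.csv"):
        open(os.path.join(d, name), "w").close()
    missing = missing_files(d)
    print(missing)
```

A non-empty result would mean the archive has to be fetched and unpacked again. The cost is that the list has to be kept in sync with the archive contents, which is the extra complexity mentioned above.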

zoq commented 6 years ago

@mlpack-jenkins test this please

rcurtin commented 6 years ago

@mlpack-jenkins test this please