zerospeech / benchmarks

A command-line tool that helps use the "Zero Resource Challenge" benchmarks
https://zerospeech.com/toolbox/
GNU General Public License v3.0

sLM21 dataset installation fails #11

Closed Saurabhbhati closed 1 year ago

Saurabhbhati commented 1 year ago

When I run zrc datasets:pull sLM21-dataset, I get the following error:

Download completed Successfully!

MD5sum Failed, Check with repository administrator. Exiting...

How do I resolve this? Also, is there a way to use already downloaded datasets with the toolkit?

nhamilakis commented 1 year ago

I ran the command locally on my end and the verification worked without issue. You can check in your $APP_DIR/repo.json that the md5 value for slm21 dataset is 713dd4c2c4ab266c846cb2804c8d9e12. If not, delete the file repo.json and re-run the command (it will be re-downloaded automatically).
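The checksum comparison described above can also be done by hand on an archive that is already on disk. The following is a minimal sketch, not the tool's own verification code: the function names `file_md5` and `matches_expected` are made up for illustration, and the archive file name in the usage comment is an assumption. Only the expected md5 value comes from this thread.

```python
import hashlib

# md5 value quoted in this thread for the sLM21 dataset archive.
EXPECTED_MD5 = "713dd4c2c4ab266c846cb2804c8d9e12"


def file_md5(path, chunk_size=1 << 20):
    """Return the hex md5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large archives do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_MD5):
    """Compare a file's md5 digest with the expected value, ignoring case."""
    return file_md5(path) == expected.lower()


# Hypothetical usage; the archive name is an assumption, not taken from the tool:
# print(matches_expected("sLM21-dataset.zip"))
```

If the digest does not match, the downloaded file is most likely incomplete or corrupted, and downloading it again is the first thing to try.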

Note that the tool is still in beta and has some bugs. I will be working on fixing most of them this week, so thank you for your patience.

I have also added an option to skip the md5 verification (--skip-verification) in the pull command; it will be available in the new version.

Saurabhbhati commented 1 year ago

I was able to download the dataset from the Zerospeech website. The downloaded dataset has the correct md5sum and the toolkit works fine.

As for the question "Also, is there a way to use already downloaded datasets with the toolkit?" I think just adding the index.json and gold.csv files to the respective folder in the already downloaded datasets should work fine. This way you don't need to download the dataset again to use the current toolkit. All the necessary files can be found in the attached zip.
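The manual approach described above amounts to copying two files into an existing dataset folder. Here is a minimal sketch of that step. The function name `add_metadata` and both directory paths are hypothetical, and placing the files at the top level of the dataset folder is an assumption; the correct location depends on how the toolkit lays out each dataset.

```python
import shutil
from pathlib import Path

# File names mentioned in this thread.
METADATA_FILES = ("index.json", "gold.csv")


def add_metadata(metadata_dir, dataset_dir, names=METADATA_FILES):
    """Copy the metadata files into an already downloaded dataset folder."""
    metadata_dir = Path(metadata_dir)
    dataset_dir = Path(dataset_dir)
    if not dataset_dir.is_dir():
        raise NotADirectoryError(f"dataset folder not found: {dataset_dir}")

    # Check every source file first so a missing one does not leave a partial copy.
    missing = [name for name in names if not (metadata_dir / name).is_file()]
    if missing:
        raise FileNotFoundError(f"missing in {metadata_dir}: {', '.join(missing)}")

    # Copy each file, preserving timestamps, and return the destination paths.
    return [Path(shutil.copy2(metadata_dir / name, dataset_dir / name)) for name in names]


# Hypothetical usage; both paths are assumptions:
# add_metadata("unzipped-attachment/sLM21", "/path/to/existing/sLM21-dataset")
```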

nhamilakis commented 1 year ago

Yes, of course. As long as all the files are added correctly, you can add the dataset manually; that should work as well. However, you still have to download the archive to get the gold and index files.

I will mark this issue as resolved, then. I made some updates fixing bugs that prevented running the ABX metrics, so you should update your local version as well. I will make a new release in the coming days.

If you find any other problem, don't hesitate to open an issue.

Saurabhbhati commented 1 year ago

That's why I added just the gold and index files to the zip file, so you don't have to download the entire dataset just to get them. Thank you for your time and help. I'll update my local version.

nhamilakis commented 1 year ago

Oh, I did not notice the download link. Okay, maybe that could be useful.