zerospeech / benchmarks

A command-line tool that helps use the "Zero Resource Challenge" benchmarks
https://zerospeech.com/toolbox/
GNU General Public License v3.0

Issue with dataset downloads #39

Closed by nhamilakis 4 months ago

nhamilakis commented 4 months ago

Intermittent errors (TLS failures, timeouts) have been appearing when trying to download datasets.

nhamilakis commented 4 months ago

When trying to download abxLS-dataset (https://download.zerospeech.com/datasets/abxLS.datasets.zip):

Using curl:

curl exits with error 18 (partial file) or error 56 (failure receiving network data).

Using wget:

Connection closed error (but wget manages to resume).

Using requests:

Connection Timeout

nhamilakis commented 4 months ago

Add the ability to use multiple download backends:

1) curl (subprocess & pycurl)
2) wget
3) requests

Add an environment variable (DL_BACKEND) to allow easy switching without cluttering the CLI, since users shouldn't need to know how files are downloaded.

Run some tests to see which backend works best and set it as the default (hint: if wget retries automatically, it should be the default).
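A minimal sketch of how the DL_BACKEND switch could look, assuming a small registry of backends keyed by name. The function names (`select_backend`, `curl_command`, `wget_command`, `download`), the retry count, and the `wget` default are all hypothetical and not taken from the toolbox; the pycurl variant is left out for brevity.

```python
import os
import subprocess
from typing import Callable, Dict, List, Mapping, Optional

# Assumption: placeholder default until the backends have been compared.
DEFAULT_BACKEND = "wget"


def curl_command(url: str, dest: str) -> List[str]:
    """Build a curl command that follows redirects and resumes partial files."""
    return ["curl", "--location", "--fail", "--continue-at", "-",
            "--output", dest, url]


def wget_command(url: str, dest: str) -> List[str]:
    """Build a wget command that resumes partial files and retries on failure."""
    # Assumption: 5 tries is an arbitrary illustrative value.
    return ["wget", "--continue", "--tries=5", f"--output-document={dest}", url]


def download_with_requests(url: str, dest: str, timeout: float = 30.0) -> None:
    """Stream the file to disk with requests (no resume in this sketch)."""
    import requests  # third-party, imported lazily so other backends work without it

    with requests.get(url, stream=True, timeout=timeout) as response:
        response.raise_for_status()
        with open(dest, "wb") as fh:
            for chunk in response.iter_content(chunk_size=1 << 20):
                fh.write(chunk)


def _run(command: List[str]) -> None:
    # check=True raises CalledProcessError on a non-zero exit (e.g. curl 18 or 56).
    subprocess.run(command, check=True)


BACKENDS: Dict[str, Callable[[str, str], None]] = {
    "curl": lambda url, dest: _run(curl_command(url, dest)),
    "wget": lambda url, dest: _run(wget_command(url, dest)),
    "requests": download_with_requests,
}


def select_backend(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the backend name from DL_BACKEND, falling back to the default."""
    env = os.environ if env is None else env
    name = env.get("DL_BACKEND", DEFAULT_BACKEND).strip().lower()
    if name not in BACKENDS:
        raise ValueError(f"Unknown DL_BACKEND {name!r}; expected one of {sorted(BACKENDS)}")
    return name


def download(url: str, dest: str, env: Optional[Mapping[str, str]] = None) -> None:
    """Download url to dest using whichever backend DL_BACKEND selects."""
    BACKENDS[select_backend(env)](url, dest)
```

With something like this in place, switching backends would not touch the CLI: running the tool with `DL_BACKEND=curl` set in the environment would route every download through curl, and leaving the variable unset would use the default.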