Closed ewan closed 11 months ago
The datasets:import function was only supposed to work with the 2015 dataset. We have since decided to make that benchmark read-only, and I forgot to remove the function from the code.
I have never had a timeout during download, but I could repurpose the import command to import a .zip file directly in case the download fails or times out. Users could then download the zip themselves from the URL and use the import command to install it.
The MD5 check failure was an error on my part: I had updated the content of the .zip archive without updating the MD5 checksum. This has been fixed.
Note that the datasets:pull command also has an option to skip MD5 checks: -u / --skip-verification.
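For anyone who downloads the archive by hand, the checksum can also be verified outside the toolkit. The following is a minimal sketch using only the Python standard library; `md5_of_file` and `verify_archive` are hypothetical helper names, not part of the zerospeech benchmarks code, and the expected checksum has to come from wherever the project publishes it.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large archives never sit fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path, expected_md5):
    """Compare a file's MD5 with an expected hex digest (hypothetical helper)."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

A mismatch on a file that was downloaded in full suggests a stale published checksum, as happened here; a mismatch on a truncated file suggests an interrupted download.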
I tried skipping the MD5 check, but I can't be sure whether that solved the issue, as the dataset had another issue that kept me from using it. In any case, most of the time the script crashed with a network error before it ever reached the MD5 check. So yes, it is worth adding an option to import the dataset from a .zip in case the download doesn't work.
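As a rough illustration of what importing from a local .zip involves, here is a minimal sketch with the standard `zipfile` module. The function name `install_from_zip` and the idea of extracting straight into a target directory are assumptions made for the example; the actual import command may lay out and register datasets differently.

```python
import zipfile
from pathlib import Path


def install_from_zip(archive, target_dir):
    """Extract a manually downloaded dataset archive (hypothetical helper)."""
    archive = Path(archive)
    # A truncated download usually lacks the zip central directory,
    # so this check catches most incomplete files early.
    if not zipfile.is_zipfile(archive):
        raise ValueError(f"{archive} is not a readable zip archive")
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as bundle:
        # testzip() re-reads every member and returns the first one
        # whose CRC does not match, or None if all are intact.
        corrupt = bundle.testzip()
        if corrupt is not None:
            raise ValueError(f"corrupt member in archive: {corrupt}")
        bundle.extractall(target)
        return sorted(bundle.namelist())
```

The integrity checks run before extraction, so a bad archive fails early instead of leaving a half-installed dataset behind.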
Added import for the other download functions: 0a80ddb
The dataset installation often cannot finish on large downloads due to network issues. Specifically the following:
typically terminates before the download has finished. This results in one of two errors. Either there is an exception of the following kind:
or else the code passes to the MD5Sum check and fails due to the incomplete download.
Either the download code should be made more robust, so that it can resume partially completed downloads, or, failing this, there should be a straightforward way to install datasets from offline downloads.
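To make the first suggestion concrete, here is a minimal sketch of a resumable download based on HTTP Range requests, using only the standard library. It is not the toolkit's download code. `resume_download` is a hypothetical name, the `opener` parameter exists only so the function can be exercised without a network, and the sketch assumes the server hosting the archives honours Range headers.

```python
import os
import urllib.error
import urllib.request


def resume_download(url, dest, chunk_size=1 << 20, opener=urllib.request.urlopen):
    """Download url to dest, continuing from any bytes already on disk."""
    offset = os.path.getsize(dest) if os.path.exists(dest) else 0
    request = urllib.request.Request(url)
    if offset:
        # Ask the server for the remaining bytes only.
        request.add_header("Range", f"bytes={offset}-")
    try:
        response = opener(request)
    except urllib.error.HTTPError as err:
        if offset and err.code == 416:
            # Range not satisfiable: usually the file on disk is already complete.
            return offset
        raise
    with response:
        # 206 means the Range header was honoured; any other status
        # (typically 200) means the whole file is being sent again.
        resumed = offset > 0 and response.status == 206
        with open(dest, "ab" if resumed else "wb") as out:
            while True:
                chunk = response.read(chunk_size)
                if not chunk:
                    break
                out.write(chunk)
    return os.path.getsize(dest)
```

Calling the function again after a network error picks up where the partial file ends, so a retry loop around it would not need to restart a large download from zero. The completed file should still be checked against its MD5 checksum afterwards.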
The datasets:import command, which seems like it could do the latter, does not currently work. In addition to printing a warning about being untested, it crashes with an exception when running the following command (where the last argument is the name of a directory to which zrc2017-test-dataset.zip was extracted):
Here is the traceback: