zerospeech / benchmarks

A command-line tool for using the "Zero Resource Challenge" benchmarks
https://zerospeech.com/toolbox/
GNU General Public License v3.0

zrc2017-test-dataset contains incorrect index file #33

Closed: ewan closed this issue 11 months ago

ewan commented 11 months ago

Attempting to evaluate a zrc2017 submission as follows,

zrc benchmarks:run abx17 ~/zr-data/samples/abx17-random-submission/

results in validation errors caused by the index.json in zrc2017-test-dataset. The file_type fields contain a leading dot (e.g., "file_type": ".wav", "file_type": ".wrd", and so on). The leading dot needs to be stripped, and, in addition, the .vad.csv files should simply be of type csv.
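The fix ewan describes can be sketched as a small normalization pass over index.json. This is a minimal sketch, assuming only that the file is JSON and that the affected values sit under keys named file_type somewhere in the structure; the real schema of the zrc2017-test-dataset index is not shown in this thread, and the function names are hypothetical.

```python
import json
from typing import Any


def normalize_file_type(value: str) -> str:
    """Strip leading dots and collapse the VAD extension to plain csv."""
    value = value.lstrip(".")
    # Assumption: VAD files are labelled ".vad.csv", i.e. "vad.csv" once stripped.
    if value == "vad.csv":
        return "csv"
    return value


def fix_index(node: Any) -> Any:
    """Recursively rewrite every string stored under a "file_type" key."""
    if isinstance(node, dict):
        return {
            key: normalize_file_type(val)
            if key == "file_type" and isinstance(val, str)
            else fix_index(val)
            for key, val in node.items()
        }
    if isinstance(node, list):
        return [fix_index(item) for item in node]
    return node


def fix_index_file(path: str) -> None:
    """Load an index file, normalize its file_type fields, and write it back."""
    with open(path, encoding="utf-8") as fh:
        data = json.load(fh)
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(fix_index(data), fh, indent=2)
```

Running fix_index_file on a local copy of index.json would rewrite ".wav" to "wav" and ".vad.csv" to "csv" while leaving every other field untouched.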

nhamilakis commented 11 months ago

This should be solved by downloading the new version of the dataset.

ewan commented 11 months ago

Do we have a repo for the dataset? If so I can make a PR as the fix is trivial.


From: Hamilakis Nicolas. Sent: July 17, 2023 11:19 AM. To: zerospeech/benchmarks. Cc: Ewan Dunbar; Author. Subject: Re: [zerospeech/benchmarks] zrc2017-test-dataset contains incorrect index file (Issue #33)

Closed #33 (https://github.com/zerospeech/benchmarks/issues/33) as completed.


nhamilakis commented 11 months ago

The fix is already online: I downloaded the dataset on Friday for a TDE17 evaluation, and it worked for me. Try the following:

The import option will be added at some point this week to solve timeout errors (#32).

ewan commented 11 months ago

I downloaded the file from download.zerospeech.com just now, and the index.json is still as before. Attached here.

index.json.txt

nhamilakis commented 11 months ago

There should not be any validation issues with the current index, even with the file types as they are. Could you send me the installed pydantic version (pip list | grep pydantic)?

ewan commented 11 months ago

pydantic 1.9.0
pydantic_core 2.1.2


nhamilakis commented 11 months ago

Do you still have the validation issues after updating the package and downloading the new version of the dataset?

pydantic 1.9.0 pydantic_core 2.1.2

You should uninstall pydantic_core (pip uninstall pydantic_core), as it was probably installed by your previous installation. The pydantic version I have is 1.10 (although 1.9 should work as well). I added a strict version rule to prevent other versions from being installed, so this should not be a problem in future installs.
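A quick way to spot a mixed installation like the one reported above is to query installed distribution versions from Python, which is roughly what pip list | grep pydantic shows. This is a minimal sketch using only the standard library's importlib.metadata (Python 3.8+); the accepted range (pydantic 1.9 or 1.10, no pydantic_core) is an assumption drawn from this thread, not the project's actual version rule. pydantic_core is the compiled core of pydantic 2, so finding it next to pydantic 1.x suggests leftovers from another installation.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import List, Optional, Tuple


def installed_version(dist: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist)
    except PackageNotFoundError:
        return None


def major_minor(ver: str) -> Tuple[int, int]:
    """Extract (major, minor) from a version string such as "1.10.2"."""
    match = re.match(r"(\d+)\.(\d+)", ver)
    return (int(match.group(1)), int(match.group(2))) if match else (0, 0)


def check_pydantic_env() -> List[str]:
    """List problems with the pydantic install.

    Assumption (from this thread, not the tool's real rule): pydantic 1.9 or
    1.10 is expected, and pydantic_core (part of pydantic 2) should be absent.
    """
    problems = []
    pyd = installed_version("pydantic")
    if pyd is None:
        problems.append("pydantic is not installed")
    elif major_minor(pyd) not in {(1, 9), (1, 10)}:
        problems.append(f"unexpected pydantic version {pyd}")
    core = installed_version("pydantic_core")
    if core is not None:
        problems.append(f"pydantic_core {core} found; try: pip uninstall pydantic_core")
    return problems


if __name__ == "__main__":
    issues = check_pydantic_env()
    print("\n".join(issues) if issues else "pydantic install looks consistent")
```

With the versions ewan reported (pydantic 1.9.0 and pydantic_core 2.1.2), this check would flag only the stray pydantic_core.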

As for the checksum issue on import/pull, have you tried running zrc reset-index? If so, could you share the ~/zr-data/repo.json file so I can check whether we have the same information?
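When a pull or import reports a checksum mismatch, one way to narrow it down is to hash the downloaded archive yourself and compare the result with the value you expect (for example, whatever is recorded for it in ~/zr-data/repo.json). This is a hedged sketch: the hash algorithm (MD5 by default here), the archive path, and how repo.json stores checksums are all assumptions, not documented behaviour of the zrc tool.

```python
import hashlib
from pathlib import Path


def file_checksum(path: Path, algorithm: str = "md5", chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large dataset archives need not fit in memory."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: Path, expected: str, algorithm: str = "md5") -> bool:
    """Compare a file's hex digest with an expected value, ignoring case and whitespace."""
    return file_checksum(path, algorithm) == expected.strip().lower()
```

If the locally computed digest matches the published one but the tool still complains, the stored index is the more likely culprit, which is what zrc reset-index is meant to address.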

ewan commented 11 months ago

After reinstalling the package and the dataset, the pull command works with no checksum issues, and validation and evaluation run fine.


nhamilakis commented 11 months ago

I will mark this issue as resolved if that is the case.