zrc2017-test-dataset contains incorrect index file

ewan commented 11 months ago

Attempting to do an evaluation of a zrc2017 submission as follows,

zrc benchmarks:run abx17 ~/zr-data/samples/abx17-random-submission/

results in validation errors caused by the index.json in zrc2017-test-dataset. The file_type fields contain dots (e.g., "file_type": ".wav", "file_type": ".wrd", and so on). The initial . needs to be stripped out and, in addition, the .vad.csv files should just be of type csv.
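The normalization described above could be sketched roughly as follows. This is only an illustration: the actual layout of index.json is not shown in this thread, so the sketch simply walks any JSON-like structure and rewrites every "file_type" string it finds. The function names normalize_file_type and normalize_index are hypothetical and not part of the zrc toolkit.

```python
import json


def normalize_file_type(raw: str) -> str:
    """Return the text after the last dot, e.g. '.wrd' -> 'wrd', '.vad.csv' -> 'csv'.

    Values without any dot (e.g. 'wav') are returned unchanged.
    """
    return raw.strip().rsplit(".", 1)[-1]


def normalize_index(node):
    """Recursively rewrite every string-valued 'file_type' key in a JSON-like structure.

    Assumption: the real index.json schema is unknown here, so no particular
    nesting is assumed; dicts and lists are traversed generically.
    """
    if isinstance(node, dict):
        return {
            key: normalize_file_type(value)
            if key == "file_type" and isinstance(value, str)
            else normalize_index(value)
            for key, value in node.items()
        }
    if isinstance(node, list):
        return [normalize_index(item) for item in node]
    return node


if __name__ == "__main__":
    # Hypothetical fragment, for illustration only (not the real index content).
    sample = {"files": [{"file_type": ".wav"}, {"file_type": ".vad.csv"}]}
    print(json.dumps(normalize_index(sample)))
```

Taking the text after the last dot handles both requests at once: a single leading dot is dropped, and a compound suffix such as .vad.csv collapses to csv.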

nhamilakis commented 11 months ago

This should be solved by downloading the new version of the dataset.

ewan commented 11 months ago

Do we have a repo for the dataset? If so I can make a PR as the fix is trivial.

From: Hamilakis Nicolas
Sent: July 17, 2023 11:19 AM
To: zerospeech/benchmarks
Cc: Ewan Dunbar; Author
Subject: Re: [zerospeech/benchmarks] zrc2017-test-dataset contains incorrect index file (Issue #33)

Closed #33 (https://github.com/zerospeech/benchmarks/issues/33) as completed.


nhamilakis commented 11 months ago

The fix is already online. I downloaded the dataset on Friday for a TDE17 evaluation, and it worked for me. Try the following:

zrc datasets:rm zrc2017-test-dataset
zrc reset-index (although that should happen automatically, you can force it with this option)
zrc datasets:pull zrc2017-test-dataset (re-download the dataset)

The import option will be added at some point this week to solve timeout errors (#32).

ewan commented 11 months ago

I downloaded the file from download.zerospeech.com just now, and the index.json is still as before. Attached here.

index.json.txt

nhamilakis commented 11 months ago

There should not be any validation issues with the current index, even with the files as they are. Could you send me the installed pydantic version (pip list | grep pydantic)?

ewan commented 11 months ago

pydantic 1.9.0
pydantic_core 2.1.2


nhamilakis commented 11 months ago

Do you still have the validation issues after updating the package and downloading the new version of the dataset?

pydantic 1.9.0
pydantic_core 2.1.2

You should uninstall pydantic_core (pip uninstall pydantic_core), as it was probably installed by your previous installation. The version of pydantic I have is 1.10 (although 1.9 should work as well). I added a strict version rule to prevent other versions from being installed, so that should not be a problem in future installs.
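A quick way to spot this kind of leftover from Python, as a minimal sketch: pydantic 1.x does not depend on pydantic_core (it is a dependency of pydantic 2.x), so finding both installed suggests a stale package. The helper names installed_version and has_stray_pydantic_core are made up for this example and are not part of the zrc toolkit.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def has_stray_pydantic_core(pydantic_version: Optional[str],
                            core_version: Optional[str]) -> bool:
    """True when pydantic 1.x is installed alongside pydantic_core.

    pydantic 1.x never imports pydantic_core, so its presence points to a
    leftover from another installation.
    """
    if pydantic_version is None or core_version is None:
        return False
    return pydantic_version.split(".")[0] == "1"


if __name__ == "__main__":
    pyd = installed_version("pydantic")
    core = installed_version("pydantic_core")
    print(f"pydantic: {pyd}, pydantic_core: {core}")
    if has_stray_pydantic_core(pyd, core):
        print("pydantic_core looks like a leftover; consider: pip uninstall pydantic_core")
```

With the versions reported above (pydantic 1.9.0 and pydantic_core 2.1.2), has_stray_pydantic_core returns True.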

As for the checksum issue on import/pull, have you tried running zrc reset-index? If so, could you share the ~/zr-data/repo.json file so I can check whether we have the same info?

ewan commented 11 months ago

After reinstalling the package and the dataset, the pull command works with no checksum issues, and the validation and evaluation run fine.


nhamilakis commented 11 months ago

I will mark this issue as resolved if that is the case.
