Closed ewan closed 11 months ago
This should be solved by downloading the new version of the dataset.
Do we have a repo for the dataset? If so, I can make a PR, as the fix is trivial.
From: Hamilakis Nicolas @.> Sent: July 17, 2023 11:19 AM To: zerospeech/benchmarks @.> Cc: Ewan Dunbar @.>; Author @.> Subject: Re: [zerospeech/benchmarks] zrc2017-test-dataset contains incorrect index file (Issue #33)
Closed #33 (https://github.com/zerospeech/benchmarks/issues/33) as completed.
The fix is already online; I downloaded the dataset on Friday for a TDE17 evaluation, and it worked for me. Try the following:
zrc datasets:rm zrc2017-test-dataset
zrc reset-index
(although that should happen automatically, you can force it with this command)
zrc datasets:pull zrc2017-test-dataset
(re-download the dataset)
The import option will be added at some point this week to solve timeout errors (#32).
I downloaded the file from download.zerospeech.com just now, and the index.json is still as before. Attached here.
There should not be any validation issues with the current index, even with the files as they are. Could you send me the installed pydantic version (pip list | grep pydantic)?
pydantic 1.9.0
pydantic_core 2.1.2
Do you still have the validation issues after updating the package and downloading the new version of the dataset?
pydantic 1.9.0 pydantic_core 2.1.2
You should uninstall pydantic_core (pip uninstall pydantic_core), as it was probably left over from your previous installation.
The version of pydantic I have is 1.10 (although 1.9 should work as well).
I added a strict version rule to prevent other versions from being installed, so that should not be a problem in future installs.
As for the checksum issue on import/pull, have you tried running zrc reset-index? If yes, could you share the ~/zr-data/repo.json file so I can check whether we have the same info?
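For anyone hitting the same situation, the check suggested above (a pydantic 1.x install with a stray pydantic_core next to it) could be scripted roughly as follows. This is only a sketch: the helper names `installed_version` and `check_pydantic` are made up for illustration, and the only fact it relies on is that pydantic_core is a dependency of pydantic 2.x, not of 1.x.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_pydantic(pydantic_version, core_version):
    """Return a list of warnings for the given (possibly None) versions."""
    warnings = []
    if pydantic_version is None:
        warnings.append("pydantic is not installed")
    elif pydantic_version.startswith("1.") and core_version is not None:
        # pydantic 1.x does not use pydantic_core, so it is a leftover.
        warnings.append(
            f"pydantic {pydantic_version} is installed alongside "
            f"pydantic_core {core_version}; try: pip uninstall pydantic_core"
        )
    return warnings


if __name__ == "__main__":
    found = check_pydantic(
        installed_version("pydantic"), installed_version("pydantic_core")
    )
    for line in found or ["no pydantic/pydantic_core mismatch found"]:
        print(line)
```

With the versions reported in this thread (pydantic 1.9.0, pydantic_core 2.1.2), `check_pydantic` returns a single warning suggesting the uninstall.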
After reinstalling the package and the dataset, the pull command works with no checksum issues, and the validation and evaluation run fine.
I will mark this issue as resolved if that is the case.
Attempting to do an evaluation of a zrc2017 submission as follows,
results in validation errors caused by the index.json in zrc2017-test-dataset. The file_type fields contain dots (e.g., "file_type": ".wav", "file_type": ".wrd", and so on). The initial . needs to be stripped out and, in addition, the .vad.csv files should just be of type csv.
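Until a corrected dataset is published, the change described above could be applied to a local copy of index.json along these lines. This is a sketch, not the maintainers' fix: the layout of index.json is not shown in this thread, so the code assumes nothing beyond the presence of "file_type" keys and walks the whole JSON tree. The function names are hypothetical, and keeping only the last extension component is an assumption chosen so that ".vad.csv" becomes "csv".

```python
import json
from pathlib import Path


def normalize_file_type(value):
    """Strip the leading dot and keep only the last extension component."""
    # ".wav" -> "wav", ".wrd" -> "wrd", ".vad.csv" -> "csv"
    return value.lstrip(".").rsplit(".", 1)[-1]


def fix_file_types(node):
    """Recursively rewrite every "file_type" string in a parsed JSON tree."""
    if isinstance(node, dict):
        return {
            key: normalize_file_type(val)
            if key == "file_type" and isinstance(val, str)
            else fix_file_types(val)
            for key, val in node.items()
        }
    if isinstance(node, list):
        return [fix_file_types(item) for item in node]
    return node


def fix_index(path):
    """Rewrite the given index.json in place (back up the original first)."""
    path = Path(path)
    data = json.loads(path.read_text())
    path.write_text(json.dumps(fix_file_types(data), indent=4))
```

Calling `fix_index` on the dataset's index.json rewrites the file in place; values that are already clean, such as "wav", are left unchanged.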