zerospeech / benchmarks

A command line tool that helps use the "Zero Resource Challenge" benchmarks
https://zerospeech.com/toolbox/
GNU General Public License v3.0

Remove abxLS replace it with abxLSPhon #12

Closed. nhamilakis closed this issue 1 year ago.

nhamilakis commented 1 year ago

Following the updates we did with Mark, I think the old abxLS module can be completely removed and replaced with the abxLSPhon one, since it can do the same evaluation as the old one.

The type of evaluation is decided by the context parameter.

Am I correct in my assumption? I would like your feedback @ewan @hallapmark. Also, as Ewan said, maybe the benchmark name is not correct any more; should I change it to something else?

hallapmark commented 1 year ago

Not exactly, but I am also not sure which level of abstraction we are talking about. I think it might help to distinguish between the labels and configurations used behind the scenes in the abx2 code and the labels that are visible to the user of Benchmarks.

Publicly visible / Benchmark-user-facing labels (a proposal; Ewan is probably the stakeholder to say what the best names are) and the corresponding abx2 configurations:

Context:

All of these will run speaker_mode=all, i.e. there are separate within-speaker and across-speaker scores for all of these options.

Explanation: I think the source of the taxonomic difficulty is that, from the abx code's standpoint, the on-triphone evaluation is also "within context"; it is just that what gets extracted are triphones with the context included, i.e. the preceding and following phoneme (abc, where a and c are the fixed context). In the within-context on-phoneme evaluation, a and c are still the fixed context, but only b is extracted. The difference between the two "within" context options comes down solely to a difference in the timestamps in the item file. Literally, the old abx1 code could run on-phoneme within context and on-triphone within context, if you made two calls with the corresponding item files. For the any-context condition, on the other hand, code changes were needed.
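To make the timestamp point concrete, here is a hedged sketch (the file name, times, and exact column layout are illustrative, loosely modelled on Libri-light-style item files, and are not taken from the actual data):

```python
# Illustrative only: assumed column layout is
#   #file onset offset #item prev-phone next-phone speaker

# on-triphone, within context: onset/offset delimit the whole a-b-c span;
# a and c double as the fixed context used for grouping
triphone_item_row = "1272-128104-0000 1.40 1.73 a-b-c a c 1272"

# on-phoneme, within context: onset/offset delimit only the central phone b;
# a and c again appear as the fixed context columns
phoneme_item_row = "1272-128104-0000 1.51 1.62 b a c 1272"
```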

I do not know how best to add a further layer of categorization and give the Benchmark user the option to run several of these at the same time. One option would be:

Or maybe just:

context: all

ewan commented 1 year ago

Yes, the two modules are functionally the same. The general name can be abxLS rather than abxLSPhon, even though we use the updated module.

Indeed, as @hallapmark indicates, there are two (mostly) independent dimensions, in addition to the speaker dimension, which means that the names all and any would be confusing. The first dimension is the items, which are either triphones or phones. The second is the context dimension, which can be either within or any (we sometimes also called this option whatever context, but any is the one we have been using). However, as you imply, it doesn't make much sense to run the triphone item file with the any option, so we can do as Mark says and provide four options:
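A minimal sketch of how those four options map onto the two dimensions, using the labels listed later in this thread (triphone-within, phoneme-within, phoneme-any, all); the key and field names here are illustrative, not the benchmark's actual API:

```python
# Illustrative mapping only; field names are assumptions, not the real API.
CONTEXT_OPTIONS = {
    "triphone-within": {"item_set": "triphone", "context_mode": "within"},
    "phoneme-within":  {"item_set": "phoneme",  "context_mode": "within"},
    "phoneme-any":     {"item_set": "phoneme",  "context_mode": "any"},
}
# "triphone-any" is deliberately absent, and "all" simply runs every
# configuration above; speaker_mode="all" likewise scores both within- and
# across-speaker for each configuration.
```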

nhamilakis commented 1 year ago

I think I understand better now; I will modify my code accordingly.

nhamilakis commented 1 year ago

After running the evaluation with context_mode="all" and speaker_mode="all", the output in the Benchmark module looks something like this:

| subset | speaker_mode | context_mode | granularity | score | item_file | pooling | seed |
|---|---|---|---|---|---|---|---|
| dev-clean | within | within | triphone | 0.4932 | triphone-dev-clean.item | none | 3459 |
| dev-clean | across | within | triphone | 0.4981 | triphone-dev-clean.item | none | 3459 |
| dev-other | within | within | triphone | 0.4985 | triphone-dev-other.item | none | 3459 |
| dev-other | across | within | triphone | 0.4998 | triphone-dev-other.item | none | 3459 |
| test-clean | within | within | triphone | 0.4970 | triphone-test-clean.item | none | 3459 |
| test-clean | across | within | triphone | 0.4996 | triphone-test-clean.item | none | 3459 |
| test-other | within | within | triphone | 0.4926 | triphone-test-other.item | none | 3459 |
| test-other | across | within | triphone | 0.5010 | triphone-test-other.item | none | 3459 |
| dev-clean | within | within | phoneme | 0.5001 | phoneme-dev-clean.item | none | 3459 |
| dev-clean | across | within | phoneme | 0.4983 | phoneme-dev-clean.item | none | 3459 |
| dev-other | within | within | phoneme | 0.5136 | phoneme-dev-other.item | none | 3459 |
| dev-other | across | within | phoneme | 0.4994 | phoneme-dev-other.item | none | 3459 |
| test-clean | within | within | phoneme | 0.5178 | phoneme-test-clean.item | none | 3459 |
| test-clean | across | within | phoneme | 0.4979 | phoneme-test-clean.item | none | 3459 |
| test-other | within | within | phoneme | 0.5000 | phoneme-test-other.item | none | 3459 |
| test-other | across | within | phoneme | 0.4990 | phoneme-test-other.item | none | 3459 |
| dev-clean | within | any | phoneme | 0.5005 | phoneme-dev-clean.item | none | 3459 |
| dev-clean | across | any | phoneme | 0.5004 | phoneme-dev-clean.item | none | 3459 |
| dev-other | within | any | phoneme | 0.5044 | phoneme-dev-other.item | none | 3459 |
| dev-other | across | any | phoneme | 0.5002 | phoneme-dev-other.item | none | 3459 |
| test-clean | within | any | phoneme | 0.4978 | phoneme-test-clean.item | none | 3459 |
| test-clean | across | any | phoneme | 0.5000 | phoneme-test-clean.item | none | 3459 |
| test-other | within | any | phoneme | 0.5027 | phoneme-test-other.item | none | 3459 |
| test-other | across | any | phoneme | 0.4989 | phoneme-test-other.item | none | 3459 |

As said before, the options for context are: triphone-within, phoneme-within, phoneme-any, all. The options for speaker mode are the same as before. I also left the possibility for the user to set the seed via the params.yaml file. The item_file and granularity columns are a bit redundant; I'm not sure I will keep them both. Also, this does not mean that all of this information will be in the final leaderboard; we can still choose what to show or not. I kept what I think is useful for the user at the level of the Benchmark module. Most of the other parameters are also in the params.yaml file.
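For reference, a minimal sketch of what the relevant part of such a params.yaml could look like; the key names mirror the output columns above and are assumptions, not the module's documented schema:

```python
# Hedged sketch only: key names mirror the output columns shown above and are
# not guaranteed to match the module's actual params.yaml schema.
import yaml  # requires PyYAML

params = yaml.safe_load("""
context_mode: all     # triphone-within | phoneme-within | phoneme-any | all
speaker_mode: all     # within | across | all
seed: 3459
""")
print(params["context_mode"], params["speaker_mode"], params["seed"])
```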

If you @ewan & @hallapmark agree with this, I will go ahead and merge the branch.

hallapmark commented 1 year ago

Yes, maybe the item_file column is redundant and granularity should be enough, but it's ok either way. Otherwise, this looks good to me!

ewan commented 1 year ago

This looks fine to me. I would keep the granularity column.

