lexibank / lexitools

Apache License 2.0

Sound corresp: Changes to the coarsening #16

XachaB closed 3 years ago

XachaB commented 3 years ago

This pull request changes the coarsening, both in its configuration and in how it is processed, to allow for more complex rules that target and change more than one feature at once, or that remove features entirely.
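To illustrate the kind of rule described here, the following is a minimal sketch and not the actual lexitools configuration format. The `Rule` type, its `when`/`assign`/`drop` fields, the `coarsen` function, and the feature values are all hypothetical names chosen for the example. It shows one rule that changes two features at once and one rule that removes features entirely.

```python
from typing import Dict, FrozenSet, List, NamedTuple


class Rule(NamedTuple):
    """Hypothetical coarsening rule (not the lexitools format)."""
    when: Dict[str, str]     # feature values that must all match
    assign: Dict[str, str]   # features to overwrite when the rule applies
    drop: FrozenSet[str]     # features to remove entirely


def coarsen(features: Dict[str, str], rules: List[Rule]) -> Dict[str, str]:
    """Apply each matching rule in order and return a new feature dict."""
    out = dict(features)  # copy, so the input is left untouched
    for rule in rules:
        if all(out.get(key) == value for key, value in rule.when.items()):
            out.update(rule.assign)
            for name in rule.drop:
                out.pop(name, None)
    return out


# Illustrative rules only; the feature names follow CLTS-style labels.
RULES = [
    # Changes two features at once: affricates become non-sibilant stops.
    Rule(
        when={"type": "consonant", "manner": "affricate"},
        assign={"manner": "stop", "sibilancy": "non-sibilant"},
        drop=frozenset(),
    ),
    # Removes features entirely: ignore length and nasalization on vowels.
    Rule(
        when={"type": "vowel"},
        assign={},
        drop=frozenset({"duration", "nasalization"}),
    ),
]

affricate = {
    "type": "consonant",
    "phonation": "voiceless",
    "place": "alveolar",
    "manner": "affricate",
    "sibilancy": "sibilant",
}
long_nasal_vowel = {
    "type": "vowel",
    "height": "open",
    "duration": "long",
    "nasalization": "nasalized",
}

coarse_affricate = coarsen(affricate, RULES)
coarse_vowel = coarsen(long_nasal_vowel, RULES)
```

Rules are applied in order, so a later rule sees the output of an earlier one; whether that ordering matches the real implementation is an assumption of this sketch.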

LinguList commented 3 years ago

pyclts provides is_valid_sound: from pyclts.models import is_valid_sound. I think it would be useful to test whether the coarsened sounds are accepted by pyclts. Maybe I missed it, but I did not find this particular test.

https://github.com/cldf-clts/pyclts/blob/d64409259b4d7caae643cebe8f034007103422e3/src/pyclts/models.py#L39-L45

XachaB commented 3 years ago

Thanks, I didn't know of it and was relying on checking for "UnknownSound"s. I'll use this in the future -- I might not have the time to do more refactoring right away, though.

XachaB commented 3 years ago

Actually, it looks like we can't use is_valid_sound directly with the way I implemented the Coarse class, as it returns strings and only uses clts "under the hood".

I imagine that if you used something similar to implement a CLTS-internal coarsening system, then the requirements would be different, and returning Sound instances instead would be important.

LinguList commented 3 years ago

Well, the round-trip conversion is still an important test that I'd recommend: after making a sound coarse, parse it with CLTS and check whether it translates to a sound string that is in turn accepted as a sound.
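The round-trip idea can be sketched without pyclts, as below. This is only an illustration under stated assumptions: `ToySystem` is a hypothetical stand-in for a transcription system, and `survives_round_trip` is a made-up helper name. With pyclts, the two lookups would be replaced by real CLTS parsing, or by the linked is_valid_sound if Sound instances are available.

```python
from typing import Dict, Optional


class ToySystem:
    """Hypothetical stand-in for a transcription system (not pyclts)."""

    def __init__(self, name_to_grapheme: Dict[str, str]) -> None:
        self.name_to_grapheme = dict(name_to_grapheme)
        # Reverse lookup: grapheme string back to its feature name.
        self.grapheme_to_name = {g: n for n, g in self.name_to_grapheme.items()}

    def grapheme(self, name: str) -> Optional[str]:
        return self.name_to_grapheme.get(name)

    def name(self, grapheme: str) -> Optional[str]:
        return self.grapheme_to_name.get(grapheme)


def survives_round_trip(coarse_name: str, system: ToySystem) -> bool:
    """True if name -> grapheme -> name lands on the same name again."""
    grapheme = system.grapheme(coarse_name)
    if grapheme is None:
        # The coarse feature bundle has no known grapheme at all.
        return False
    return system.name(grapheme) == coarse_name


system = ToySystem({
    "voiceless alveolar stop consonant": "t",
    "open front unrounded vowel": "a",
})

ok = survives_round_trip("voiceless alveolar stop consonant", system)
# A feature bundle that coarsening could produce but the system does not know.
bad = survives_round_trip("voiceless alveolar vowel", system)
```

Running such a check over every coarse sound produced for a dataset would flag rules that yield feature bundles the transcription system cannot express.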

For practical checking, I'd also recommend providing a list of all mappings that we find in the data, as it would be interesting to check how much these coarsened sounds end up differing from the sound classes in different versions.

But one could also do this separately, as a specific task, and I'd love to test this (I hope I will have some time soon), since I find these coarsening procedures very important for the future.

XachaB commented 3 years ago

Noted as well; I see better what you meant now. Note that the correspondence program does export a complete list of coarse sounds as one of the result files; see the attached file for an example (renamed to .txt because GitHub does not accept .csv files): 20210217-14h50m11s_sound_correspondences_coarsening.csv.txt
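For readers without the attachment, here is a rough sketch of how such a mapping list could be collected and written out. It is not the code of the correspondence program, and the column names `coarse` and `originals`, the function names, and the sample pairs are assumptions made for the example; they do not describe the layout of the attached file.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Iterable, List, TextIO, Tuple


def collect_mappings(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group original sounds by the coarse sound they were mapped to."""
    table = defaultdict(set)
    for original, coarse in pairs:
        table[coarse].add(original)
    # Sort both levels so the output is stable between runs.
    return {coarse: sorted(originals) for coarse, originals in sorted(table.items())}


def write_mappings(mappings: Dict[str, List[str]], handle: TextIO) -> None:
    """Write one row per coarse sound, originals separated by spaces."""
    writer = csv.writer(handle)
    writer.writerow(["coarse", "originals"])  # hypothetical column names
    for coarse, originals in mappings.items():
        writer.writerow([coarse, " ".join(originals)])


# Made-up (original, coarse) pairs, as they might be observed in a dataset.
observed = [("tʰ", "t"), ("t", "t"), ("aː", "a"), ("ã", "a"), ("t", "t")]

mappings = collect_mappings(observed)
buffer = io.StringIO()
write_mappings(mappings, buffer)
rows = buffer.getvalue().splitlines()
```

A table like this makes it easy to compare the coarse sets against sound classes by eye, or to diff them between versions of the rules.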

XachaB commented 3 years ago

I have a few more changes coming, one on the alignment algorithm, one on picking labels for the coarse sets, so I'm keeping this open a little bit more, to avoid merging too many times.