levmichael / saphon

South American Phonological Inventory

Parsing language names #72

Open rsprouse opened 8 months ago

rsprouse commented 8 months ago

In the Tupian nasal typology input spreadsheet, some values for Language contain a string with the language name and an ISO 639-3 code, e.g. Araweté [awt], which is parsed into the name and iso_codes values in the .json data format.

Other languages have a third part of the string. What is the intended meaning of the third part and how should it be parsed? Examples: Avá-Canoeiro Goiás [avv-gos] (avá-canoeiro goiás) and Avá-Canoeiro [avv-tct] (avá-canoeiro tocantins).

rsprouse commented 8 months ago

There are yet other patterns, e.g. Chiriguano (ava dialect), ISO: gui. The name and iso_codes fields should probably be checked and corrected by hand.
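To make the patterns mentioned so far concrete, here is a minimal sketch of how they might be split apart. It is not the parser currently used in the repository: the function name `parse_language`, the `extra` and `needs_review` keys, and both regular expressions are hypothetical, and the sketch assumes that any code that is not exactly three lowercase letters should be flagged for hand checking rather than interpreted.

```python
import re

# Hypothetical pattern A: "Name [code]" with an optional "(third part)".
BRACKETED = re.compile(
    r"^(?P<name>.+?)\s*\[(?P<code>[^\]]+)\]\s*(?:\((?P<extra>[^)]*)\))?\s*$"
)
# Hypothetical pattern B: "Name (note), ISO: code".
ISO_SUFFIX = re.compile(
    r"^(?P<name>.+?)\s*(?:\((?P<extra>[^)]*)\))?\s*,\s*ISO:\s*(?P<code>\S+)\s*$"
)
# ISO 639-3 codes are exactly three lowercase letters.
ISO_639_3 = re.compile(r"[a-z]{3}")


def parse_language(raw):
    """Split a Language cell into name, code, and optional third part.

    Anything that does not look like a plain ISO 639-3 code, or that
    matches neither pattern, is flagged for manual review.
    """
    text = raw.strip()
    for pattern in (BRACKETED, ISO_SUFFIX):
        m = pattern.match(text)
        if m:
            code = m.group("code").strip()
            return {
                "name": m.group("name").strip(),
                "iso_codes": [code],  # kept verbatim, not normalized
                "extra": m.group("extra"),
                "needs_review": ISO_639_3.fullmatch(code) is None,
            }
    # Unrecognized layout: keep the raw text and ask for a hand check.
    return {"name": text, "iso_codes": [], "extra": None, "needs_review": True}


if __name__ == "__main__":
    for cell in (
        "Araweté [awt]",
        "Avá-Canoeiro Goiás [avv-gos] (avá-canoeiro goiás)",
        "Chiriguano (ava dialect), ISO: gui",
    ):
        print(parse_language(cell))
```

The sketch deliberately leaves values such as avv-gos untouched and only flags them, since what the suffix and the third part mean is a question for the data owners rather than the parser.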

levmichael commented 8 months ago

The languages that have a third part to the string are languages that require a dialect specification. More generally, I agree that a bunch of hand correction needs to be done. What would probably be ideal, actually, is to switch over to Glottocodes and get Harald Hammarström to add dialects for us. @mlapier What do you think?

mlapier commented 8 months ago

Yes, I agree. This is exactly why we were thinking of switching them all to Glottocodes.

Myriam Lapierre Assistant Professor Department of Linguistics University of Washington

I recognize that the University of Washington stands on the lands and shared waters of the Coast Salish Peoples; Duwamish, Puyallup, Suquamish, Tulalip and Muckleshoot nations.
