lexibank / northperulex

Creative Commons Attribution 4.0 International

Orthography fixes #19

Closed: FredericBlum closed this issue 3 months ago

FredericBlum commented 4 months ago

Copy&pasting from Mattermost:

MuffinLinwist commented 3 months ago

Copy&pasting from Mattermost:

  • /j̃/ in Arabela: we shouldn't write it as /j ~/, but rather check what this thing actually is

According to Rich (1999) (the source for the data in Lexibank), the grapheme j represents the /h/ phoneme, which is as nasalized as /m, n/. I'm staying faithful to this description since it's the source we are using. We may want to keep it without the nasalization diacritic; it does not seem to be that important in Carvalho (2013).

  • Probably add long vowels to Shuar

On it.

  • Nasalization in Ashuar: for now there is no reason to separate the nasalization from the vowels; better to change ũ > u ~ (current) to ũ > !ũ/u
  • Separated nasalization is a problem in many varieties; this should be checked

Already changed for Ashuar. I'm checking the other varieties next.
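For context on the "u ~" versus "ũ" question: in Unicode a nasalized vowel can be stored precomposed (ũ, U+0169) or as a base letter followed by U+0303 COMBINING TILDE, and a plain character-by-character split emits that tilde as a token of its own. The following is a minimal Python sketch, a hypothetical helper rather than anything from the lexibank or segments tooling, that keeps every combining mark with the preceding base character:

```python
import unicodedata


def segment_with_diacritics(form: str) -> list[str]:
    """Split a form into segments, attaching each combining mark
    (e.g. U+0303 COMBINING TILDE) to the preceding base character."""
    segments: list[str] = []
    # NFD turns precomposed 'ũ' (U+0169) into 'u' + U+0303, so both
    # spellings of the same vowel are processed identically.
    for char in unicodedata.normalize("NFD", form):
        if unicodedata.combining(char) and segments:
            segments[-1] += char  # keep the diacritic with its base
        else:
            segments.append(char)
    # Recompose where Unicode has a precomposed form (u + U+0303 -> ũ).
    return [unicodedata.normalize("NFC", seg) for seg in segments]


print(segment_with_diacritics("ku\u0303ka"))  # ['k', 'ũ', 'k', 'a']
```

A consonant such as j̃ has no precomposed form, so it stays as j + U+0303, but it is still returned as a single segment.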

  • !ái/ɨi: this case also seems strange to me. We have probably discussed it already, but if that is the correspondence, the slash annotation isn't needed

The slash annotation in this case and in another variety of the same family (AchuarShiwiar.tsv) is due to the accent. Should I delete it anyway?

  • Arabela /ue/ should also be checked

Piecing together the info from Carvalho (2013), Wise (1996) and Rich (1990), /ue/ does not seem to be a diphthong so the notation we have so far seems accurate.

  • long vowels in Iquito

The ortho-profile has them, since they are phonemic in all of the Zaparoan languages (Carvalho 2013).

  • Candoshi /mp/: coarticulation, one segment?

On it.

  • Shuar diphthongs are missing

On it.

FredericBlum commented 3 months ago

Copy&pasting from Mattermost:

  • /j̃/ in Arabela: we shouldn't write it as /j ~/, but rather check what this thing actually is

According to Rich (1999) (the source for the data in Lexibank), the grapheme j represents the /h/ phoneme, which is as nasalized as /m, n/. I'm staying faithful to this description since it's the source we are using. We may want to keep it without the nasalization diacritic; it does not seem to be that important in Carvalho (2013).

All the more reason not to separate the tilde from /j/, since it is a single sound! I'd vote for !j̃/j: that way we keep the information but remove the nasalization from the consonant.

  • Probably add long vowels to Shuar

On it.

  • Nasalization in Ashuar: for now there is no reason to separate the nasalization from the vowels; better to change ũ > u ~ (current) to ũ > !ũ/u
  • Separated nasalization is a problem in many varieties; this should be checked

Already changed for Ashuar. I'm checking the other varieties next.

  • !ái/ɨi: this case also seems strange to me. We have probably discussed it already, but if that is the correspondence, the slash annotation isn't needed

The slash annotation in this case and in another variety of the same family (AchuarShiwiar.tsv) is due to the accent. Should I delete it anyway?

The problem is the correspondence of /a/ to /ɨ/ in the above example. But maybe that's due to the diphthong? And you are right, the accent justifies the slash.

  • Arabela /ue/ should also be checked

Piecing together the info from Carvalho (2013), Wise (1996) and Rich (1990), /ue/ does not seem to be a diphthong so the notation we have so far seems accurate.

  • long vowels in Iquito

The ortho-profile has them, since they are phonemic in all of the Zaparoan languages (Carvalho 2013).

No, they were missing at least for some vowels and languages.

  • Candoshi /mp/: coarticulation, one segment?

On it.

  • Shuar diphthongs are missing

On it.

FredericBlum commented 3 months ago

Some Iquito examples:

  • i !í/i w aː s i
  • j aː w !ɨ́/ɨ ɨ n i
  • s u !ú/u kʷ a r a n a

etc. You could load the d_northperulex.tsv file from analysis/ into your local EDICTOR (lingulist.de/edev/) and check the Iquito data for other cases. They all involve accents, which is probably why they were skipped.

MuffinLinwist commented 3 months ago
  • Probably add long vowels to Shuar

There is no indication of their existence in the grammar sketch (Saad 2014) or in the source dictionary we use (Pellizzaro 2005). They do appear, however, in Taisha et al. (2006). I'm adding them to the ortho-profile, since there are too many cases in the dataset for this to be a coincidence.

MuffinLinwist commented 3 months ago
  • Shuar diphthongs are missing

I fixed the problem with accented diphthongs not being caught by the previous version of the ortho-profile. However, neither of the sources consulted for this ortho-profile is explicit about which other diphthongs exist in the language. Saad (2014, 21-22), for example, says the following:

Shuar has a number of diphthongs, including /ai/ and /au/, such as in the example given above...

Only from the transcription of the example do we learn that /ai/ > [ei]. We do not have an exact transcription of /au/ (perhaps [ou], since the reduced diphthong ends up as [o]? no clue), nor of the other diphthongs in the language. Neither Pellizzaro (2005) nor Taisha et al. (2006) mentions it. We might want to keep an eye on this.

MuffinLinwist commented 3 months ago
  • !ái/ɨi: this case also seems strange to me. We have probably discussed it already, but if that is the correspondence, the slash annotation isn't needed

The slash annotation in this case and in another variety of the same family (AchuarShiwiar.tsv) is due to the accent. Should I delete it anyway?

The problem is the correspondence of /a/ to /ɨ/ in the above example. But maybe that's due to the diphthong? And you are right, the accent justifies the slash.

Regarding this: exactly. Fast (2008, 16) specifies that the /ai/ diphthong is realized as /ei/ (i.e., [ɨi]) after consonants that are not in the first syllable of the word. In our dataset, this results in ai > ɨi and $ai > ai, since there is only one word that contains the diphthong in the first syllable (the other case is a suffix).
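The conditioning reported from Fast (2008, 16) can be stated as a small rewrite rule. The sketch below is hypothetical: the vowel set is a placeholder, the toy forms are invented, and "not in the first syllable" is approximated as "some vowel already occurs earlier in the word", which a real syllabification would refine.

```python
# Placeholder vowel inventory for the sketch (assumption, not from the profile).
VOWELS = set("aeiouɨ")


def apply_ai_rule(form: str) -> str:
    """Rewrite /ai/ as [ɨi] after a consonant outside the first syllable."""
    out: list[str] = []
    i = 0
    while i < len(form):
        if form.startswith("ai", i):
            after_consonant = i > 0 and form[i - 1] not in VOWELS
            # Approximation: the diphthong lies outside the first syllable
            # if a vowel already occurs earlier in the word.
            not_first_syllable = any(ch in VOWELS for ch in form[:i])
            out.append("ɨi" if after_consonant and not_first_syllable else "ai")
            i += 2
        else:
            out.append(form[i])
            i += 1
    return "".join(out)


print(apply_ai_rule("nukai"))  # nukɨi
print(apply_ai_rule("kaina"))  # kaina (diphthong in the first syllable)
```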

MuffinLinwist commented 3 months ago
  • Candoshi /mp/: coarticulation, one segment?

According to Overall (2023, 620, 622), it seems to be an NC cluster rather than a single segment.

MuffinLinwist commented 3 months ago

Some Iquito examples:

  • i !í/i w aː s i
  • j aː w !ɨ́/ɨ ɨ n i
  • s u !ú/u kʷ a r a n a

etc. You could load the d_northperulex.tsv file from analysis/ into your local EDICTOR (lingulist.de/edev/) and check the Iquito data for other cases. They all involve accents, which is probably why they were skipped.

This is fixed.
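Why accented long vowels slipped through can be seen with a greedy longest-match segmenter. This is a simplified stand-in for what an orthography-profile tokenizer does, not the segments package API; the profile entries and the source spelling "iíwaasi" are hypothetical and only reproduce the shape of the first Iquito example quoted above.

```python
import unicodedata

# Hypothetical profile: grapheme in the source -> segment in the dataset.
IQUITO_PROFILE = {
    "i": "i",
    "í": "!í/i",  # accented vowel, kept via slash annotation
    "ii": "iː",   # long vowel written doubled (assumed)
    "aa": "aː",
    "w": "w",
    "s": "s",
}


def tokenize(form: str, profile: dict[str, str]) -> list[str]:
    """Greedy longest-match segmentation against a profile.

    At each position the longest grapheme listed in the profile wins;
    characters not covered by the profile are flagged with a leading '?'.
    """
    # Compare in NFC so precomposed and decomposed accents match alike.
    profile = {unicodedata.normalize("NFC", g): seg for g, seg in profile.items()}
    form = unicodedata.normalize("NFC", form)
    longest = max(len(g) for g in profile)
    segments: list[str] = []
    i = 0
    while i < len(form):
        for size in range(min(longest, len(form) - i), 0, -1):
            chunk = form[i:i + size]
            if chunk in profile:
                segments.append(profile[chunk])
                i += size
                break
        else:  # no profile entry matched at this position
            segments.append("?" + form[i])
            i += 1
    return segments


# Without an entry for the accented sequence "ií", the long vowel is split:
print(tokenize("iíwaasi", IQUITO_PROFILE))  # ['i', '!í/i', 'w', 'aː', 's', 'i']
# Adding one (the target value is only illustrative) restores the long vowel:
print(tokenize("iíwaasi", {**IQUITO_PROFILE, "ií": "iː"}))  # ['iː', 'w', 'aː', 's', 'i']
```

The design point is that longest match only helps if the accented multi-character grapheme is actually listed; otherwise the shorter single-vowel entries win.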

MuffinLinwist commented 3 months ago

I checked the nasalization in all the languages and found only one cognate set in Waorani (COGID 28) that could benefit from the separation, so I'm reuniting all the tildes with their respective vowels in the dataset.
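A cross-language check like this one can be scripted. The sketch assumes the analysis TSV has DOCULECT, COGID and TOKENS columns with space-separated tokens, and that a detached nasalization mark shows up as a standalone "~" or U+0303 token; the actual column names and conventions of d_northperulex.tsv may differ.

```python
import csv
from collections import Counter
from typing import Iterable

# Tokens treated as a detached nasalization mark (assumption).
TILDE_TOKENS = {"~", "\u0303"}


def count_detached_tildes(lines: Iterable[str]) -> Counter:
    """Count standalone tilde tokens per (DOCULECT, COGID) pair."""
    rows = csv.DictReader(
        (line for line in lines if not line.startswith("#")),  # skip comment lines
        delimiter="\t",
        quoting=csv.QUOTE_NONE,
    )
    hits: Counter = Counter()
    for row in rows:
        tokens = (row.get("TOKENS") or "").split()
        n = sum(1 for tok in tokens if tok in TILDE_TOKENS)
        if n:
            hits[(row["DOCULECT"], row["COGID"])] += n
    return hits
```

To run it on the real file, open analysis/d_northperulex.tsv with encoding="utf-8" and newline="", pass the file object to count_detached_tildes, and inspect the result's most_common() entries.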

MuffinLinwist commented 3 months ago

@FredericBlum, with the latest push to PR #14, all of the problems here have been addressed. Below I quote your comments again, each followed by a brief note on its resolution:

  • /j̃/ in Arabela: we shouldn't write it as /j ~/, but rather check what this thing actually is

Adopted your suggested change: /j̃/ is now represented as [!h̃/h].

  • Probably add long vowels to Shuar

Adopted. Shuar now has long vowels in the dataset.

  • Nasalization in Ashuar: for now there is no reason to separate the nasalization from the vowels; better to change ũ > u ~ (current) to ũ > !ũ/u
  • Separated nasalization is a problem in many varieties; this should be checked

Adopted. All tildes in the dataset are now joined to their respective vowels.

  • !ái/ɨi: this case also seems strange to me. We have probably discussed it already, but if that is the correspondence, the slash annotation isn't needed

This is, in fact, due to the diphthong.

  • Arabela /ue/ should also be checked

Didn't find enough evidence for the diphthong in any source. Not adopted.

  • long vowels in Iquito

Fixed. All long vowels that were previously not recognized because of accent marks are now correctly represented as long vowels in the dataset.

  • Candoshi /mp/: coarticulation, one segment?

Didn't find enough evidence for this. Not adopted.

  • Shuar diphthongs are missing

I fixed the problem with accented diphthongs not being caught by the previous version of the ortho-profile. The sources, however, are not really clear about other diphthongs that may exist in the language.

If everything looks fine to you, we can close this issue after merging the PR.