sanskrit-lexicon / csl-inflect

GNU Lesser General Public License v3.0

csl-inflect status #2 #3

Open funderburkjim opened 4 years ago

funderburkjim commented 4 years ago

Some of the inflection coverage limitations mentioned in #1 have been reduced. These pertain to verb conjugations.

The file calc_distrib.txt has counts of the number of inflected verb forms in various categories. The 'aggregated models' section of the file shows how many forms have been added in this round of enhancements.

Counts of previous forms

Previously, the verbal forms included:

42300 spcltense-a-am  Forms for special tenses (present, imperfect, imperative, optative),
                                      the 'a' conjugation classes of roots (i.e., classes 1, 4, 6, and 10),
                                      and active or middle voices
24840 spcltense-passive Forms for the four special tenses, with passive voice.

67140 total conjugational forms

Counts of additional forms

03377 spcltense-b-am Forms for special tenses (present, imperfect, imperative, optative),
                                      the other conjugation classes of roots (i.e., classes 2, 3, 5, 7, 8, 9),
                                      and active or middle voices
10521 fut future tense, active/middle voices
10611 pft periphrastic future tense, active/middle voices
10521 con conditional tense, active/middle voices

02169 ben benedictive tense, active/middle voices
01713 prf  perfect tense, active/middle voices
00263 ppf periphrastic perfect tense, active/middle voices
01150 aor aorist tense, active/middle voices

40325 total additional conjugational forms

107465 Total of previous and additional conjugational forms.
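
The totals above can be re-derived from the per-category counts. The following is only a quick arithmetic check; the dictionary names are chosen for this sketch and are not taken from calc_distrib.txt or any program in the repository.

```python
# Counts copied from the two lists above (leading zeros dropped).
previous = {"spcltense-a-am": 42300, "spcltense-passive": 24840}
additional = {
    "spcltense-b-am": 3377,
    "fut": 10521,
    "pft": 10611,
    "con": 10521,
    "ben": 2169,
    "prf": 1713,
    "ppf": 263,
    "aor": 1150,
}

previous_total = sum(previous.values())
additional_total = sum(additional.values())
grand_total = previous_total + additional_total

print(previous_total, additional_total, grand_total)
```

The three printed numbers should agree with the three totals listed above.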

The following comments summarize the methodology used for the additional forms.

funderburkjim commented 4 years ago

All of the additional work was done with substantial guidance from the text A Sanskrit Primer by Madhav M. Deshpande, 2003.

future tense

Future tense conjugation tables are computed by joining a base for the 'sya' future to endings which are the same as those for the present tense, active or middle voice. This joining is computed in the conjugate_from_bases.py program.

The base for the 'sya' future is computed by the bases_test2.py program. This program uses a previous algorithm to get a future base and then adds the 'sya' suffix to this base, taking into account whether an 'i' needs to be inserted. The previous algorithm is part of the very complicated test2.py program, which is based on Kale (Kale's Higher Sanskrit Grammar).
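
As a rough illustration of the two steps just described (adding 'sya' with an optional 'i', then joining to present endings), here is a minimal sketch in SLP1. It is not the code of bases_test2.py or conjugate_from_bases.py: the function names, the simplified 's' to 'z' rule, the way the endings are written, and the choice of gam with an inserted 'i' are all assumptions made for this example.

```python
# Sounds (SLP1) after which 's' is written 'z' in this sketch.
# This is a simplified rule and an assumption of the example.
RUKI = set("iIuUfFeEoOkr")

def add_sya(base, insert_i):
    """Attach the 'sya' future suffix, optionally with the linking vowel 'i'."""
    stem = base + ("i" if insert_i else "")
    suffix = "zya" if stem[-1] in RUKI else "sya"
    return stem + suffix

# Present active endings as they attach to a base ending in 'a'.
# Order: 3s 3d 3p, 2s 2d 2p, 1s 1d 1p.
PRESENT_ACTIVE = ["ti", "taH", "nti", "si", "TaH", "Ta", "mi", "vaH", "maH"]

def join_a_base(base, ending):
    """Join an a-final base to an ending, lengthening 'a' before m- or v-."""
    if base.endswith("a") and ending[0] in "mv":
        return base[:-1] + "A" + ending
    return base + ending

def conjugate_future_active(base, insert_i):
    """Return the nine future active forms for a hypothetical future base."""
    fut = add_sya(base, insert_i)
    return [join_a_base(fut, e) for e in PRESENT_ACTIVE]

forms = conjugate_future_active("gam", insert_i=True)
print(forms[0:3])
print(forms[3:6])
print(forms[6:9])
```

Deciding whether a given root takes the 'i' is the hard part, and this sketch leaves that decision to the caller.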

funderburkjim commented 4 years ago

periphrastic future tense

Following Deshpande's suggestion (p. 296), a base for the periphrastic future is formed by

This computation is done by the bases_test2.py program. Then the conjugation is obtained by a simple addition to the base of endings appropriate for the periphrastic future.

For example:

python3 conjugate_onev2.py ,a,pft kzip md

Conjugation of _,a,pft kzip

Case S D P
3p kzeptA kzeptArO kzeptAraH
2p kzeptAsi kzeptAsTaH kzeptAsTa
1p kzeptAsmi kzeptAsvaH kzeptAsmaH
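
The "simple addition" of endings can be sketched as plain string concatenation. In this sketch the endings are read off the kzip table above rather than taken from the repository's data files, and the base kzeptA is likewise inferred from that table; both are assumptions of the example, as is the function name.

```python
# Active endings of the periphrastic future, inferred from the table above.
PFT_ACTIVE_ENDINGS = [
    ["", "rO", "raH"],        # 3rd person: S, D, P
    ["si", "sTaH", "sTa"],    # 2nd person
    ["smi", "svaH", "smaH"],  # 1st person
]

def conjugate_pft_active(base):
    """Concatenate a periphrastic-future base with each ending, no sandhi."""
    return [[base + ending for ending in row] for row in PFT_ACTIVE_ENDINGS]

table = conjugate_pft_active("kzeptA")
for label, row in zip(["3p", "2p", "1p"], table):
    print(label, " ".join(row))
```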
funderburkjim commented 4 years ago

conditional tense

From p.327 of Deshpande:

The conditional mood paradigms look like a combination of the '-sya' future base with the past imperfect augment 'a' and terminations.

The bases_test2.py program computes the future base (as described above) and adds the 'a' affix. The result is taken as the base for the conditional tense.

The conjugate_from_bases program then joins this base to the active or middle voice endings, which are the same as the imperfect active/middle endings.

For example:

python3 conjugate_onev2.py ,a,con gam md

Conjugation of _,a,con gam

Case S D P
3p agamizyat agamizyatAm agamizyan
2p agamizyaH agamizyatam agamizyata
1p agamizyam agamizyAva agamizyAma
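
The gam table above can be reproduced with a short sketch: prefix the augment 'a' to the future base, then join to the imperfect active endings. The function names and the way the endings are written are assumptions of this example, not the layout used by bases_test2.py or conjugate_from_bases, and the plain "a" prefix is only adequate for consonant-initial roots.

```python
# Imperfect active endings as they attach to a base ending in 'a'.
IMPERFECT_ACTIVE = [
    ["t", "tAm", "n"],    # 3rd person: S, D, P
    ["H", "tam", "ta"],   # 2nd person
    ["m", "va", "ma"],    # 1st person
]

def join_a_base(base, ending):
    """Join an a-final base to an ending; 'a' lengthens before 'va' and 'ma'."""
    if base.endswith("a") and ending in ("va", "ma"):
        return base[:-1] + "A" + ending
    return base + ending

def conjugate_conditional_active(future_base):
    """Prefix the augment 'a' to a 'sya' future base and add imperfect endings."""
    base = "a" + future_base  # only valid for consonant-initial roots
    return [[join_a_base(base, e) for e in row] for row in IMPERFECT_ACTIVE]

table = conjugate_conditional_active("gamizya")
for label, row in zip(["3p", "2p", "1p"], table):
    print(label, " ".join(row))
```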
funderburkjim commented 4 years ago

benedictive tense

Benedictive conjugations are given only for those roots and voices given by Deshpande in Lesson 38.

benedictive base

For the benedictive bases, we begin with a digitization of the benedictive 3rd singular from Deshpande's table on pages 330-335; this digitization is in file benedictive_3s.txt. From a 3rd singular form, we derive a base:

benedictive endings

Benedictive endings active voice

Case S D P
3p yAt yAstAm yAsuH
2p yAH yAstam yAsta
1p yAsam yAsva yAsma

Benedictive endings middle voice

Case S D P
3p sIzwa sIyAstAm sIran
2p sIzWAH sIyAsTAm sIDvam
1p sIya sIvahi sImahi

combining benedictive base and endings

The combination of benedictive base and endings involves no sandhi in the active voice, and at most one sandhi ('s' to 'z') in the middle voice.
Examples:

Conjugation of _,a,ben ad (base = ad)

Case S D P
3p adyAt adyAstAm adyAsuH
2p adyAH adyAstam adyAsta
1p adyAsam adyAsva adyAsma

Conjugation of _,m,ben Ikz (base = Ikzi, endings start with z)

Case S D P
3p IkzizIzwa IkzizIyAstAm IkzizIran
2p IkzizIzWAH IkzizIyAsTAm IkzizIDvam
1p IkzizIya IkzizIvahi IkzizImahi

Conjugation of _,m,ben kzip (base = kzip, endings start with s)

Case S D P
3p kzipsIzwa kzipsIyAstAm kzipsIran
2p kzipsIzWAH kzipsIyAsTAm kzipsIDvam
1p kzipsIya kzipsIvahi kzipsImahi
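
The three example tables can be reproduced by a small join function. This is a sketch only: the ending tables are copied from above, but the function name and the set of sounds that trigger the 's' to 'z' change are assumptions of the example (the source shows the change only after the 'i' of Ikzi).

```python
BEN_ACTIVE = [
    ["yAt", "yAstAm", "yAsuH"],
    ["yAH", "yAstam", "yAsta"],
    ["yAsam", "yAsva", "yAsma"],
]
BEN_MIDDLE = [
    ["sIzwa", "sIyAstAm", "sIran"],
    ["sIzWAH", "sIyAsTAm", "sIDvam"],
    ["sIya", "sIvahi", "sImahi"],
]

# Final sounds of the base (SLP1) after which initial 's' becomes 'z'.
# Assumed, simplified trigger set for this sketch.
RUKI = set("iIuUfFeEoOkr")

def join_ben(base, ending):
    """Join a benedictive base and ending, applying at most the s -> z change."""
    if ending.startswith("s") and base[-1] in RUKI:
        return base + "z" + ending[1:]
    return base + ending

def conjugate_ben(base, endings):
    return [[join_ben(base, e) for e in row] for row in endings]

ad_active = conjugate_ben("ad", BEN_ACTIVE)
ikz_middle = conjugate_ben("Ikzi", BEN_MIDDLE)
kzip_middle = conjugate_ben("kzip", BEN_MIDDLE)
print(ad_active[0], ikz_middle[0], kzip_middle[0])
```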
funderburkjim commented 4 years ago

Perfect tense

Although test2.py has logic for computing perfect tense conjugations, that logic is extremely complicated, and difficult to 'tweak'. Thus, rather than using test2.py directly, we devise another simpler, though less algorithmic, method.

perfect_3p.txt

The file perfect_3p.txt is a digitization of the 3rd person perfect forms (singular, dual and plural, for selected active and middle voices) from Deshpande's table on pages 305-310.

This file is used to check the 3rd person values of our derived perfect tense conjugations. Also, we currently only compute perfect conjugations for the roots and voices appearing in Deshpande's table. Note that this provides no independent confirmation of our derivations of 1st person and 2nd person perfect forms.

Strategy for derivation

According to my reading of Kale, pages 306-7, a perfect conjugation table can be derived for a given root and voice (active/middle) from a table of endings and from four pieces of information derived from the root:

funderburkjim commented 4 years ago

perfect tense implementation

initialization of models

We start with the roots and voices from Deshpande's table on pages 305-310, in the file verb_cp_deshpande_305.txt. From this, models/calc_models_prf.txt is constructed (see models/redo.sh). Essentially, this models file contains the roots and voices from Deshpande's table.

initialization of bases

The perfect_bases_test2.py program is used once to initialize the 4-part base for the perfect models. It does this by referencing several parts of the test2.py program. The result is the bases/perfect_bases.txt file. This file was subsequently modified manually, as described below.

perfect tense endings

These are taken from Deshpande p. 303, or Kale p. 306-7.

Perfect Active terminations (the three singular endings take the strong base)

Person S D P
3p a atuH uH
2p iTa aTuH a
1p a va ma

Perfect Middle terminations

Person S D P
3p e Ate ire
2p se ATe Dve
1p e vahe mahe
funderburkjim commented 4 years ago

Perfect tense combination of base and endings

As with other parts of the derivation of perfect tense conjugations, the combination of base with endings is itself intricate. In our programs:

testing the conjugation table

After completing the conjugation table, conjugate_from_bases compares the 3rd person forms to those in the tables/perfect_3p.txt file (digitization of Deshpande's table of perfect forms). Any differences are printed.
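
The comparison step might look roughly like the sketch below. The layout of tables/perfect_3p.txt is not described here, so the sketch works on already-parsed dictionaries keyed by (root, voice); that structure, the function name, and the deliberately wrong form used to trigger a report are all hypothetical.

```python
def third_person_diffs(derived, reference):
    """List (key, number, derived_form, reference_form) for every mismatch.

    Both arguments map (root, voice) to a (singular, dual, plural) tuple
    of 3rd person forms. A key missing from `derived` is reported with
    None as the derived form.
    """
    diffs = []
    for key, expected in reference.items():
        actual = derived.get(key, (None, None, None))
        for number, got, want in zip("SDP", actual, expected):
            if got != want:
                diffs.append((key, number, got, want))
    return diffs

# Reference forms for kf, middle voice (3rd person perfect).
reference = {("kf", "m"): ("cakre", "cakrAte", "cakrire")}

# Hypothetical derived output with one wrong plural, to show a report.
derived = {("kf", "m"): ("cakre", "cakrAte", "cakrere")}

for diff in third_person_diffs(derived, reference):
    print(diff)
```

An empty result corresponds to the "no discrepancies" state; as already noted, such a check says nothing about 1st and 2nd person forms.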

iteration

A process of iteration was used to resolve discrepancies between the 3rd person forms and those of Deshpande. This involved a few changes to bases/perfect_bases.txt as well as refinement of the perfect_join program. Currently, there are no discrepancies between the 3rd person forms and those of Deshpande.

funderburkjim commented 4 years ago

Periphrastic perfect tense

Although it was not mentioned in the above discussion of the perfect tense, not all roots take the reduplicative perfect tense. If a root does not take the reduplicative perfect tense, then it will take the periphrastic perfect tense. A few roots will take both the reduplicative and periphrastic perfect.

Currently, we restrict the periphrastic perfect to roots mentioned in Deshpande's perfect tense tables on pages 305-310.

The bases are taken from the file bases/ppfactn.txt. This file was initialized programmatically:

python3 ppf_bases_test2.py ../models/calc_models_ppf.txt temp_ppfactn.txt

ppfactn.txt was then modified slightly to be in accordance with Deshpande.

Periphrastic perfect conjugation tables can be constructed for a given root and voice (active/middle) by prefixing the base to the reduplicative perfect conjugation table of the root kf in the corresponding voice.

For example, the base for the root Ikz is IkzAm. The middle voice periphrastic perfect conjugation of Ikz joins the base to the middle voice perfect conjugation of kf:

Conjugation of _,m,prf kf

Person S D P
3p cakre cakrAte cakrire
2p cakfze cakrATe cakfQve
1p cakre cakfvahe cakfmahe

The resulting conjugation for Ikz is then:

Conjugation of _,m,ppf Ikz

Person S D P
3p IkzAYcakre IkzAYcakrAte IkzAYcakrire
2p IkzAYcakfze IkzAYcakrATe IkzAYcakfQve
1p IkzAYcakre IkzAYcakfvahe IkzAYcakfmahe

Note the final 'm' of the base IkzAm has a sandhi change to palatal nasal Y (slp1 spelling) before the palatal c of cakre.
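
The prefixing step, including the nasal sandhi just noted, can be sketched as follows. Only the 3rd person row of the kf table is used here. The function name and the nasal lookup table are assumptions of this sketch, not the repository's code.

```python
# Map each stop (SLP1) to the nasal of its own class.
HOMORGANIC_NASAL = {}
for nasal, stops in (("N", "kKgG"), ("Y", "cCjJ"), ("R", "wWqQ"),
                     ("n", "tTdD"), ("m", "pPbB")):
    for stop in stops:
        HOMORGANIC_NASAL[stop] = nasal

def join_ppf(base, auxiliary_form):
    """Prefix a periphrastic-perfect base to a perfect form of the auxiliary.

    A final 'm' of the base becomes the nasal of the following stop's
    class; before any other sound it is left unchanged in this sketch.
    """
    if base.endswith("m"):
        nasal = HOMORGANIC_NASAL.get(auxiliary_form[0], "m")
        return base[:-1] + nasal + auxiliary_form
    return base + auxiliary_form

KF_MIDDLE_3P = ["cakre", "cakrAte", "cakrire"]  # 3rd person S, D, P
forms = [join_ppf("IkzAm", f) for f in KF_MIDDLE_3P]
print(" ".join(forms))
```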

It is also the case that the perfect conjugations of as (to be) or BU (to become) may be used instead of the perfect conjugations of kf.

Currently, we only use the perfect conjugations of kf.

funderburkjim commented 4 years ago

aorist tense

The previous coding of conjugation algorithms (pysanskritv1/test2.py) includes an attempt to transcribe the material in Kale on aorist forms. However, this previous work is inadequate. Rather than attempt to upgrade it, I have chosen simply to manually digitize the forms provided by Deshpande in Lesson 37.

These Deshpande aorist forms are in two files:

funderburkjim commented 4 years ago

spcltense-b-am

These are the special tenses (pre, ipf, ipv, opt) in active and middle voices for roots in conjugational classes 2,3,5,7,8,9.

The derivations of conjugations for these cases are more complex than the corresponding derivations for roots of classes 1,4,6 and 10. Deshpande (p. 203) summarises the differences:

The conjugations 2, 3, 5, 7, 8 and 9 are different from the conjugations 1, 4, 6, and 10, in that the verbal base in the latter conjugations ends in -a, while the verbal base in the first group of conjugations does not end in -a. This fact leads to a greater sandhi impact of the final affixes on vowels and consonants of the verbal base in these conjugations. In order to appreciate this impact, the final affixes may be divided between those with strong bases and weak bases.

The approach taken currently is similar for each of the 6 conjugation classes:

After working through the comparisons with Deshpande, I feel confident in the derivations from prior work. One viable avenue for extending the conjugation tables to other roots not in Deshpande would be to use conjugate_one_v1.sh for other sets of models.

funderburkjim commented 4 years ago

This concludes my initial documentary comments on the extension to the verbal forms provided by the csl-inflect repository.

gasyoun commented 4 years ago

Detailed, as usual. Only now I manage to read some of the older documentation. Without that the code would be dead after a while.