csl-inflect status #2

funderburkjim commented 4 years ago

Some of the inflection coverage limitations mentioned in #1 have been reduced. These pertain to verb conjugations.

The file calc_distrib.txt contains counts of the number of inflected verb forms in various categories. From the 'aggregated models' section of this file, we can see how many forms have been added in this round of enhancements.

Counts of previous forms

Previously, the verbal forms included:

42300  spcltense-a-am      Forms for the special tenses (present, imperfect, imperative, optative),
                           the 'a' conjugation classes of roots (i.e., classes 1, 4, 6, and 10),
                           and active or middle voices
24840  spcltense-passive   Forms for the four special tenses, passive voice

67140  total conjugational forms

Counts of additional forms

 3377  spcltense-b-am      Forms for the special tenses (present, imperfect, imperative, optative),
                           the other conjugation classes of roots (i.e., classes 2, 3, 5, 7, 8, 9),
                           and active or middle voices
10521  fut                 future tense, active/middle voices
10611  pft                 periphrastic future tense, active/middle voices
10521  con                 conditional tense, active/middle voices
 2169  ben                 benedictive tense, active/middle voices
 1713  prf                 perfect tense, active/middle voices
  263  ppf                 periphrastic perfect tense, active/middle voices
 1150  aor                 aorist tense, active/middle voices

40325  total conjugational forms

107465 total of previous and additional conjugational forms
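As a quick consistency check, the subtotals and the grand total can be recomputed from the per-model counts listed above. This is only arithmetic on those numbers; the dictionary keys are the model names used in the lists.

```python
# Per-model verb-form counts, copied from the 'aggregated models' lists above.
previous = {"spcltense-a-am": 42300, "spcltense-passive": 24840}
additional = {
    "spcltense-b-am": 3377,
    "fut": 10521,
    "pft": 10611,
    "con": 10521,
    "ben": 2169,
    "prf": 1713,
    "ppf": 263,
    "aor": 1150,
}

previous_total = sum(previous.values())
additional_total = sum(additional.values())
grand_total = previous_total + additional_total
print(previous_total, additional_total, grand_total)  # 67140 40325 107465
```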

The following comments summarize the methodology used for the additional forms.

funderburkjim commented 4 years ago

All of the additional work was done with substantial guidance from the text A Sanskrit Primer by Madhav M. Deshpande, 2003.

future tense

Future tense conjugation tables are computed by joining a base for the 'sya' future to endings that are the same as those of the present tense (active or middle voice). This joining is done in the conjugate_from_bases.py program.

The base for the 'sya' future is computed by the bases_test2.py program. This program uses a previous algorithm to get a future base and then adds the 'sya' suffix to this base, taking into account whether an 'i' needs to be inserted. The previous algorithm is part of the very complicated test2.py program, which is based on Kale (Kale's Higher Sanskrit Grammar).
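The two steps (building the 'sya' base, then joining it to present endings) can be sketched as follows. This is a minimal illustration, not the bases_test2.py or conjugate_from_bases.py code: the names `sya_base` and `conjugate_future` are hypothetical, it assumes SLP1 strings, it assumes the caller already knows whether 'i' is inserted (in the project that comes from test2.py), and the only junction sandhi it models is the change of 's' to 'z' after the "ruki" sounds.

```python
# Sounds (SLP1) after which the 's' of 'sya' becomes retroflex 'z'.
RUKI = set("iIuUfFxXeEoOkr")

# Thematic present endings, each including the stem-final vowel,
# in the order 3s 3d 3p, 2s 2d 2p, 1s 1d 1p.
PRESENT_ENDINGS = {
    "a": ["ati", "ataH", "anti", "asi", "aTaH", "aTa", "Ami", "AvaH", "AmaH"],
    "m": ["ate", "ete", "ante", "ase", "eTe", "aDve", "e", "Avahe", "Amahe"],
}


def sya_base(base: str, insert_i: bool) -> str:
    """Add the future suffix 'sya' to a base, optionally with a connecting 'i'.

    Simplification: other consonant sandhi at the junction is ignored.
    """
    if insert_i:
        base += "i"
    s = "z" if base[-1] in RUKI else "s"
    return base + s + "ya"


def conjugate_future(future_base: str, voice: str) -> list[str]:
    """Join a future base ending in 'a' to the present endings of a voice."""
    stem = future_base[:-1]  # drop final 'a'; each ending supplies its own vowel
    return [stem + ending for ending in PRESENT_ENDINGS[voice]]


print(conjugate_future(sya_base("gam", True), "a"))
```

Storing the endings with their thematic vowel avoids a separate rule for lengthening 'a' before the 1st person 'm' and 'v' endings.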

funderburkjim commented 4 years ago

periphrastic future tense

Following Deshpande's suggestion (p. 296), a base for the periphrastic future is formed by

- computing the infinitive of the root, using part of the test2.py program, and
- then dropping the ending 'um' of that infinitive.

This computation is done by the bases_test2.py program. The conjugation is then obtained by simply appending the periphrastic future endings to the base.

For example:

python3 conjugate_onev2.py ,a,pft kzip md

Conjugation of _,a,pft kzip

Case	S	D	P
3p	kzeptA	kzeptArO	kzeptAraH
2p	kzeptAsi	kzeptAsTaH	kzeptAsTa
1p	kzeptAsmi	kzeptAsvaH	kzeptAsmaH
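The derivation above can be sketched in a few lines. This is a hypothetical illustration, not the bases_test2.py code: it assumes the infinitive (here 'kzeptum') is already known, and it uses the active endings read off the kzip table above, in the order 3s 3d 3p, 2s 2d 2p, 1s 1d 1p.

```python
# Periphrastic future active endings (SLP1), read off the kzip example table.
PFT_ACTIVE_ENDINGS = ["A", "ArO", "AraH", "Asi", "AsTaH", "AsTa", "Asmi", "AsvaH", "AsmaH"]


def pft_base_from_infinitive(infinitive: str) -> str:
    """Drop the infinitive ending 'um' (e.g. 'kzeptum' -> 'kzept')."""
    if not infinitive.endswith("um"):
        raise ValueError(f"expected an infinitive ending in 'um': {infinitive!r}")
    return infinitive[:-2]


def conjugate_pft_active(infinitive: str) -> list[str]:
    """Append each periphrastic future ending to the base; no sandhi is needed."""
    base = pft_base_from_infinitive(infinitive)
    return [base + ending for ending in PFT_ACTIVE_ENDINGS]


print(conjugate_pft_active("kzeptum"))
```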

funderburkjim commented 4 years ago

conditional tense

From p.327 of Deshpande:

The conditional mood paradigms look like a combination of the '-sya' future base with the past imperfect augment 'a' and terminations.

The bases_test2.py program computes the future base (as described above) and prefixes the augment 'a' to it. The result is taken as the base for the conditional tense.

The conjugate_from_bases program then joins this base to the active or middle endings, which are the same as the imperfect active/middle endings.

For example:

python3 conjugate_onev2.py ,a,con gam md

Conjugation of _,a,con gam

Case	S	D	P
3p	agamizyat	agamizyatAm	agamizyan
2p	agamizyaH	agamizyatam	agamizyata
1p	agamizyam	agamizyAva	agamizyAma
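A sketch of the augment-plus-imperfect-endings construction, under stated assumptions: the names are hypothetical, the future base ends in 'a' (e.g. 'gamizya'), and for vowel-initial bases it applies the usual rule that the augment combines with the initial vowel into its vrddhi (e.g. 'Ikzizya' becomes 'Ekzizya'). That vowel rule is standard grammar, not something described for bases_test2.py.

```python
# Result of augment 'a' + initial vowel (vrddhi), SLP1.
AUGMENT_WITH_VOWEL = {
    "a": "A", "A": "A",
    "i": "E", "I": "E", "e": "E", "E": "E",
    "u": "O", "U": "O", "o": "O", "O": "O",
    "f": "Ar", "F": "Ar",
}

# Thematic imperfect endings including the stem vowel, order 3s 3d 3p, 2s 2d 2p, 1s 1d 1p.
IMPERFECT_ENDINGS = {
    "a": ["at", "atAm", "an", "aH", "atam", "ata", "am", "Ava", "Ama"],
    "m": ["ata", "etAm", "anta", "aTAH", "eTAm", "aDvam", "e", "Avahi", "Amahi"],
}


def add_augment(future_base: str) -> str:
    """Prefix the augment 'a', merging it with an initial vowel if present."""
    first = future_base[0]
    if first in AUGMENT_WITH_VOWEL:
        return AUGMENT_WITH_VOWEL[first] + future_base[1:]
    return "a" + future_base


def conjugate_conditional(future_base: str, voice: str) -> list[str]:
    stem = add_augment(future_base)[:-1]  # endings supply the final vowel
    return [stem + ending for ending in IMPERFECT_ENDINGS[voice]]


print(conjugate_conditional("gamizya", "a"))
```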

funderburkjim commented 4 years ago

benedictive tense

Benedictive conjugations are given only for those roots and voices given by Deshpande in Lesson 38.

benedictive base

For the benedictive bases, we begin with a digitization of the benedictive 3rd singular from Deshpande's table on pages 330-335; this digitization is in file benedictive_3s.txt. From a 3rd singular form, we derive a base:

- If the 3s form is for the active voice, then it ends with 'yAt'; we drop that 'yAt' and take the remainder as the base.
- If the 3s form is for the middle voice, then it ends with either 'sIzwa' or 'zIzwa' (recall that we use SLP1 transliteration to spell Sanskrit); we drop those final 5 characters and take the remainder as the base.
  - We also note whether the dropped suffix is 'sIzwa' or 'zIzwa', remembering 's' or 'z'. This is needed when combining the base with the benedictive endings.

For example:

- the benedictive 3s for root 'ad' in the active voice is 'adyAt', and 'ad' is the base.
- the benedictive 3s for root 'Ikz' in the middle voice is 'IkzizIzwa', and 'Ikzi' is the base.
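The base-derivation rule above can be sketched as a small function. The name `benedictive_base` is hypothetical (it is not taken from the csl-inflect programs); it returns the base together with the remembered initial ('s' or 'z') for the middle voice, or None for the active voice.

```python
from typing import Optional, Tuple


def benedictive_base(form_3s: str, voice: str) -> Tuple[str, Optional[str]]:
    """Derive (base, initial) from a benedictive 3rd singular form (SLP1).

    voice is 'a' (active) or 'm' (middle).
    """
    if voice == "a":
        if not form_3s.endswith("yAt"):
            raise ValueError(f"active 3s should end in 'yAt': {form_3s!r}")
        return form_3s[:-3], None
    if voice == "m":
        suffix = form_3s[-5:]
        if suffix not in ("sIzwa", "zIzwa"):
            raise ValueError(f"middle 3s should end in 'sIzwa' or 'zIzwa': {form_3s!r}")
        return form_3s[:-5], suffix[0]  # remember 's' or 'z'
    raise ValueError(f"unknown voice: {voice!r}")


print(benedictive_base("IkzizIzwa", "m"))
```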

benedictive endings

Benedictive endings active voice

Person	S	D	P
3p	yAt	yAstAm	yAsuH
2p	yAH	yAstam	yAsta
1p	yAsam	yAsva	yAsma

Benedictive endings middle voice

Person	S	D	P
3p	sIzwa	sIyAstAm	sIran
2p	sIzWAH	sIyAsTAm	sIDvam
1p	sIya	sIvahi	sImahi

combining benedictive base and endings

The combination of benedictive base and endings involves no sandhi in the active voice, and at most one sandhi ('s' to 'z') in the middle voice.
Examples:

Conjugation of _,a,ben ad (base = ad)

Case	S	D	P
3p	adyAt	adyAstAm	adyAsuH
2p	adyAH	adyAstam	adyAsta
1p	adyAsam	adyAsva	adyAsma

Conjugation of _,m,ben Ikz (base = Ikzi, endings start with z)

Case	S	D	P
3p	IkzizIzwa	IkzizIyAstAm	IkzizIran
2p	IkzizIzWAH	IkzizIyAsTAm	IkzizIDvam
1p	IkzizIya	IkzizIvahi	IkzizImahi

Conjugation of _,m,ben kzip (base = kzip, endings start with s)

Case	S	D	P
3p	kzipsIzwa	kzipsIyAstAm	kzipsIran
2p	kzipsIzWAH	kzipsIyAsTAm	kzipsIDvam
1p	kzipsIya	kzipsIvahi	kzipsImahi
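The combination step shown in these examples can be sketched as follows, assuming the base and the remembered initial come from the derivation described earlier. The function name is hypothetical; the endings are those of the two tables above, in the order 3s 3d 3p, 2s 2d 2p, 1s 1d 1p. The only sandhi is replacing the initial 's' of each middle ending with 'z' when 'z' was remembered.

```python
from typing import List, Optional

BENEDICTIVE_ENDINGS = {
    "a": ["yAt", "yAstAm", "yAsuH", "yAH", "yAstam", "yAsta", "yAsam", "yAsva", "yAsma"],
    "m": ["sIzwa", "sIyAstAm", "sIran", "sIzWAH", "sIyAsTAm", "sIDvam", "sIya", "sIvahi", "sImahi"],
}


def conjugate_benedictive(base: str, voice: str, initial: Optional[str] = None) -> List[str]:
    """Join a benedictive base to the endings of one voice."""
    endings = BENEDICTIVE_ENDINGS[voice]
    if voice == "m" and initial == "z":
        endings = ["z" + ending[1:] for ending in endings]  # the one sandhi: s -> z
    return [base + ending for ending in endings]


print(conjugate_benedictive("Ikzi", "m", "z"))
```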

funderburkjim commented 4 years ago

Perfect tense

Although test2.py has logic for computing perfect tense conjugations, that logic is extremely complicated, and difficult to 'tweak'. Thus, rather than using test2.py directly, we devise another simpler, though less algorithmic, method.

perfect_3p.txt

The file perfect_3p.txt is a digitization of the 3rd person perfect forms (singular, dual and plural, for selected roots in the active and middle voices) from Deshpande's table on pages 305-310.

This file is used to check the 3rd person values of our derived perfect tense conjugations. Also, we currently compute perfect conjugations only for the roots and voices appearing in Deshpande's table. Note that this check provides no independent confirmation of our derived 1st person and 2nd person perfect forms.

Strategy for derivation

According to my reading of Kale, pages 306-7, a perfect conjugation table can be derived for a given root and voice (active/middle) from a table of endings and from four pieces of information derived from the root:

- a reduplicated base to be used before strong endings
  - the singular active voice endings
- a (possibly different) reduplicated base to be used before weak endings
  - dual or plural active voice endings
  - singular, dual or plural middle voice endings (i.e., any middle voice ending)
- a sew-code relevant for all endings EXCEPT the Ta ending of the 2nd person singular active voice. This sew-code has one of three values:
  - sew, which means that 'i' is inserted between the base and ending
  - aniw, which means that 'i' is NOT inserted between the base and ending
  - vew, which means that 'i' is optionally inserted between the base and ending
- a (possibly different) sew-code that applies just to the Ta ending of the 2nd person singular active voice.
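The four pieces of information, and the choice of which base and which sew-code apply to a given cell of the table, might be represented as below. This is a hypothetical sketch of the strategy as described, not the format of bases/perfect_bases.txt; the example values in the usage line are placeholders, not real Sanskrit bases.

```python
from dataclasses import dataclass
from typing import Tuple

SEW_CODES = ("sew", "aniw", "vew")


@dataclass(frozen=True)
class PerfectBase:
    """The 4-part information for one root and voice (hypothetical container)."""
    strong: str  # reduplicated base before strong endings (singular active)
    weak: str    # reduplicated base before weak endings (active dual/plural, all middle)
    sew: str     # sew-code for every ending except the 2nd sg. active 'Ta'
    sew_ta: str  # sew-code for the 2nd sg. active 'Ta' ending only

    def __post_init__(self) -> None:
        for code in (self.sew, self.sew_ta):
            if code not in SEW_CODES:
                raise ValueError(f"unknown sew-code: {code!r}")


def select(pb: PerfectBase, voice: str, person: int, number: str) -> Tuple[str, str]:
    """Return (base, sew-code) for voice 'a'/'m', person 1-3, number 'S'/'D'/'P'."""
    strong = voice == "a" and number == "S"
    base = pb.strong if strong else pb.weak
    is_ta = voice == "a" and person == 2 and number == "S"
    return base, (pb.sew_ta if is_ta else pb.sew)


pb = PerfectBase(strong="STRONG", weak="WEAK", sew="aniw", sew_ta="vew")  # placeholders
print(select(pb, "a", 2, "S"))
```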

funderburkjim commented 4 years ago

perfect tense implementation

initialization of models

We start with the roots and voices from Deshpande's table on pages 305-310, in the file verb_cp_deshpande_305.txt. From this file, models/calc_models_prf.txt is constructed (see models/redo.sh). Essentially, this models file contains the roots and voices from Deshpande's table.

initialization of bases

The perfect_bases_test2.py program is used once to initialize the 4-part base for the perfect models. It does this by referencing several parts of the test2.py program. The result is the bases/perfect_bases.txt file. This file was subsequently modified manually, as described below.

perfect tense endings

These are taken from Deshpande p. 303, or Kale pp. 306-7.

Perfect active terminations (the singular endings are strong; the dual and plural endings are weak)

Person	S	D	P
3p	a	atuH	uH
2p	iTa	aTuH	a
1p	a	va	ma

Perfect middle terminations (all weak)

Person	S	D	P
3p	e	Ate	ire
2p	se	ATe	Dve
1p	e	vahe	mahe

funderburkjim commented 4 years ago

Perfect tense combination of base and endings

As with other parts of the derivation of perfect tense conjugations, the combination of base with endings is itself intricate. In our programs:

- tables/conjugate_from_bases.py reads the record for a given root and voice from the bases/perfect_bases.txt file and prepares to combine the appropriate part of its 4-part base with each ending;
- the perfect_join program then generates the inflected forms by
  - inserting 'i' between base and ending when appropriate, and
  - performing the needed sandhis.
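The 'i'-insertion step can be sketched as below; this is not the perfect_join code, and the sandhi step that follows it is omitted. Two assumptions are labeled here: the 2nd singular active ending is stored as bare 'Ta' (reading the table's 'iTa' as 'Ta' with 'i' already inserted), and 'i' is considered only before consonant-initial endings, as in the standard grammar. A vew code yields both variants.

```python
from typing import List

# Perfect endings (SLP1), order 3s 3d 3p, 2s 2d 2p, 1s 1d 1p.
# Assumption: 2nd sg. active stored as bare 'Ta'.
PERFECT_ENDINGS = {
    "a": ["a", "atuH", "uH", "Ta", "aTuH", "a", "a", "va", "ma"],
    "m": ["e", "Ate", "ire", "se", "ATe", "Dve", "e", "vahe", "mahe"],
}

VOWELS = set("aAiIuUfFxXeEoO")


def insert_i(base: str, ending: str, sew: str) -> List[str]:
    """Return candidate forms for base + ending, before any sandhi is applied."""
    if ending[0] in VOWELS or sew == "aniw":
        return [base + ending]
    if sew == "sew":
        return [base + "i" + ending]
    if sew == "vew":
        return [base + "i" + ending, base + ending]
    raise ValueError(f"unknown sew-code: {sew!r}")


print(insert_i("cakf", "vahe", "aniw"))
```

Sandhi such as 's' becoming 'z' after 'f' (cakfse to cakfze) would then be applied to each candidate.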

testing the conjugation table

After completing the conjugation table, conjugate_from_bases compares the 3rd person forms to those in the tables/perfect_3p.txt file (digitization of Deshpande's table of perfect forms). Any differences are printed.

iteration

A process of iteration was used to resolve discrepancies between the 3rd person forms and those of Deshpande. This involved a few changes to bases/perfect_bases.txt as well as refinement of the perfect_join program. Currently, there are no discrepancies between the 3rd person forms and those of Deshpande.
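The check that drove this iteration, comparing computed 3rd person forms against the digitized reference, can be sketched like this. The in-memory layout (a dict keyed by (root, voice) with [3s, 3d, 3p] values) is a hypothetical assumption, not the format of perfect_3p.txt; the sample reference uses the kf middle forms cakre, cakrAte, cakrire.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, str]  # (root, voice)


def compare_third_person(computed: Dict[Key, List[str]],
                         reference: Dict[Key, List[str]]) -> List[str]:
    """Return a description of every reference entry the computed table disagrees with."""
    diffs = []
    for key, expected in reference.items():
        got = computed.get(key)
        if got is None:
            diffs.append(f"{key}: not computed")
        elif got != expected:
            diffs.append(f"{key}: computed {got}, reference {expected}")
    return diffs


reference = {("kf", "m"): ["cakre", "cakrAte", "cakrire"]}
print(compare_third_person({("kf", "m"): ["cakre", "cakrAte", "cakrire"]}, reference))
```

An empty list corresponds to the current state described above: no discrepancies.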

funderburkjim commented 4 years ago

Periphrastic perfect tense

Although it was not mentioned in the above discussion of the perfect tense, not all roots take the reduplicative perfect tense. If a root does not take the reduplicative perfect tense, then it will take the periphrastic perfect tense. A few roots will take both the reduplicative and periphrastic perfect.

Currently, we restrict the periphrastic perfect to roots mentioned in Deshpande's perfect tense tables on pages 305-310.

The bases are taken from the file bases/ppfactn.txt. This file was initialized programmatically:

python3 ppf_bases_test2.py ../models/calc_models_ppf.txt temp_ppfactn.txt

ppfactn.txt was then modified slightly to agree with Deshpande.

Periphrastic perfect conjugation tables can be constructed for a given root and voice (active/middle) by prefixing the base to the reduplicative perfect conjugation table of the root kf in the corresponding voice.

For example, the base for the root Ikz is IkzAm. The middle voice periphrastic perfect conjugation of Ikz joins the base to the middle voice perfect conjugation of kf:

Conjugation of _,m,prf kf

Person	S	D	P
3p	cakre	cakrAte	cakrire
2p	cakfze	cakrATe	cakfQve
1p	cakre	cakfvahe	cakfmahe

The resulting conjugation for Ikz is then:

Conjugation of _,m,ppf Ikz

Person	S	D	P
3p	IkzAYcakre	IkzAYcakrAte	IkzAYcakrire
2p	IkzAYcakfze	IkzAYcakrATe	IkzAYcakfQve
1p	IkzAYcakre	IkzAYcakfvahe	IkzAYcakfmahe

Note the final 'm' of the base IkzAm has a sandhi change to palatal nasal Y (slp1 spelling) before the palatal c of cakre.

The perfect conjugations of as (to be) or BU (to become) may also be used instead of those of kf.

Currently, we only use the perfect conjugations of kf.
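The join with its single sandhi can be sketched as follows; the function name is hypothetical. It assumes the base ends in 'm' and applies the standard rule that 'm' before a stop becomes the nasal of that stop's class (so 'm' before palatal 'c' becomes 'Y'); before a vowel or any other sound it leaves 'm' unchanged. The kf forms are listed so that the output reproduces the Ikz table above.

```python
from typing import List

# Homorganic nasal for each stop (SLP1): velar N, palatal Y, retroflex R, dental n, labial m.
HOMORGANIC_NASAL = {
    **dict.fromkeys("kKgG", "N"),
    **dict.fromkeys("cCjJ", "Y"),
    **dict.fromkeys("wWqQ", "R"),
    **dict.fromkeys("tTdD", "n"),
    **dict.fromkeys("pPbB", "m"),
}

# Middle perfect of kf, order 3s 3d 3p, 2s 2d 2p, 1s 1d 1p.
KF_PRF_MIDDLE = ["cakre", "cakrAte", "cakrire", "cakfze", "cakrATe",
                 "cakfQve", "cakre", "cakfvahe", "cakfmahe"]


def join_ppf(base: str, auxiliary_form: str) -> str:
    """Prefix a periphrastic perfect base (ending in 'm') to an auxiliary perfect form."""
    if base.endswith("m") and auxiliary_form[0] in HOMORGANIC_NASAL:
        base = base[:-1] + HOMORGANIC_NASAL[auxiliary_form[0]]
    return base + auxiliary_form


def conjugate_ppf(base: str, auxiliary_table: List[str]) -> List[str]:
    return [join_ppf(base, form) for form in auxiliary_table]


print(conjugate_ppf("IkzAm", KF_PRF_MIDDLE))
```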

funderburkjim commented 4 years ago

aorist tense

The previous coding of conjugation algorithms (pysanskritv1/test2.py) includes an attempt to transcribe the material in Kale on aorist forms. However, this previous work is inadequate. Rather than attempt to upgrade it, I have chosen simply to manually digitize the forms provided by Deshpande in Lesson 37.

These Deshpande aorist forms are in two files:

- tables_aorist.txt contains
  - 13 full conjugation tables
  - 230 partial conjugation tables, with the 3rd person forms only. Unknown forms (2nd person and 1st person) appear with value '?'.
- tables_aorist_passive.txt contains 193 partial conjugation tables, with only the 3rd person singular passive form, ending in 'i'. Other forms appear as '?' to represent unknown values.
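Handling the '?' placeholders in such partial tables might look like the sketch below. The in-memory layout (a 9-item list in the order 3s 3d 3p, 2s 2d 2p, 1s 1d 1p) and the function names are hypothetical assumptions, not the actual file format; the sample rows use the gam forms agamat, agamatAm, agaman (aorist active) and agAmi (aorist passive 3s) purely for illustration.

```python
from typing import List

UNKNOWN = "?"  # placeholder for forms not given by Deshpande


def known_forms(table: List[str]) -> List[str]:
    """Return the forms of a table that are actually known."""
    return [form for form in table if form != UNKNOWN]


def is_full(table: List[str]) -> bool:
    """True if all nine cells of the table are known."""
    return len(table) == 9 and UNKNOWN not in table


partial_active = ["agamat", "agamatAm", "agaman"] + [UNKNOWN] * 6
partial_passive = ["agAmi"] + [UNKNOWN] * 8
print(known_forms(partial_active), known_forms(partial_passive))
```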

funderburkjim commented 4 years ago

spcltense-b-am

These are the special tenses (pre, ipf, ipv, opt) in active and middle voices for roots in conjugational classes 2,3,5,7,8,9.

The derivations of conjugations for these cases are more complex than the corresponding derivations for roots of classes 1,4,6 and 10. Deshpande (p. 203) summarises the differences:

The conjugations 2, 3, 5, 7, 8 and 9 are different from the conjugations 1, 4, 6, and 10, in that the verbal base in the latter conjugations ends in -a, while the verbal base in the first group of conjugations does not end in -a. This fact leads to a greater sandhi impact of the final affixes on vowels and consonants of the verbal base in these conjugations. In order to appreciate this impact, the final affixes may be divided between those with strong bases and weak bases.
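For reference, the strong/weak split mentioned in the quotation can be written as a small predicate. This encodes the standard rule found in Sanskrit grammars for the special tenses of these classes (strong: active singular of present and imperfect, all 1st person imperative forms, and the 3rd singular active imperative; everything else, including the whole optative, is weak); it is a sketch, not code from csl-inflect or pysanskritv1.

```python
def is_strong(tense: str, voice: str, person: int, number: str) -> bool:
    """Whether a special-tense ending takes the strong base (classes 2, 3, 5, 7, 8, 9).

    tense: 'pre', 'ipf', 'ipv' or 'opt'; voice: 'a' or 'm';
    person: 1-3; number: 'S', 'D' or 'P'.
    """
    if tense in ("pre", "ipf"):
        return voice == "a" and number == "S"
    if tense == "ipv":
        return person == 1 or (voice == "a" and person == 3 and number == "S")
    if tense == "opt":
        return False
    raise ValueError(f"unknown tense: {tense!r}")


print(is_strong("pre", "a", 3, "S"), is_strong("opt", "a", 3, "S"))
```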

The approach taken currently is similar for each of the 6 conjugation classes:

- Restrict the conjugations to the roots/voices presented by Deshpande.
  - There are model files for each class, e.g. models_1_2.txt contains all the class 2 roots/voices for which Deshpande presents conjugations. This file also contains other class 2 roots/voices from the Monier-Williams dictionary, but commented out (i.e., conjugation tables are not currently prepared for these commented-out roots).
- The bases are generally just the root. These bases are not actually used.
- Generate candidate conjugation tables for the Deshpande models. This is done programmatically with the pysanskritv1/conjugate_one_v1.py program, which uses previous work (i.e., parts of the test2.py program) to generate the conjugations. This may be done by a script; for class 2:

  sh conjugate_one_v1.sh ../models/models_1_2.txt > temp_1_2.txt

  temp_1_2.txt is taken as the initial value of tables_1_2.txt. It includes conjugations for the 4 special tenses (pre, ipf, ipv, opt).
- Compare the computed conjugations in tables_1_2.txt with those published by Deshpande, and resolve differences. There are few differences, and these are usually resolved by choosing Deshpande's version. Comments are also added.
- After editing, the tables_1_c.txt files are used as the final result for these conjugations.

After working through the comparisons with Deshpande, I have confidence in the derivations from prior work. One viable avenue for extending the conjugation tables to roots not in Deshpande would be to run conjugate_one_v1.sh on other sets of models.

funderburkjim commented 4 years ago

This concludes my initial documentary comments on the extension to the verbal forms provided by csl-inflect repository.

gasyoun commented 4 years ago

Detailed, as usual. Only now have I managed to read some of the older documentation. Without it, the code would be dead after a while.

sanskrit-lexicon / csl-inflect

csl-inflect status #2 #3
