tylergneill / skrutable

Toolkit for manipulating Sanskrit text with Python

Cumulative Data #20

Open gasyoun opened 1 year ago

gasyoun commented 1 year ago

As it is, the cumulative data would be hard to be reused in scientific work for quoting. The categories are intermixed, and the format itself would require rewriting. Let us think about it together. As of now we have (based on part of the data in Ram_tallies.tsv):

anuṣṭubh (1,2: pathyā, 3,4: pathyā)    13993
anuṣṭubh (1,2: na-vipulā, 3,4: pathyā)    752
anuṣṭubh (1,2: ma-vipulā, 3,4: pathyā)    588
anuṣṭubh (1,2: pathyā, 3,4: na-vipulā)    586
anuṣṭubh (ardham eva: pathyā)    583
anuṣṭubh (1,2: bha-vipulā, 3,4: pathyā)    503
anuṣṭubh (1,2: pathyā, 3,4: ma-vipulā)    485
upajāti : upendravajrā [11: jtjgg], indravajrā [11: ttjgg]    344
anuṣṭubh (1,2: pathyā, 3,4: bha-vipulā)    334
vaṃśastha [12: jtjr]    268
anuṣṭubh (1,2: pathyā, 3,4: asamīcīna)    126
anuṣṭubh (1,2: asamīcīna, 3,4: pathyā)    109
upendravajrā [11: jtjgg]    80
anuṣṭubh (1,2: na-vipulā, 3,4: na-vipulā)    53
vaṃśastha [12: jtjr] (? 3 eva pādāḥ yuktāḥ)    38
anuṣṭubh (1,2: ma-vipulā, 3,4: ma-vipulā)    37
anuṣṭubh (1,2: ra-vipulā, 3,4: pathyā)    37

I would love to have a way to look at the data:

  1. inside only anuṣṭubh (without upajāti or vaṃśastha intermixed)
  2. pathyā vs. vipulā (total stats, not split to varieties)
  3. in the case of vipulā do we really have to know 3,4 or we can zoom into 3 or 4 at once? Or we can have them separated?
  4. an easy way to have all the asamīcīna cases and reason why they are treated as such?

Mistakes to be fixed at GRETIL could be located here and we should submit them.

tylergneill commented 1 year ago

the cumulative data would be hard to be reused in scientific work for quoting.

I don't intend for people to quote this data.

  1. inside only anuṣṭubh (without upajāti or vaṃśastha intermixed)
  2. pathyā vs. vipulā (total stats, not split to varieties)

Similar to issues #4 and #5. I think you mean you want a graphical interface to pick and choose these? I think it's not so hard to sort a plain-text result file, so I'm not really interested in this yet.
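For anyone who wants these two views without a graphical interface, here is a minimal sketch of filtering and aggregating the tallies in plain Python. It is not part of skrutable. The tab-separated "label, count" layout is an assumption about Ram_tallies.tsv, and `parse_tallies`, `classify`, and `summarize` are hypothetical helper names made up for this illustration. The sample rows are copied from the tallies quoted above.

```python
from collections import Counter

# Category names used for the coarse summary (chosen for this sketch).
PATHYA = "pathyā"
VIPULA = "vipulā"
ASAMICINA = "asamīcīna"

# Sample rows copied from the tallies quoted in this issue. The
# "label<TAB>count" layout is an assumption about Ram_tallies.tsv.
SAMPLE = "\n".join([
    "anuṣṭubh (1,2: pathyā, 3,4: pathyā)\t13993",
    "anuṣṭubh (1,2: na-vipulā, 3,4: pathyā)\t752",
    "upajāti : upendravajrā [11: jtjgg], indravajrā [11: ttjgg]\t344",
    "vaṃśastha [12: jtjr]\t268",
    "anuṣṭubh (1,2: pathyā, 3,4: asamīcīna)\t126",
    "anuṣṭubh (1,2: ma-vipulā, 3,4: ma-vipulā)\t37",
])


def parse_tallies(text):
    """Turn tab-separated 'label<TAB>count' lines into (label, count) pairs."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the last tab so that labels may contain any other character.
        label, _, count = line.rpartition("\t")
        rows.append((label.strip(), int(count)))
    return rows


def classify(label):
    """Coarse class of an anuṣṭubh label, or None for any other meter."""
    if not label.startswith("anuṣṭubh"):
        return None
    if ASAMICINA in label:
        return ASAMICINA
    if VIPULA in label:
        return VIPULA  # at least one half-verse is some vipulā variety
    return PATHYA


def summarize(rows):
    """Sum the counts per coarse class, skipping non-anuṣṭubh meters."""
    totals = Counter()
    for label, count in rows:
        kind = classify(label)
        if kind is not None:
            totals[kind] += count
    return totals


totals = summarize(parse_tallies(SAMPLE))
for kind, count in totals.most_common():
    print(kind, count)
```

To run this on a real file, read it with `open(path, encoding="utf-8").read()` and pass the text to `parse_tallies`. Note that this sketch counts a verse as vipulā if either half is a vipulā, which is only one possible reading of "pathyā vs. vipulā".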

  3. in the case of vipulā do we really have to know 3,4 or we can zoom into 3 or 4 at once? Or we can have them separated?

Maybe this could be clearer, but "3,4" means pādas 3 and 4 together as a unit. I don't understand your question.
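If the question is about tallying each half-verse separately, the labels can be split on the "1,2" and "3,4" parts after the fact. This is a rough sketch under that reading, not skrutable code. The regular expression is written against the label strings quoted in this issue, and `split_halves` and `tally_by_half` are hypothetical names.

```python
import re
from collections import Counter

# Matches the parenthesised part of labels such as
# "anuṣṭubh (1,2: na-vipulā, 3,4: pathyā)" as quoted in this issue.
HALF_PATTERN = re.compile(r"\(1,2: (?P<first>[^,]+), 3,4: (?P<second>[^)]+)\)")


def split_halves(label):
    """Return (type of pādas 1-2, type of pādas 3-4), or None if absent."""
    match = HALF_PATTERN.search(label)
    if match is None:
        return None  # e.g. "ardham eva" half-verses or non-anuṣṭubh meters
    return match.group("first"), match.group("second")


def tally_by_half(rows):
    """Accumulate counts separately for the first and second half-verse."""
    first, second = Counter(), Counter()
    for label, count in rows:
        halves = split_halves(label)
        if halves is None:
            continue
        first[halves[0]] += count
        second[halves[1]] += count
    return first, second


# A few (label, count) pairs copied from the tallies quoted above.
ROWS = [
    ("anuṣṭubh (1,2: pathyā, 3,4: pathyā)", 13993),
    ("anuṣṭubh (1,2: na-vipulā, 3,4: pathyā)", 752),
    ("anuṣṭubh (1,2: pathyā, 3,4: na-vipulā)", 586),
    ("anuṣṭubh (ardham eva: pathyā)", 583),
    ("anuṣṭubh (1,2: na-vipulā, 3,4: na-vipulā)", 53),
]

first_half, second_half = tally_by_half(ROWS)
print("pādas 1,2:", dict(first_half))
print("pādas 3,4:", dict(second_half))
```

Rows without both halves, such as the "ardham eva" entry, are skipped here. Whether they should instead be counted toward one of the halves is a design choice this sketch leaves open.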

  4. an easy way to have all the asamīcīna cases and reason why they are treated as such?

Related to #6. Yes, this "asamīcīna" needs work. They're all imperfect ślokas, sometimes because Piṅgala's rules are violated, sometimes because vipulā conditions are violated, and these cases should be disambiguated.

Mistakes to be fixed at GRETIL could be located here and we should submit them

Isn't this purpose already served? The system provides a signal about a problem, and an interested party can do a simple sort to find the problem verses, think through them, and then communicate new ideas to the repositories. Do the work and submit them if you wish. If I have the time and inclination, I'll do the same.