Paninian Generator - Githubissues

kmadathil commented 3 years ago

FYI - I have begun coding a Paninian generator. The goal is to implement the Ashtadhyayi, plus vartikas as needed. As of now, a basic skeleton that handles some pada-sandhi rules has been committed. Over time, I hope to add more rules and move the process backward, eventually covering the following steps.

Semantic tag input
Prakriti + Pratyaya selection
Prakriti + Pratyaya transformations
Anga Transformation
Samhita - intra pada
Samhita - inter pada
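
As a rough illustration of the six steps above, the stages could be chained as an ordered list of functions. This is only a sketch: the stage names come from the list, but `identity_stage`, `STAGES` and `run_pipeline` are hypothetical and are not taken from the generator branch.

```python
from typing import Callable, List, Tuple

Stage = Callable[[List[str]], List[str]]


def identity_stage(parts: List[str]) -> List[str]:
    """Placeholder for a stage that has not been implemented yet."""
    return list(parts)


# Hypothetical stage table; every stage is a no-op placeholder here.
STAGES: List[Tuple[str, Stage]] = [
    ("semantic tag input", identity_stage),
    ("prakriti + pratyaya selection", identity_stage),
    ("prakriti + pratyaya transformations", identity_stage),
    ("anga transformation", identity_stage),
    ("samhita - intra pada", identity_stage),
    ("samhita - inter pada", identity_stage),
]


def run_pipeline(parts: List[str], stages=STAGES):
    """Run each stage in order and record the state after every stage."""
    trace = []
    for name, stage in stages:
        parts = stage(parts)
        trace.append((name, list(parts)))
    return parts, trace


output, trace = run_pipeline(["rAma", "as"])
for name, state in trace:
    print(f"{name}: {state}")
```

Keeping the stages in a table means later stages (the samhita steps) can be filled in first and earlier ones added afterwards, which matches the plan of moving the process backward.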

Take a look at the generator branch - the sandhi.yaml file encodes the sutras I have so far, and process_yaml.py turns them into executable code. prakriya.py is the skeleton execution engine.
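
To give a feel for what "turning encoded sutras into executable code" can mean, here is a minimal sketch. The record below stands in for one parsed YAML entry; its field names (`left_ends`, `right_starts`, `replace_with`) and the `compile_rule` helper are assumptions for illustration and do not reflect the actual sandhi.yaml schema or process_yaml.py. It also covers only the a + a case of the sutra, in SLP1.

```python
from typing import Callable, Dict, List

# Hypothetical rule record (what one parsed YAML entry might look like).
RULE: Dict[str, str] = {
    "id": "6.1.101",
    "name": "akaH savarRe dIrGaH",
    "left_ends": "a",      # final sound required on the left part
    "right_starts": "a",   # initial sound required on the right part
    "replace_with": "A",   # single long vowel that replaces both sounds
}


def compile_rule(rule: Dict[str, str]) -> Callable[[str, str], List[str]]:
    """Turn a declarative rule record into a function on two parts."""
    left_end = rule["left_ends"]
    right_start = rule["right_starts"]
    replacement = rule["replace_with"]

    def apply(left: str, right: str) -> List[str]:
        # Leave the input untouched when the rule's condition is not met.
        if not (left.endswith(left_end) and right.startswith(right_start)):
            return [left, right]
        # Replace the two adjacent sounds by one; attach it to the left part.
        return [left[: -len(left_end)] + replacement, right[len(right_start):]]

    return apply


savarna_dirgha = compile_rule(RULE)
print(savarna_dirgha("rAma", "as"))  # ['rAmA', 's']
```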

Run cd sanskrit_parser/generator ; python test.py to try it out.

avinashvarna commented 3 years ago

I think @drdhaval2785 has implemented similar generators. See https://github.com/drdhaval2785/SanskritVerb which I believe now has the older Subanta generation repo merged in. It includes a sandhi generator as well. Should we look at leveraging it before reimplementing?

drdhaval2785 commented 3 years ago

Would be happy to help.

kmadathil commented 3 years ago

Sure, we should. @drdhaval2785 - I had looked at this, and I remember we'd discussed it briefly as well. Is it completely in PHP, or is there a Python version available? I remember you mentioning that it is a linear application of sutras based on the SK order - do I recall that correctly? What would be the best way to leverage it?

drdhaval2785 commented 3 years ago

This is purely in PHP; no Python version is available, and I do not have the time to convert it to Python. I will go through your code and let you know what bottlenecks I ran into, so that you can make better design decisions. I regret some of my choices, but by then it was too late to change them.

kmadathil commented 3 years ago

Thank you very much. It would be great if you could point to the parts of your PHP code that you think are best to reuse (I'm sure there are a lot). We can take up the conversion. The architecture I've tried to pick is classic Paninian rather than SK-based, so it is not a linear run of sutras.

(Sent by email in reply to Dr. Dhaval Patel's comment of Mon, Oct 5, 2020, 6:01 PM.)

kmadathil commented 3 years ago

Current status

YAML format for sutras defined and parser implemented. This allows sutras to be coded easily. This is much better than coding directly in Python, but I'm not 100% happy with the format yet.
Implemented ~300 sutras.
Paninian Prakriya Engine implemented (with some current limitations, such as nitya/anitya tests)
Can generate prakriya for ajanta pum/strI/napum prAtipadikas.
Basic test suite added, with manual and pytest versions
- pytest suite takes too much memory while the manual version (same underlying code) takes very little.

Eventually, this will allow us to replace the INRIA/Sanskrit_data databases with our own pada generator. Also, it will allow us to solve the overgeneration problem in the sandhi splitter by validating output splits with this generator.
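
The idea of validating splitter output against the generator can be sketched as a simple filter. Everything here is hypothetical: `can_generate` stands in for a call into the pada generator, and the set of known forms is a toy placeholder, not data from the project.

```python
from typing import Callable, Iterable, List

# Toy stand-in for "forms the generator is able to derive" (SLP1).
KNOWN_FORMS = {"rAmaH", "gacCati"}


def can_generate(pada: str) -> bool:
    """Hypothetical check; a real one would ask the generator for a prakriya."""
    return pada in KNOWN_FORMS


def filter_splits(
    candidate_splits: Iterable[List[str]],
    is_valid: Callable[[str], bool] = can_generate,
) -> List[List[str]]:
    """Keep only the splits in which every pada can be generated."""
    return [split for split in candidate_splits if all(is_valid(p) for p in split)]


candidates = [
    ["rAmaH", "gacCati"],       # every pada is derivable
    ["rAmaH", "gac", "Cati"],   # overgenerated split with non-words
]
valid = filter_splits(candidates)
print(valid)
```

A split is rejected as soon as one of its padas cannot be derived, so overgenerated candidates drop out without the splitter itself needing to change.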

kmadathil commented 3 years ago

$ time python ../../scripts/sanskrit_generator -t rAma -p jas --verbose
unable to import 'smart_open.gcs', disabling that module
INFO     Inputs [rAma, as]
INFO     rAma ['prAtipadika', 'pum']
INFO     as ['pratyaya', 'svAdi', 'sup', 'jas', 'suw', 'bahuvacana', 'praTamA', 'viBakti']
INFO     End Inputs

Prakriya
Input ['rAma', 'as']
Root
Prakriya Node
0 Prakriya Start ['rAma', 'as'] 0-> ['rAma', 'as']
End
Child
Prakriya Node
1 1.1.43 : suqanapuMsakasya  ['rAma', 'as'] 0-> ['rAma', 'as']
Sutras that were tiggered but did not win
1.4.17 : svAdizvasarvanAmasTAne 
1.4.18 : yaci Bam 
1.4.13 : yasmAt pratyayaviDistadAdi pratyaye'Ngam 
End
Child
Prakriya Node
2 1.4.13 : yasmAt pratyayaviDistadAdi pratyaye'Ngam  ['rAma', 'as'] 0-> ['rAma', 'as']
End
Child
Prakriya Node
3 7.3.109: jasi ca  ['rAma', 'as'] 0-> ['rAma', 'as']
Sutras that were tiggered but did not win
6.1.97 : ato guRe 
6.1.102: praTamayoH pUrvasavarRaH 
6.1.101: akaH savarRe dIrGaH 
End
Child
Prakriya Node
4 6.1.102: praTamayoH pUrvasavarRaH  ['rAma', 'as'] 0-> ['rAma', 'as']
Sutras that were tiggered but did not win
6.1.97 : ato guRe 
6.1.101: akaH savarRe dIrGaH 
End
Child
Prakriya Node
5 6.1.101: akaH savarRe dIrGaH  ['rAma', 'as'] 0-> ['rAmA', 's']
End
Child
Prakriya Node
6 6.1.105.1: dIrGAjjasi ca  ['rAmA', 's'] 0-> ['rAmA', 's']
End
Leaf Node
Final Output [['rAmA', 's']] = ['rAmAs']

Output: ['rAmAs']

real    0m10.504s
user    0m10.268s
sys     0m0.232s
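
Each node in the trace above lists the sutra that applied and the sutras that were triggered but did not win. A minimal sketch of a loop with that shape follows. It is not the logic of prakriya.py: the `Sutra` class, the two toy rules, the "each sutra applies at most once" termination rule, and the `pick_first` conflict resolver are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

State = List[str]


@dataclass
class Sutra:
    number: str
    triggers: Callable[[State], bool]
    apply: Callable[[State], State]


def derive(state: State, sutras: List[Sutra],
           pick_winner: Callable[[List[Sutra]], Sutra]):
    """Repeatedly collect triggered sutras, apply the winner, log the rest."""
    applied = set()  # assumption: a sutra applies at most once
    trace: List[Tuple[str, List[str], State]] = []
    while True:
        triggered = [s for s in sutras
                     if s.number not in applied and s.triggers(state)]
        if not triggered:
            return state, trace
        winner = pick_winner(triggered)
        losers = [s.number for s in triggered if s is not winner]
        state = winner.apply(state)
        applied.add(winner.number)
        trace.append((winner.number, losers, list(state)))


# Toy rules: a no-op naming rule and the a + a case of 6.1.101.
TOY_SUTRAS = [
    Sutra("1.4.13", lambda s: len(s) == 2, lambda s: list(s)),
    Sutra("6.1.101",
          lambda s: len(s) == 2 and s[0].endswith("a") and s[1].startswith("a"),
          lambda s: [s[0][:-1] + "A", s[1][1:]]),
]


def pick_first(triggered: List[Sutra]) -> Sutra:
    """Placeholder conflict resolution: take the first triggered sutra."""
    return triggered[0]


final, steps = derive(["rAma", "as"], TOY_SUTRAS, pick_first)
for number, losers, state in steps:
    print(number, "won over", losers, "->", state)
```

Passing the conflict resolver in as a function keeps the loop independent of how a winner is chosen, which is the part that differs between a linear run of sutras and a Paninian ordering.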

gasyoun commented 3 years ago

replace the INRIA/Sanskrit_data databases with our own pada generator

Have you seen P. Scharf's code? Based on it, a picture like the one embedded here can be generated (media identifier: KVfpnPuQMCc).

kmadathil / sanskrit_parser

Paninian Generator #144