kmadathil / sanskrit_parser

Parsers for Sanskrit / संस्कृतम्
MIT License

Paninian Generator #144

Open kmadathil opened 3 years ago

kmadathil commented 3 years ago

FYI - I have begun coding a Paninian generator. The goal is to implement the Ashtadhyayi, plus vartikas as needed. So far, I have committed a basic skeleton that handles some pada-sandhi rules. Over time, I hope to add more rules and extend the process backward, eventually covering the following steps:

  1. Semantic tag input
  2. Prakriti + Pratyaya selection
  3. Prakriti + Pratyaya transformations
  4. Anga Transformation
  5. Samhita - intra pada
  6. Samhita - inter pada

Take a look at the generator branch - the sandhi.yaml file encodes the sutras I have so far, and process_yaml.py turns them into executable code. prakriya.py is the skeleton execution engine.
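As a rough illustration of the "sutras as data, turned into executable code" idea, here is a minimal sketch. The rule record, its keys (`id`, `name`, `classes`), and the `compile_rule` helper are all hypothetical; they are not the actual sandhi.yaml schema or the process_yaml.py API. The sketch covers only the simple vowels a, i, u in SLP1 and ignores everything else.

```python
# Hypothetical rule record; NOT the real sandhi.yaml schema.
SAVARNA_DIRGHA = {
    "id": "6.1.101",
    "name": "akaH savarRe dIrGaH",
    # SLP1 vowels mapped to the long vowel of their class (a, i, u only)
    "classes": {"a": "A", "A": "A", "i": "I", "I": "I", "u": "U", "U": "U"},
}


def compile_rule(rule):
    """Turn a declarative rule record into (condition, transform) callables."""
    classes = rule["classes"]

    def condition(left, right):
        # Triggers when left ends and right starts with vowels of one class
        return (
            bool(left)
            and bool(right)
            and left[-1] in classes
            and classes.get(right[0]) == classes[left[-1]]
        )

    def transform(left, right):
        # Both vowels are replaced by one long vowel, kept on the left part
        return left[:-1] + classes[left[-1]], right[1:]

    return condition, transform


condition, transform = compile_rule(SAVARNA_DIRGHA)
result = transform("rAma", "as") if condition("rAma", "as") else ("rAma", "as")
print(result)  # ('rAmA', 's')
```

Keeping the condition separate from the transform lets an engine first ask which rules are triggered and only then decide which one to apply.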

Run cd sanskrit_parser/generator ; python test.py to try it out.

avinashvarna commented 3 years ago

I think @drdhaval2785 has implemented similar generators. See https://github.com/drdhaval2785/SanskritVerb which I believe now has the older Subanta generation repo merged in. It includes a sandhi generator as well. Should we look at leveraging it before reimplementing?

drdhaval2785 commented 3 years ago

Would be happy to help.

kmadathil commented 3 years ago

Sure, we should. @drdhaval2785 - I had looked at this, and I remember we'd discussed this briefly as well. Is this completely in PHP, or is there a python version available? I remember you mentioning that this is a linear application of sutras based on the SK order - do I recollect it right? What would be the best way to leverage this?

drdhaval2785 commented 3 years ago

This is purely in PHP. No Python version is available, and I do not have the time to convert it to Python. I will go through your code and let you know which bottlenecks I ran into, so that you can make better design decisions. I came to regret some of my choices, but by then it was too late to change them.

kmadathil commented 3 years ago

Thank you very much. It would be great if you could point to the parts of your PHP code that you think are best to reuse (I'm sure there are a lot). We can take up the conversion. The architecture I've tried to pick is classic Paninian, rather than SK based - so not a linear run of sutras.
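To make "not a linear run of sutras" concrete, here is a minimal sketch of a loop that collects every rule triggered by the current state, picks one winner, applies it, and repeats. Everything in it is an assumption for illustration: the `Rule` class, the `derive` function, the two toy rules, and above all the numeric `priority` field, which is only a placeholder for real conflict resolution. It is not how prakriya.py works.

```python
from dataclasses import dataclass
from typing import Callable, List

State = List[str]


@dataclass
class Rule:
    name: str
    priority: int  # placeholder: stands in for real conflict resolution
    condition: Callable[[State], bool]
    transform: Callable[[State], State]


def derive(state: State, rules: List[Rule], max_steps: int = 20):
    """Apply the winning rule among those triggered until none triggers."""
    trace = []
    for _ in range(max_steps):
        triggered = [r for r in rules if r.condition(state)]
        if not triggered:
            break
        winner = max(triggered, key=lambda r: r.priority)
        losers = [r.name for r in triggered if r is not winner]
        state = winner.transform(state)
        trace.append((winner.name, losers, list(state)))
    return state, trace


def a_meets_a(state: State) -> bool:
    # Toy condition: first part ends in short a, second part starts with a
    return state[0].endswith("a") and state[1].startswith("a")


# Two toy rules competing for the same context (hypothetical names)
RULES = [
    Rule("toy-drop-second-a", 1, a_meets_a, lambda s: [s[0], s[1][1:]]),
    Rule("toy-merge-to-long-A", 2, a_meets_a,
         lambda s: [s[0][:-1] + "A", s[1][1:]]),
]

final, trace = derive(["rAma", "as"], RULES)
print(final)
for winner, losers, state in trace:
    print(winner, "won over", losers, "->", state)
```

A linear design would instead walk the rule list once in a fixed order; here the order of `RULES` does not matter, only which rules are triggered at each step and how the winner is chosen.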


kmadathil commented 3 years ago

Current status

Eventually, this will allow us to replace the INRIA/Sanskrit_data databases with our own pada generator. Also, it will allow us to solve the overgeneration problem in the sandhi splitter by validating output splits with this generator.
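The validation idea could look roughly like the sketch below: a candidate split is kept only if every piece is a form the generator accepts. The `filter_splits` function, the `GENERATED_FORMS` set standing in for the generator, and the sample candidates are all hypothetical; the real splitter and generator interfaces are not shown in this thread.

```python
def filter_splits(candidate_splits, is_valid_pada):
    """Keep only the splits whose every piece passes the validator."""
    return [
        split
        for split in candidate_splits
        if all(is_valid_pada(pada) for pada in split)
    ]


# Stand-in for the generator: a fixed set of forms it is assumed to produce.
GENERATED_FORMS = {"rAmaH", "gacCati"}

candidates = [
    ["rAmaH", "gacCati"],   # every piece is a known form
    ["rA", "maHgacCati"],   # overgenerated split with unknown pieces
]

valid = filter_splits(candidates, GENERATED_FORMS.__contains__)
print(valid)  # [['rAmaH', 'gacCati']]
```

In practice the validator would call the generator rather than look up a fixed set, so its cost per candidate matters.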

kmadathil commented 3 years ago

$ time python ../../scripts/sanskrit_generator -t rAma -p jas --verbose
unable to import 'smart_open.gcs', disabling that module
INFO     Inputs [rAma, as]
INFO     rAma ['prAtipadika', 'pum']
INFO     as ['pratyaya', 'svAdi', 'sup', 'jas', 'suw', 'bahuvacana', 'praTamA', 'viBakti']
INFO     End Inputs

Prakriya
Input ['rAma', 'as']
Root
Prakriya Node
0 Prakriya Start ['rAma', 'as'] 0-> ['rAma', 'as']
End
Child
Prakriya Node
1 1.1.43 : suqanapuMsakasya  ['rAma', 'as'] 0-> ['rAma', 'as']
Sutras that were tiggered but did not win
1.4.17 : svAdizvasarvanAmasTAne 
1.4.18 : yaci Bam 
1.4.13 : yasmAt pratyayaviDistadAdi pratyaye'Ngam 
End
Child
Prakriya Node
2 1.4.13 : yasmAt pratyayaviDistadAdi pratyaye'Ngam  ['rAma', 'as'] 0-> ['rAma', 'as']
End
Child
Prakriya Node
3 7.3.109: jasi ca  ['rAma', 'as'] 0-> ['rAma', 'as']
Sutras that were tiggered but did not win
6.1.97 : ato guRe 
6.1.102: praTamayoH pUrvasavarRaH 
6.1.101: akaH savarRe dIrGaH 
End
Child
Prakriya Node
4 6.1.102: praTamayoH pUrvasavarRaH  ['rAma', 'as'] 0-> ['rAma', 'as']
Sutras that were tiggered but did not win
6.1.97 : ato guRe 
6.1.101: akaH savarRe dIrGaH 
End
Child
Prakriya Node
5 6.1.101: akaH savarRe dIrGaH  ['rAma', 'as'] 0-> ['rAmA', 's']
End
Child
Prakriya Node
6 6.1.105.1: dIrGAjjasi ca  ['rAmA', 's'] 0-> ['rAmA', 's']
End
Leaf Node
Final Output [['rAmA', 's']] = ['rAmAs']

Output: ['rAmAs']

real    0m10.504s
user    0m10.268s
sys     0m0.232s

gasyoun commented 3 years ago

replace the INRIA/Sanskrit_data databases with our own pada generator

Have you seen P. Scharf's code? Based on it, a picture like this one can be generated:

[Image: KVfpnPuQMCc]