indic-dict / stardict-sanskrit

Stardict dictionary files for the Sanskrit language.
https://sanskrit-coders.github.io/dictionaries/offline/

Creating open Sanskrit translation database #7

Closed damooo closed 8 years ago

damooo commented 8 years ago

This is a must for all of us. There are many descriptive dictionaries for human readers, such as Apte, MW, etc., but for machines we have to create an index of pin-point usage translations. Only then can we use it for many amazing things, such as translation. It would also be a huge, organised sample for building tools such as sandhi, samasa and speech synthesis, and it would enable offline translation and reference; the possibilities are endless.

The devs at spokensanskrit are doing a great service for all of us. Together with their community they have created a very informative treasure, and we are all thankful to them for this great work. However, for some reason it is online only, with server-side processing and dynamic web pages, and they are unable to share their work. Still, it is a great service.

So it would be good if we started an open source project for this. For each word there would be a single, point-blank translation only: whoever translates it, it will be that one word only. For example, for 'milati', 'assemble' is the only translation. That way we can refer to and copy from it anywhere without restrictions.

Things to do:

  1. extract such pin-point, usage-only translations from all well-known dictionaries and add them. This needs manual work, as those dictionaries are filled with descriptions meant for us humans.
  2. from wikis
  3. copy from others
  4. manual addition
  5. if possible, create a site and let users submit new words.

Though it is difficult and time-consuming, we have to start at some point to make all these possibilities real.

Jai shriram /\

damooo commented 8 years ago

As a start, I will push a major commit within two days. : ) I have been working hard on it for the last two days.

damooo commented 8 years ago

Namaste

Finally I was able to retrieve the major part of the data from spokensanskrit, and I created a table file with all of that data. It may help all major projects greatly. All thanks, blessings, appreciation and credit go only to them, who created this great website with a great community. I just retrieved the data.

vvasuki commented 8 years ago

Good work on getting spokensanskrit data! I don't yet understand what you set out to do.

Please tell me: what is a "pin point" translation? Is it merely entries like:

हयग्रीव hayagrIva m. of a demon
पलाद palAda m. demon

That doesn't seem especially useful beyond what's already possible with various babylon / stardict dictionaries.

damooo commented 8 years ago

Hm, those namavachakas may not be useful. But when we have words like:

│अकर्तृ│akartR│m.│not active
│अकुल│akula│adj.│low
│धावन│dhAvana│n.│running [ sport ]
│धुर्य│dhurya│adj.│eminently fit for or distinguished by
│विमथति { विमथ् }│vimathati { vimath }│verb│tear or break in pieces

or anything like that, only a single usage translation is given, without any description or other details. So we can use this database of usage words in translation projects or anything else. That is what I thought. With an SQL database, we can look up a word directly and form sentences for translation. Isn't that possible?
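A minimal sketch of the SQL lookup idea, assuming the table file uses the four `│`-separated columns seen in the rows quoted above (Devanagari, transliteration, grammar tag, English gloss). The table name `gloss`, the column names and the helper functions are hypothetical, chosen only for illustration; the actual file layout in the repository may differ.

```python
import sqlite3

# Sample rows copied from the comment above. The four-column layout
# (devanagari | transliteration | grammar | gloss) is an assumption.
ROWS = """\
│अकर्तृ│akartR│m.│not active
│अकुल│akula│adj.│low
│धावन│dhAvana│n.│running [ sport ]
"""


def parse_rows(text):
    """Split each '│'-delimited line into a 4-tuple of stripped fields."""
    entries = []
    for line in text.splitlines():
        fields = [f.strip() for f in line.strip().strip("│").split("│")]
        if len(fields) == 4:  # skip anything that does not fit the layout
            entries.append(tuple(fields))
    return entries


def build_db(entries):
    """Load the entries into an in-memory SQLite table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE gloss (devanagari TEXT, translit TEXT, grammar TEXT, english TEXT)"
    )
    conn.executemany("INSERT INTO gloss VALUES (?, ?, ?, ?)", entries)
    return conn


def lookup(conn, translit):
    """Return every English gloss stored for a transliterated headword."""
    cur = conn.execute("SELECT english FROM gloss WHERE translit = ?", (translit,))
    return [row[0] for row in cur.fetchall()]


conn = build_db(parse_rows(ROWS))
print(lookup(conn, "akula"))  # ['low']
```

The lookup returns a list rather than a single string, since a headword may appear in more than one row.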

vvasuki commented 8 years ago

But, I am even skeptical about the examples you quote above. I'll show:

अकर्तृ│akartR│m.│not active

This is from the monier williams entry:

akartR अकर्तृ/ अ-कर्तृ m. not an agent , N. applied to the पुरुष (in सांख्य phil. )
akartR अकर्तृ/ अ-कर्तृ m. not active (in Gr. )

We observe that the full entry is more useful, whereas the former is more confusing than useful. Similarly:

mw
mfn (w.r. धूर्य) fit to be harnessed, able to draw or bear being at the head of, foremost, best etc eminently fit for or distinguished by (comp.)
m beast of burden, horse, bullock etc minister, charge d'affaires (with मन्त्रिन् ) leader, chief (cf. कुल) etc a kind of medic. plant (= ऋषभ)
n forepart of a pole N. of all स्तोत्रs except the 3 पवमानs

is the full entry. That is more useful than the tiny confusing snippet above.

For example, @shruthivis regularly gets confused by such spokensanskrit (mis)definitions. Do we really want to spread that confusion?

vvasuki commented 8 years ago

As to translation - that's indeed a desirable (and challenging) goal - I think that it is best to rely directly on good, authoritative sources. So, if in the process of translation you want to get all the n=5 definitions of say धुर्य and rank them by importance, we should just format MW and other dictionaries so that these separate entries are clearly differentiated.
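A rough sketch of what "clearly differentiated" entries could look like as data, assuming the senses have already been separated by hand. The `Sense` record, the helper names and the use of source order as a stand-in for importance are all assumptions made for illustration; the five senses are a hand-picked subset of the MW entry for धुर्य quoted earlier in this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sense:
    headword: str
    grammar: str  # grammar tag as printed in the dictionary, e.g. "mfn", "m", "n"
    gloss: str
    rank: int     # position in the source entry; an assumed proxy for importance


# Hand-separated subset of the MW entry for dhurya quoted in this thread.
DHURYA_RAW = [
    ("mfn", "fit to be harnessed, able to draw or bear"),
    ("mfn", "eminently fit for or distinguished by"),
    ("m", "beast of burden, horse, bullock"),
    ("m", "leader, chief"),
    ("n", "forepart of a pole"),
]


def differentiate(headword, raw_senses):
    """Turn (grammar, gloss) pairs into ranked Sense records, 1-based."""
    return [
        Sense(headword, grammar, gloss, rank)
        for rank, (grammar, gloss) in enumerate(raw_senses, start=1)
    ]


def top_senses(senses, n, grammar=None):
    """Return the n best-ranked senses, optionally restricted to one grammar tag."""
    pool = [s for s in senses if grammar is None or s.grammar == grammar]
    return sorted(pool, key=lambda s: s.rank)[:n]


senses = differentiate("धुर्य", DHURYA_RAW)
for s in top_senses(senses, 2, grammar="m"):
    print(s.rank, s.grammar, s.gloss)
```

A translation program could then pick among the senses by grammar tag or rank while the full dictionary text stays untouched for human readers.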

damooo commented 8 years ago

Yup. For those words, this file has more synonym entries; I didn't post them. As for the dictionaries you mention, they are also part of this project, as I mentioned in the description of the issue. I have created folders as well, and work is going on with the major dictionaries. : ) The spoken sanskrit database was pushed only as a start. : )

vvasuki commented 8 years ago

But then, I would ask (in the spirit of avoiding misspent effort) - if you want to fix (say) kRdantarUpamAla to separate out various meanings of the same word, you should focus efforts on improving kalpadruma-sa/kalpadruma-sa.babylon , rather than create a new file here. Why create this separate database?

damooo commented 8 years ago

All dictionary entries have descriptions, examples and references, which are very useful for us. But translating machines don't need descriptions, citations, examples or references; indeed, I thought they might cause clumsiness in translation. That's why I thought of making a copy with those descriptions and examples removed. I think they are all valuable data for humans and shouldn't be removed from the original dictionaries themselves. ; (

vvasuki commented 8 years ago

Oh I now see what you're up to. When scanning a sanskrit text, you want to automatically replace "वृक्षस्य" with "of a tree". Ok - creating a database for that purpose sounds good. Go ahead!
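A minimal sketch of that replacement, assuming the text is whitespace-separated and the database has been loaded into a plain dictionary keyed by the exact inflected form. The `gloss_text` helper and the second word in the sample sentence are illustrative assumptions; real text would also need sandhi splitting, which this does not attempt.

```python
# One entry taken from the example above; a real table would hold many more.
GLOSSES = {"वृक्षस्य": "of a tree"}


def gloss_text(text, glosses):
    """Replace each whitespace-separated token that has a gloss; keep the rest."""
    return " ".join(glosses.get(token, token) for token in text.split())


# "फलम्" is an illustrative word with no gloss in the table, so it is left as-is.
print(gloss_text("वृक्षस्य फलम्", GLOSSES))  # of a tree फलम्
```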

What is the morpheus project you referred to in your email again?

damooo commented 8 years ago

Ah, thanks. That's what I meant by 'pin point'. I don't know how to express that in this stupid English : ( As for the morpheus project, I saw a discussion about it in the sanskrit programmers google group, about a chrome extension.

vvasuki commented 8 years ago

Oh Michael's chrome extension! Ok - but note that for such a project, the full entry meant for humans is far more useful than these "pin point" translations. The pin-point translations are more useful only if you want to write a program to create a fully automated translation.

damooo commented 8 years ago

Yes yes : ), I got it. This is for helping machine translation only. : )

damooo commented 8 years ago

Would it be possible to move this to another mini repo? It is not at all related to stardict, it has a considerable number of sub-projects inside it, and it has a special purpose. I am closing this issue and opening a new one to deal with development issues and progress, as this issue got filled with our talk due to my immature English : )