indic-dict / stardict-sanskrit

Stardict dictionary files for the Sanskrit language.
https://sanskrit-coders.github.io/dictionaries/offline/

Create mahAbhArata cultural index dictionary #5

Closed: vvasuki closed this issue 8 years ago

vvasuki commented 8 years ago

We must produce a script to transform mci.txt into a Babylon dictionary.
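Below is a minimal sketch of the output side of such a script. It assumes the plain-text Babylon glossary source layout: one line of headwords separated by `|`, then one line of definition with internal line breaks written as `<br>`, then a blank line. The function names `format_babylon_entry` and `write_babylon` are hypothetical. Parsing mci.txt itself is left out because this thread does not describe its tag structure.

```python
from typing import Iterable, List, Tuple

Entry = Tuple[List[str], str]  # (headwords, definition)


def format_babylon_entry(headwords: List[str], definition: str) -> str:
    """Return one entry: '|'-joined headwords, a one-line definition, then a blank line."""
    # Strip whitespace and drop empty or duplicate headwords, keeping their order.
    clean = list(dict.fromkeys(h.strip() for h in headwords if h.strip()))
    if not clean:
        raise ValueError("entry needs at least one headword")
    for h in clean:
        # '|' separates headwords, so it cannot appear inside a headword.
        if "|" in h or "\n" in h:
            raise ValueError(f"invalid headword: {h!r}")
    # Assumption: line breaks inside a definition are written as <br>;
    # blank lines become <br><br>, which keeps paragraph breaks visible.
    lines = definition.strip().splitlines()
    body = "<br>".join(line.strip() for line in lines)
    return "|".join(clean) + "\n" + body + "\n\n"


def write_babylon(entries: Iterable[Entry], path: str) -> None:
    """Write all entries to a UTF-8 text file, one Babylon-style entry after another."""
    with open(path, "w", encoding="utf-8") as out:
        for headwords, definition in entries:
            out.write(format_babylon_entry(headwords, definition))
```

The resulting text file can then be converted to StarDict with whatever Babylon-source converter the repository already uses.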

Regarding scripting: the scripts in https://github.com/sanskrit-coders/stardict-sanskrit/tree/master/sa-head/kalpadruma-sa/mUlam may help you get started quickly!

damooo commented 8 years ago

Are there any PDFs of this book available? The source file does not give a clear picture because some of its tags are unclear. A PDF would help improve the final output. I searched archive.org, Scribd and many other sites but could not find one.

vvasuki commented 8 years ago

See http://www.sanskrit-lexicon.uni-koeln.de/scans/MCIScan/2014/web/index.php

(Emailed reply to श्रीराम's comment of 2016-04-18 8:56 GMT-07:00.)

Vishvas /विश्वासः

damooo commented 8 years ago

Thank you! /\

damooo commented 8 years ago

Sorry for the delay. My laptop was being repaired and I got it back yesterday. The mahabharata_cultural_index is now complete. Update:

  1. Everything from the previous dictionaries is included: numbered lists, varga panchamas, rutvams, extra search words extracted from brackets, all anusvaras, and their permutations, for Devanagari, English, Telugu and Kannada headwords. Descriptions are cleaned and formatted: unwanted newlines are removed while paragraphs and intended newlines are preserved, so entries render well on all screen sizes.
  2. Most search words are not general words; many are samasas and sandhis such as 'arjunasya vane vaasaH' or 'gautamasya vanam', which nobody would think of searching for. So I am trying to derive more general headwords from them. For 'arjunasya vane vaasaH', the headwords will be 'arjuna|arjunasya vane vaasaH|vane vaasaH', so we can search by the character's name, the complete phrase, or the latter words. Since there are many ashramas (agastyasya aashrama, etc.), searching for 'aashrama' lists all the ashramas of the Bharata period, and each can also be found by searching for 'agastya'. This is still experimental and basic because there are so many samasas and sandhis; it will be improved gradually.
  3. Some names end in long vowels that are short in everyday usage: for 'draupadii' and 'saurabhii' we write 'draupadi' and 'saurabhi'. Such names make up about a quarter of all words. So final 'ii', 'uu' and 'aa' are changed to the corresponding short vowels, for convenience and for linking from other entries; the original forms are also retained.
  4. Since the search words are not everyday words, many users like me cannot guess what to search for; at most we might find some four words by searching :). So the dictionary should be browsable and navigable. Page numbers are therefore added as headwords: searching for page number 'p134' shows that page's entries in the same order as in the book, and at the end there are the page numbers of the previous and next pages, which we can double-click. This way it can be used like the book. The position of each word is also added to its description.
  5. For similar reasons, index pages are added too. Searching for a Devanagari first letter or its ITRANS form brings up the index of words starting with that letter; for example, searching for 'ई' or 'ii' goes to its index. The index lists all words and their pages in alphabetical order, without descriptions, and we can double-click a word or page number to go to it. For example, search for 'श' or 'sha'.
  6. Many other improvements.
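The headword derivation in points 2 and 3 can be sketched as follows. This is a hypothetical heuristic, not the script used for this dictionary. It only recognizes a first word ending in the genitive '-asya' of a-stems (so 'arjunasya' gives 'arjuna'), and it only shortens a final ITRANS-style 'aa', 'ii' or 'uu'. The names `phrase_headwords` and `shorten_final_vowel` are illustrative.

```python
from typing import List

# Hypothetical heuristic: only the genitive singular "-asya" of a-stems is recognized.
GENITIVE_ENDING = "asya"
# Final long vowels in ITRANS-style spelling and their short counterparts.
LONG_TO_SHORT = {"aa": "a", "ii": "i", "uu": "u"}


def shorten_final_vowel(word: str) -> str:
    """Replace a final long vowel with its short form, e.g. 'draupadii' -> 'draupadi'."""
    for long_v, short_v in LONG_TO_SHORT.items():
        if word.endswith(long_v):
            return word[: -len(long_v)] + short_v
    return word


def phrase_headwords(phrase: str) -> List[str]:
    """Derive searchable headwords from an index phrase, original forms first."""
    words = phrase.split()
    if not words:
        return []
    variants: List[str] = []
    first = words[0]
    if len(words) > 1 and first.endswith(GENITIVE_ENDING) and len(first) > len(GENITIVE_ENDING):
        # 'arjunasya' -> 'arjuna': the name the phrase is about.
        variants.append(first[: -len(GENITIVE_ENDING)] + "a")
    variants.append(phrase)  # the complete phrase
    if len(words) > 1:
        variants.append(" ".join(words[1:]))  # the latter words, e.g. 'vane vaasaH'
    result: List[str] = []
    for v in variants:
        # Keep each original form and add its short-vowel variant, without duplicates.
        for candidate in (v, shorten_final_vowel(v)):
            if candidate not in result:
                result.append(candidate)
    return result
```

For example, `"|".join(phrase_headwords("arjunasya vane vaasaH"))` gives `arjuna|arjunasya vane vaasaH|vane vaasaH`, and `phrase_headwords("draupadii")` gives `["draupadii", "draupadi"]`. Real samasas and sandhis need many more rules than this, which is why point 2 calls the approach experimental.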
vvasuki commented 8 years ago

Dear dAmodara - wonderful progress!

I don't see any value whatsoever in including digits, as in '04) अङ्गद'. In fact, they are a negative: they will produce bad suggestions if we ever make a dictionary whose headwords genuinely need to start with a number (e.g. sutra 04.2.23). I urge you to remove them.

Also, while this is superb work, it is laborious and hard to reproduce - do you intend to produce a script?

vvasuki commented 8 years ago

Also, can you replace page keywords of the form p169 with 'mahAbhArata cultural index p169'? Users would not want p169 of MCI to be shown alongside p169 of some other book they are not interested in.

damooo commented 8 years ago

Oh. So I will replace p169 with something like 'p169.mci'; that solves the second problem. Regarding the first: I see that the numbers may cause problems in search results when other dictionaries genuinely need them. I included them because when we search for a page number like 'p182', the results are not in the order in which the words appear in the original book. The book follows IAST alphabetical order, while GoldenDict follows the ordering of the language of the first headword. To fix this, I added each word's number before a duplicate of the original word, which made the pages browsable. So the numbers are not intended for searching but for display. To remove the side effect on search results, I will prepend a special Unicode character that nobody would normally type, e.g. '★ 04) अङ्गद'. The word order and display stay the same, with no inconvenience in search results; '★' or whichever character displays nicely can be used.
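The ordering trick can be illustrated with a short sketch, assuming the viewer orders these display headwords by plain string comparison (GoldenDict's actual collation may differ). Zero-padding the position keeps entry 10 after entry 09, and the leading marker keeps these headwords from matching a bare numeric search. `display_headwords` is a hypothetical helper; the marker defaults to the '★' proposed here.

```python
from typing import List

DISPLAY_MARK = "★"  # marker proposed in this comment (later replaced by "➤")


def display_headwords(words_in_book_order: List[str], mark: str = DISPLAY_MARK) -> List[str]:
    """Prefix each word with a marker and its zero-padded position on the page."""
    # At least two digits, as in '04) अङ्गद'; more if a page has 100+ words.
    width = max(2, len(str(len(words_in_book_order))))
    return [
        f"{mark} {pos:0{width}d}) {word}"
        for pos, word in enumerate(words_in_book_order, start=1)
    ]
```

Without the padding, '★ 10) ...' would sort before '★ 2) ...' under plain string comparison, and the book order would be lost again.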

damooo commented 8 years ago

And regarding scripts: as you suggested, I am now learning a higher-level, more human-readable language, Python. Within one month I will try to learn it and produce a more general, interactive script that can be used on any of these text files. Please give me this chance; it will definitely be worth the wait, I promise :)

vvasuki commented 8 years ago

Great news on the script! Eagerly await it.

"So it is not intended for search purpose, rather for display. So to solve present side effect in search results," - thanks for clarifying! Please include that note in the README. Although I think you are solving an unimportant problem with it: do users really care that the sort order for a given page is off? I fear such distractions will keep you from getting to more important problems.

damooo commented 8 years ago

Fixed all the problems :)

Update:

  1. Changed the page search pattern to the form p146_mci. When searching, the result still appears without typing the tag. Corrected the links in descriptions for double-clicking.
  2. Display-only headwords are prefixed like ' ➤ 03) अङ्गिरस् ', so there is no interference if other dictionaries contain numeric headwords.
  3. Description formatting is much improved.
  4. IAST transliteration added to headwords.
  5. Many other fixes and improvements.
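The page navigation described here (page headwords of the form p146_mci, with previous and next pages that can be double-clicked) can be sketched as follows. The `pages` mapping from page number to that page's words in book order is a hypothetical input; extracting it from mci.txt is not shown. `page_key` and `page_entries` are illustrative names, and the entries are plain (headwords, definition) pairs ready for a Babylon writer.

```python
from typing import Dict, List, Tuple

SUFFIX = "_mci"  # page-headword tag adopted in this thread, e.g. p146_mci


def page_key(page: int) -> str:
    """Headword for a page, e.g. 146 -> 'p146_mci'."""
    return f"p{page}{SUFFIX}"


def page_entries(pages: Dict[int, List[str]]) -> List[Tuple[List[str], str]]:
    """Build one entry per page, ending with the previous and next page keys."""
    numbers = sorted(pages)
    entries = []
    for idx, page in enumerate(numbers):
        lines = list(pages[page])  # the page's words, kept in book order
        nav = []
        if idx > 0:
            nav.append("previous: " + page_key(numbers[idx - 1]))
        if idx + 1 < len(numbers):
            nav.append("next: " + page_key(numbers[idx + 1]))
        if nav:
            # Plain page keys in the text, so they can be double-clicked to jump.
            lines.append("  ".join(nav))
        entries.append(([page_key(page)], "\n".join(lines)))
    return entries
```

Because the tag is a suffix, typing just 'p146' should still surface 'p146_mci' in a prefix-matching lookup, which matches the behaviour described in point 1.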
damooo commented 8 years ago

Closing the issue, as the main improvements are done and the dictionary is readable and browsable.