indic-dict / stardict-sanskrit

Stardict dictionary files for the Sanskrit language.
https://sanskrit-coders.github.io/dictionaries/offline/

ACC added #31

Closed drdhaval2785 closed 7 years ago

drdhaval2785 commented 7 years ago

Aufrecht's Catalogus Catalogorum added in the sa-head folder. Works well in ColorDict on my Android device.

vvasuki commented 7 years ago

beautiful. mind sending an announcement to various mailing lists?

drdhaval2785 commented 7 years ago

I will wait for some time before doing so. The data I am currently using was fetched from the Cologne server some time ago, so it may be slightly outdated, by a year or so.

I have requested a utility to fetch the latest XML files from the Cologne server at https://github.com/sanskrit-lexicon/Cologne/issues/106.

Once that is through, I intend to regenerate all Cologne stardict files, let's say, once a month.

So let us get the latest corrected data first. Then we will blow the trumpet. In the meanwhile I am trying to add as many dictionaries as possible. Obviously, there will be bugs and room for improvement. They can be handled in the meanwhile. When we are ready on both fronts, we will blow the trumpet properly.

gasyoun commented 7 years ago

We can't edit Stardict so we can enter text in different modes as in https://github.com/shreevatsa/sanskrit/blob/master/transliteration/detect.py, right?

vvasuki commented 7 years ago

We can't edit Stardict so we can enter text in different modes as in https://github.com/shreevatsa/sanskrit/blob/master/transliteration/detect.py, right?

I am not sure about that yet - will follow up on https://github.com/sanskrit-coders/stardict-sanskrit/issues/24

vvasuki commented 7 years ago

This too is a good candidate for separation of subentries, as can be seen from the below:

arj-una (from a lost vb. akin to rāj). I. adj., f. nī, White, Chr. 288, 3 = Rigv. i. 49, 3. II. m.

  1. A tree, Terminalia Arjuna, Rām. 3, 19, 13.
  2. The name of the third son of Pāṇḍu, Indr. 1, 10. III. f. nī, The dawn, Rām, 2, 114, 14. -Cf. Lat. argentum; the base of these forms is arj + vant: cf. also see rańj, rajata.

drdhaval2785 commented 7 years ago

The place to do so is before generating XMLs. But the behaviour is too aberrant. Will be a slow process.

vvasuki commented 7 years ago

The place to do so is before generating XMLs.

That sounds right.

But the behaviour is too aberrant. Will be a slow process.

ok.. Just need to look for roman numerals followed by a dot at the first stage.
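
A minimal sketch of that first stage, assuming top-level senses are always introduced by an uppercase Roman numeral followed by a dot and a space ("I. ", "II. ", "III. "). The function name `split_subentries` is hypothetical and is not part of any existing script; real entries will likely need more rules than this.

```python
import re

# Assumption: a top-level sense starts with an uppercase Roman numeral
# (built from I, V, X only), then a dot, then whitespace.
# The lookahead keeps the numeral in the following piece.
ROMAN_MARKER = re.compile(r"\s+(?=[IVX]+\.\s)")


def split_subentries(entry):
    """Split a dictionary entry into its head and its Roman-numbered senses."""
    return [part.strip() for part in ROMAN_MARKER.split(entry) if part.strip()]


entry = (
    "arj-una (from a lost vb. akin to rāj). I. adj., f. nī, White, "
    "Chr. 288, 3 = Rigv. i. 49, 3. II. m. 1. A tree, Terminalia Arjuna, "
    "Rām. 3, 19, 13. 2. The name of the third son of Pāṇḍu, Indr. 1, 10. "
    "III. f. nī, The dawn."
)

parts = split_subentries(entry)
for part in parts:
    print(part)
```

Lowercase numerals such as the "i." in "Rigv. i. 49, 3" are left alone because the pattern is case-sensitive, and "Indr." is not matched because the numeral must be followed directly by a dot. A stray capital initial such as "V." inside an entry would still be split, which is one reason this can only be a first pass.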

drdhaval2785 commented 7 years ago

[Screenshot from 2017-04-16 16-28-19]

The entry needs to be separated.

drdhaval2785 commented 7 years ago

https://github.com/sanskrit-coders/stardict-sanskrit/issues/31#issuecomment-291086153

In response to this, the following script helps fetch XML for all dictionaries from Cologne. https://github.com/sanskrit-lexicon/cologne-stardict/blob/master/updatexml.sh
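
For readers who prefer Python to shell, the download step might be sketched as below. This is not the linked script: the URL pattern and the `input/zips` folder are taken from the curl line quoted in this thread, the function names are hypothetical, and whether that S3 location is still served is an assumption.

```python
import urllib.request
from pathlib import Path

# Assumption: URL pattern copied from the curl command quoted in this thread.
BASE_URL = "http://s3.amazonaws.com/sanskrit-lexicon/blobs"


def xml_zip_url(dict_code):
    """Build the download URL for one dictionary's XML zip."""
    return f"{BASE_URL}/{dict_code}_xml.zip"


def xml_zip_path(dict_code, zip_dir="input/zips"):
    """Build the local destination path, mirroring the curl -o argument."""
    return Path(zip_dir) / f"{dict_code}_xml.zip"


def fetch_xml_zip(dict_code, zip_dir="input/zips"):
    """Download one dictionary's XML zip and return the local path."""
    dest = xml_zip_path(dict_code, zip_dir)
    dest.parent.mkdir(parents=True, exist_ok=True)
    urllib.request.urlretrieve(xml_zip_url(dict_code), dest)
    return dest
```

Calling `fetch_xml_zip("ACC")` would then try to save `input/zips/ACC_xml.zip`; the code "ACC" is used here only as an illustrative value for `$DICT`.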

vvasuki commented 7 years ago

curl -o input/zips/"$DICT"_xml.zip http://s3.amazonaws.com/sanskrit-lexicon/blobs/"$DICT"_xml.zip

Curious! You're using aws? How and why?

drdhaval2785 commented 7 years ago

Not me. Jim runs this AWS S3 storage for Cologne downloads.

gasyoun commented 7 years ago

why?

For backup purposes. So if Jim has to pass all administration tasks to Dhaval, he knows where to find them.

funderburkjim commented 7 years ago

For backup purposes

Right - the Cologne data is Creative Commons licensed, so it is fine for the data to be made independent of Cologne hosting and support. AWS S3 is a good choice because the backup process can be scripted, and both Cologne and AWS have enough bandwidth that it is practical to keep the S3 backups current with Cologne.