hrishikeshrt / PyCDSL

Python Interface to Cologne Digital Sanskrit Lexicon (CDSL)
https://pypi.org/project/PyCDSL/

Use JSON Data from sanskrit-lexicon repository #8

Closed hrishikeshrt closed 2 years ago

hrishikeshrt commented 2 years ago

If you want the data to be kept up to date without much hassle, you can use the data available at https://github.com/sanskrit-lexicon/csl-json/tree/main/ashtadhyayi.com. It is in JSON format, and www.ashtadhyayi.com uses it for its frontend. It would also give the user the additional facility of viewing the scanned page directly from the dictionary entry.

The structure is simple. It gives an ID for every headword, and from that ID you can look up the dictionary entry.
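The headword-to-ID-to-entry lookup described above could be sketched roughly as follows. The key names (`headwords`, `entries`, `text`, `page`) and the sample values are hypothetical placeholders chosen for illustration; they are not the actual csl-json schema, which should be checked in the repository itself.

```python
import json

# Hypothetical layout for illustration only; NOT the real csl-json schema.
# "headwords" maps each headword to a list of entry IDs,
# "entries" maps each ID to the entry text and its scanned page.
raw = """
{
  "headwords": {"agni": ["1001"], "deva": ["2002", "2003"]},
  "entries": {
    "1001": {"text": "fire", "page": "5"},
    "2002": {"text": "heavenly, divine", "page": "492"},
    "2003": {"text": "a deity, god", "page": "492"}
  }
}
"""

data = json.loads(raw)


def lookup(data, headword):
    """Return all entries for a headword, or an empty list if it is unknown."""
    ids = data["headwords"].get(headword, [])
    return [data["entries"][entry_id] for entry_id in ids]


for entry in lookup(data, "deva"):
    print(entry["page"], entry["text"])
```

Keeping the page reference next to the entry text is what would make it possible to jump from an entry to its scanned page.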

Originally posted by @drdhaval2785 in https://github.com/hrishikeshrt/PyCDSL/issues/1#issuecomment-1027926161

hrishikeshrt commented 2 years ago

I am wondering if there's any specific benefit to this over using the SQLite databases available from the website. Isn't the data on the sanskrit-lexicon GitHub page in sync with the data on the Cologne website?

drdhaval2785 commented 2 years ago

In hindsight, it seems that there is not much difference between using JSON and SQLite. If anything, the SQLite files are updated more frequently than the JSON.

Ideally, the csl-orig repository holds the latest, bleeding-edge data. Once it is stable, the SQLite files are generated when that data is integrated into the Cologne web display.

So for stable data usage, using SQLite makes sense. I drop the idea of using JSON.

drdhaval2785 commented 2 years ago

Please look at csl-orig at https://github.com/sanskrit-lexicon/csl-orig/commits/master . It has two commits after 24 Jan 2021 that are yet to make it to the Cologne website.

One more question: it may happen that a specific dictionary (let's say MW) does not change between versions 2.0.725 and 2.0.726 on the Cologne website. Do you download a new MW SQLite file whenever you see the version change? Or do you do some further analysis of whether the MW SQLite file has actually changed before downloading?

hrishikeshrt commented 2 years ago

It does not use the global version to update the data. It uses the "Last modified" text at the bottom of the download page, e.g. https://www.sanskrit-lexicon.uni-koeln.de/scans/WILScan/2020/web/webtc/download.html

I think that data is independent for each dictionary. Further, the update check is triggered by passing the flag update=True to the setup functions. (In the REPL, it can be triggered by simply typing update, which checks the last updated date and decides whether to download the update or not.)
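A minimal sketch of the per-dictionary check described above, not PyCDSL's actual implementation. The function names, the regular expression, and the sample page text are all assumptions; in particular, the exact wording and date format of the "Last modified" line on the download page are not confirmed here, so the sketch compares the raw text instead of parsing a date.

```python
import re
from typing import Optional

# Assumed shape of the footer line, e.g. "Last modified: <some date text>".
LAST_MODIFIED_RE = re.compile(r"Last modified:?\s*([^<\n]+)", re.IGNORECASE)


def extract_last_modified(page_html: str) -> Optional[str]:
    """Return the text following "Last modified", or None if it is absent."""
    match = LAST_MODIFIED_RE.search(page_html)
    return match.group(1).strip() if match else None


def needs_update(page_html: str, stored: Optional[str]) -> bool:
    """Decide whether one dictionary should be downloaded again.

    `stored` is the "Last modified" text saved at the previous download,
    or None if the dictionary has never been downloaded.
    """
    remote = extract_last_modified(page_html)
    if remote is None:
        # Nothing to compare against; keep the local copy.
        return False
    return remote != stored


# Made-up page fragment for illustration.
sample_page = "<p>Last modified: 2021-01-24</p>"
print(needs_update(sample_page, stored=None))
print(needs_update(sample_page, stored="2021-01-24"))
```

Because the check runs against each dictionary's own download page, a bump in the global site version alone would not trigger a new download of an unchanged dictionary.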

I think I will close this as we seem to be in agreement that SQLite data is the way to go.