Closed hrishikeshrt closed 2 years ago
I am wondering if there's any specific benefit to this over using the SQLite databases available from the website. Isn't the data on the sanskrit-lexicon GitHub page in sync with the data on the Cologne website?
In hindsight, it seems there is not much difference between using JSON and SQLite. If anything, the SQLite files are updated more frequently than the JSON.
Ideally, the csl-orig repository holds the latest bleeding-edge data. Once that data is stable and integrated into the Cologne web display, the SQLite files are generated from it.
So for stable data, using SQLite makes sense. I am dropping the idea of using JSON.
Please look at csl-orig at https://github.com/sanskrit-lexicon/csl-orig/commits/master . It has two commits after 24 Jan 2021 which have yet to make it to the Cologne website.
One more question: it may happen that a specific dictionary (let's say MW) does not change between versions 2.0.725 and 2.0.726 on the Cologne website. Do you download a new MW SQLite file whenever you see the version change? Or do you first check whether the MW SQLite file has actually changed before downloading?
It does not use the global version to update the data. It uses the "Last modified" text at the bottom of the download page. e.g. https://www.sanskrit-lexicon.uni-koeln.de/scans/WILScan/2020/web/webtc/download.html
I think that data is independent for each dictionary.
Further, the update check is triggered by passing the flag update=True to the setup functions. (In REPL, it can be triggered by simply typing update, which checks the last updated date and decides whether to download the update or not.)
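To illustrate the idea, here is a minimal sketch of a per-dictionary update check based on the "Last modified" text. It is not the actual PyCDSL code: the function names, the regular expression, and the assumed timestamp format (e.g. "Last modified: 2021-01-24 10:15:00") are all hypothetical, and the real download page may format this text differently.

```python
import re
from datetime import datetime
from typing import Optional

# ASSUMPTION: the footer looks like "Last modified: 2021-01-24 10:15:00"
# (or date only). The real page may use a different format.
LAST_MODIFIED_RE = re.compile(
    r"Last modified:?\s*(\d{4}-\d{2}-\d{2}(?: \d{2}:\d{2}:\d{2})?)",
    re.IGNORECASE,
)


def parse_last_modified(page_text: str) -> Optional[datetime]:
    """Extract the 'Last modified' timestamp from the page text, if present."""
    match = LAST_MODIFIED_RE.search(page_text)
    if match is None:
        return None
    stamp = match.group(1)
    fmt = "%Y-%m-%d %H:%M:%S" if " " in stamp else "%Y-%m-%d"
    return datetime.strptime(stamp, fmt)


def needs_update(page_text: str, local_timestamp: Optional[datetime]) -> bool:
    """Decide whether one dictionary should be downloaded again.

    page_text: text of that dictionary's download page (hypothetical input).
    local_timestamp: 'Last modified' value stored at the previous download,
        or None if the dictionary was never downloaded.
    """
    remote = parse_last_modified(page_text)
    if remote is None:
        return False  # cannot tell; keep the local copy
    if local_timestamp is None:
        return True  # nothing downloaded yet
    return remote > local_timestamp
```

Because the timestamp is read from each dictionary's own download page, a global version bump alone would not trigger a download for a dictionary whose page shows no newer date.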
I think I will close this as we seem to be in agreement that SQLite data is the way to go.
If you want the data to stay up to date without much hassle, you can use the data available at https://github.com/sanskrit-lexicon/csl-json/tree/main/ashtadhyayi.com. It is in JSON format, and www.ashtadhyayi.com uses it for its frontend. This would also give the user the additional facility of viewing the scanned page directly from the dictionary entry.
The structure is simple. It gives an ID for every headword, and from that ID you can look up the dictionary entry.
Originally posted by @drdhaval2785 in https://github.com/hrishikeshrt/PyCDSL/issues/1#issuecomment-1027926161