hrishikeshrt / PyCDSL

Python Interface to Cologne Digital Sanskrit Lexicon (CDSL)

https://pypi.org/project/PyCDSL/

Data related question #1

Closed drdhaval2785 closed 2 years ago

drdhaval2785 commented 2 years ago

Python Interface to Cologne Digital Sanskrit Lexicon (CDSL) version: Not tried
Python version: Not tried
Operating System: Not tried

Description

Congratulations on creating a tool for accessing the Cologne Digital Sanskrit Dictionaries. I am currently involved in maintaining the CDSL data on GitHub, and I would be very interested in helping frontend tools that use CDSL data.

I would like to know which data source the package downloads or accesses when someone invokes it. How frequently do you plan to update the data, and how?

What I Did

Just asked a question.

hrishikeshrt commented 2 years ago

Namaste,

Thank you. I am using the SQLite databases available from the web interface, via the download links on the main CDSL page. Data is downloaded when someone accesses a dictionary for the first time or calls the update command. The package uses the "Last modified" information on the download page to decide whether to download the data again.
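For readers following along, the "Last modified" check described above could be sketched roughly as follows. This is not PyCDSL's actual code: the function name `needs_download` is hypothetical, and the sketch assumes the timestamp has already been read from the download page as an HTTP-date style string, which may not match the format the page really uses.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def needs_download(remote_last_modified: str, local_mtime: Optional[float]) -> bool:
    """Return True when the remote copy looks newer than the local one.

    remote_last_modified: timestamp string, assumed here to be in HTTP-date
        format, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
    local_mtime: modification time of the local file as a POSIX timestamp,
        or None when no local copy exists yet.
    """
    if local_mtime is None:
        # First access: nothing has been downloaded yet.
        return True

    remote_time = parsedate_to_datetime(remote_last_modified)
    if remote_time.tzinfo is None:
        # Treat a timestamp without zone information as UTC.
        remote_time = remote_time.replace(tzinfo=timezone.utc)

    local_time = datetime.fromtimestamp(local_mtime, tz=timezone.utc)
    return remote_time > local_time
```

In practice `local_mtime` could come from `os.path.getmtime()` on the downloaded database file, with `None` passed when the file does not exist.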

drdhaval2785 commented 2 years ago

If you want the data to be kept updated without much hassle, you can use the data available at https://github.com/sanskrit-lexicon/csl-json/tree/main/ashtadhyayi.com. It is in JSON format, and www.ashtadhyayi.com uses it for its frontend. It would also give users the additional facility of viewing the scanned page directly from the dictionary entry.

The structure is simple. It gives an ID for every headword, and from that ID you can look up the dictionary entry.
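A minimal sketch of that two-step lookup (headword to ID, then ID to entry) is shown below. The JSON layout, keys, and placeholder values are purely illustrative assumptions; the actual csl-json files may be organised differently, so check the repository before relying on any of these names.

```python
import json
from typing import Dict, Optional

# Hypothetical layout, NOT the real csl-json schema:
# one mapping from headword to ID, another from ID to entry text.
HEADWORDS_JSON = '{"agni": "1001", "deva": "1002"}'
ENTRIES_JSON = (
    '{"1001": "placeholder entry text for agni",'
    ' "1002": "placeholder entry text for deva"}'
)


def lookup(
    headword: str,
    headword_to_id: Dict[str, str],
    id_to_entry: Dict[str, str],
) -> Optional[str]:
    """Resolve a headword to its ID, then the ID to its entry text."""
    entry_id = headword_to_id.get(headword)
    if entry_id is None:
        # Unknown headword: there is no ID to search with.
        return None
    return id_to_entry.get(entry_id)


headwords = json.loads(HEADWORDS_JSON)
entries = json.loads(ENTRIES_JSON)
print(lookup("agni", headwords, entries))
```

Keeping the two mappings separate mirrors the description above: the headword index stays small, and the entry text is fetched only once an ID is known.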

hrishikeshrt commented 2 years ago

Thank you for pointing me in that direction! I will definitely consider using that; it might actually remove several dependencies.