jyotisham / jyotisha

Python tools for the astronomical / astrological vedAnga of Hindus
MIT License

Better festival database structure. #17

Closed vvasuki closed 3 years ago

vvasuki commented 6 years ago

Desiderata

  1. Festival data should be version controlled.
  2. It should be easy to peruse and edit online.
    • So, it should not be in one or two giant files.
  3. Where possible one should be able to look up a (non-relative) festival in a constant-time operation just by using one of the following event specs:
    • the lunar month + day.
    • the solar month + day
    • some other planetary transit or conjunction.
  4. Festival data should not all need to be held in memory (which may be limited) while the program is running, either as a web service or as a script.
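A minimal sketch of how points 3 and 4 could work together, assuming one small JSON file per festival stored under a directory derived from its event spec. The layout, the function names (`festival_dir`, `load_festivals`) and the field names are hypothetical illustrations, not the structure used in the repository.

```python
import json
import tempfile
from pathlib import Path


def festival_dir(root, month_type, month, anga_type, anga):
    """Map an event spec to the single directory that would hold its festivals.

    Hypothetical layout: <root>/<month_type>/<month>/<anga_type>/<anga>/
    The path is computed directly from the spec, so no file is scanned.
    """
    return Path(root) / month_type / f"{month:02d}" / anga_type / f"{anga:02d}"


def load_festivals(root, month_type, month, anga_type, anga):
    """Read only the files for one event spec; nothing else is loaded."""
    directory = festival_dir(root, month_type, month, anga_type, anga)
    if not directory.is_dir():
        return []
    return [json.loads(path.read_text(encoding="utf-8"))
            for path in sorted(directory.glob("*.json"))]


def demo():
    """Build a throwaway database with one festival and query it twice."""
    with tempfile.TemporaryDirectory() as root:
        target = festival_dir(root, "lunar_month", 6, "tithi", 10)
        target.mkdir(parents=True)
        (target / "example_festival.json").write_text(
            json.dumps({"id": "example_festival", "tags": ["example"]}),
            encoding="utf-8")
        found = load_festivals(root, "lunar_month", 6, "tithi", 10)
        missing = load_festivals(root, "solar_month", 1, "day", 1)
        return found, missing
```

With such a layout, a contributor looking for a festival on a given lunar or solar day would open one directory instead of searching a large file.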

Current status

Currently we're using giant files - especially https://github.com/sanskrit-coders/jyotisha/blob/master/jyotisha/panchangam/data/festival_rules.json .

Drawbacks

This violates [3], [4] and [2] to varying degrees.

Proposed improvement

Let's have this festival data structure -

@karthikraman - what do you think?

karthikraman commented 6 years ago

I completely agree the current structure is very ordinary (some previous versions were even more :)). However, I don't immediately see the advantage of the new structure (except the memory part)—especially because we will have some 500+ files in this arrangement. I would like to split the festivals into reasonable sized files, perhaps by tag (e.g. kanchi aradhana days), or month and so on, for convenience.

Let's brainstorm some ideas. Perhaps, the choice should be made after carefully studying the code (which I presume you've already considered) and how a new data structure may positively impact. I've been meaning to profile the code for a while. All these ideas are ideal for a 2.0 version, which should begin with completely decoupling computations, perhaps festivals as well, and panchanga TeXing.

vvasuki commented 6 years ago

I don't immediately see the advantage of the new structure (except the memory part)—especially because we will have some 500+ files in this arrangement.

And that memory disadvantage is a big one. Festival data can grow arbitrarily large, both because of the descriptions (which I envision carrying rich details in various languages right in the database) and because of the sheer number of festivals (I envision including various grAma-devatA-utsava-s and finer shrauta and smArta events like sthAlIpAka, AgrayaNa, etc.).

The possibility of having 500+ files is not a big disadvantage, provided that they are properly organized and seek time is O(1), both for the consuming program and for the contributing human. We should take a step back and look at a more abstract level: that of events, rather than files, which are just one particular representation. We ALREADY have 500+ events (which happen to have been dumped into a single file). Just adding a new festival (corresponding to a particular event) requires the contributor to do a crude mental binary search (Where should I add it? Is it already there? etc.). This is a hassle that the new system will do away with.

I would like to split the festivals into reasonable sized files, perhaps by tag (e.g. kanchi aradhana days), or month and so on, for convenience.

I would advise against splitting by tag - it has the same old sequential search problem. Splitting by month is better, but I would still prefer splitting by day, given the sheer number of events of note. As a parallel: in the jyotiSha package, I have often found myself wanting to split huge files with very many lines of code into smaller files before making some (often minor) change I desire.

Let's brainstorm some ideas. Perhaps, the choice should be made after carefully studying the code (which I presume you've already considered) and how a new data structure may positively impact.

Indeed, I got this idea when I wanted a per-day panchanga API and found festival computation all mixed up with annual panchanga computation. I agree that the code should change along with the data, and that functionality should not be broken. That's why I began adding pytest yesterday, so that we can set up continuous automated testing with Travis.

All these ideas are ideal for a 2.0 version, which should begin with completely decoupling computations, perhaps festivals as well, and panchanga TeXing.

The changes should be done incrementally. The current state of the festival data is a major impediment to further plans I have (e.g. a web service that produces an ICS calendar given the location and inclusion + exclusion tags in the URL), so cleaning it up and improving it comes first. Separating out the TeX stuff is a big one too, but mostly orthogonal.

karthikraman commented 6 years ago

Agree with practically everything. I'm excited about the web service that produces an ICS! As an aside, one functionality I'd like there is tag-based selection of events to dump into the ICS.

For solar_month, we have nakshatra as well as tithi based festivals (and rarely, "day-based"). So for a start, can we split by solar and lunar months alone?

Testing would be great too --- I currently test by diffing the TeX file!!

vvasuki commented 6 years ago

As an aside, one functionality I'd like there is tag-based selection of events to dump into the ICS.

Indeed, I have been thinking along the same lines. Our programs should accept two parameters, tags_to_include and tags_to_exclude.
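A rough sketch of what tag-based selection with tags_to_include and tags_to_exclude might look like. The semantics are assumptions, not something settled in this thread: an empty include list selects everything, a festival matches if it carries any included tag, and exclusion always wins. The function name `select_festivals` and the `tags` field are hypothetical.

```python
def select_festivals(festivals, tags_to_include=None, tags_to_exclude=None):
    """Return the festivals that pass the include/exclude tag filters.

    Assumed semantics (not confirmed by the project):
    - no tags_to_include means "include everything";
    - a festival is included if it shares at least one tag with tags_to_include;
    - a festival sharing any tag with tags_to_exclude is always dropped.
    """
    include = set(tags_to_include or [])
    exclude = set(tags_to_exclude or [])
    selected = []
    for festival in festivals:
        tags = set(festival.get("tags", []))
        if include and not (tags & include):
            continue  # none of the requested tags is present
        if tags & exclude:
            continue  # explicitly excluded
        selected.append(festival)
    return selected
```

For example, a request with tags_to_include set to one tradition's tag and tags_to_exclude set to a regional tag would keep that tradition's festivals except the regional ones.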

For solar_month, we have nakshatra as well as tithi based festivals (and rarely, "day-based"). So for a start, can we split by solar and lunar months alone?

Sure!

Testing would be great too --- I currently test by diffing the TeX file!!

I find it convenient to use the local analog of http://api.vedavaapi.org/jyotisha/jyotisha/docs#!/default/get_daily_calendar_handler for an almost-end-to-end test.

vvasuki commented 6 years ago

@karthikraman - Please spot check https://github.com/sanskrit-coders/jyotisha/tree/master/jyotisha/panchangam/temporal/festival/data and see if all is well.

karthikraman commented 6 years ago

Have to check carefully, but currently, I am unable to even check it out, as I work on Windows. Folder names like tata:naTarAjar An2i tirumaJcan2am are not permissible :( -- perhaps we can have ta as a folder name and migrate the files to ta\ta:naTarAjar An2i tirumaJcan2am?

vvasuki commented 6 years ago

Fixed the filenames - using __ instead of : - please recheck.
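For illustration, the renaming could be expressed as a small helper like the one below. Only the `:` to `__` substitution comes from this thread; the function name `portable_name` and the handling of the other characters Windows forbids in file names are assumptions added for the sketch.

```python
import re

# Characters other than ":" that Windows does not allow in file names.
OTHER_FORBIDDEN = '<>"/\\|?*'


def portable_name(festival_id):
    """Turn a festival id into a name that can be checked out on Windows."""
    # The fix described in the thread: ":" becomes "__".
    name = festival_id.replace(":", "__")
    # Assumption beyond the thread: any other forbidden character becomes "_".
    return re.sub(f"[{re.escape(OTHER_FORBIDDEN)}]", "_", name)
```

Applied to `ta:naTarAjar An2i tirumaJcan2am`, this gives `ta__naTarAjar An2i tirumaJcan2am`; spaces are left alone since they are legal on Windows.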

karthikraman commented 6 years ago

Able to check out now. However, the write_panchangam scripts seem broken. Will look carefully.

karthikraman commented 5 years ago

priority can also be moved inside of timing in the json...

karthikraman commented 5 years ago

aparahna --> aparaahna?

vvasuki commented 5 years ago

priority can also be moved inside of timing in the json...

What's the justification? it would be important to document it with the code which defines the expected json structure.

aparahna --> aparaahna?

Sounds good!

karthikraman commented 5 years ago

priority can also be moved inside of timing in the json...

What's the justification? it would be important to document it with the code which defines the expected json structure.

Sure. Priority is a part of timing -- if a particular tithi occurs on two days, priority of puurvaviddha says pick the first and paraviddha says pick the second...

vvasuki commented 5 years ago

Sure. Priority is a part of timing -- if a particular tithi occurs on two days, priority of puurvaviddha says pick the first and paraviddha says pick the second...

Ah got it - then changing the field to pick_puurvaviddha and the value to a boolean is far clearer and "self-documenting".
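A minimal sketch of the rule as described above, assuming the candidate days on which the tithi occurs have already been computed elsewhere. The function name `pick_festival_day` is hypothetical; `pick_puurvaviddha` is the boolean field suggested here.

```python
from datetime import date


def pick_festival_day(candidate_days, pick_puurvaviddha):
    """Choose the festival day when the tithi occurs on more than one day.

    pick_puurvaviddha=True  -> take the first (earlier) day.
    pick_puurvaviddha=False -> take the second (later) day, i.e. paraviddha.
    """
    if not candidate_days:
        raise ValueError("the tithi does not occur on any candidate day")
    ordered = sorted(candidate_days)
    return ordered[0] if pick_puurvaviddha else ordered[-1]


# Usage: a tithi that touches two consecutive days.
days = [date(2019, 1, 1), date(2019, 1, 2)]
first_day = pick_festival_day(days, pick_puurvaviddha=True)
second_day = pick_festival_day(days, pick_puurvaviddha=False)
```

When the tithi occurs on a single day, both settings return that day, so the field only matters in the two-day case.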

karthikraman commented 5 years ago

Very much; perhaps some comment can carry the term paraviddha as well, like pick puurvaviddha day rather than paraviddha so that people are aware of the common classification...

vvasuki commented 5 years ago

Oh - in that case, the field could be named "pick_paraviddha_vs_puurvaviddha".

I like Don Knuth's "literate programming" approach - code should read like a book. Comments can come in the following forms, in order of decreasing preference:

(Comments outside of code are not that useful.)

karthikraman commented 5 years ago

Aside: any access to a library with this book: http://www.worldcat.org/title/indian-calendric-system/oclc/40418421

vvasuki commented 5 years ago

Aside: any access to a library with this book: http://www.worldcat.org/title/indian-calendric-system/oclc/40418421

Berkeley has it - will let you know if I can get hold of it.

vvasuki commented 5 years ago

We've requested the book and my wife may get it in a week or two (barring pregnancy-related delays). I was curious whether IITM lacks a comparably good interlibrary loan system for getting the book (I expect my wife to be working in a similar place after next year - hence the question).

To add to https://github.com/sanskrit-coders/jyotisha/issues/17#issuecomment-445467813, I was just reading up and resummarizing my approach to good coding and stumbled on this 4-point summary of "Clean Code" - one major point of improvement for you would be to have much smaller function and file sizes.

karthikraman commented 5 years ago

We have an excellent library, but this book weirdly doesn't seem to be available anywhere in India! Maybe I can try placing a request with our librarian and he may have some way of getting it -- didn't try.

Thanks for the clean code tip. Will really work on it. I never envisaged this code becoming this big and useful :) --- have to work on my skills!

karthikraman commented 5 years ago

"Comments in clean code are almost never needed." 👍