skyfielders / python-skyfield

Elegant astronomy for Python
MIT License

[improvement/suggestion] Change default path for iokit.py::Loader #851

Open danieldjewell opened 1 year ago

danieldjewell commented 1 year ago

Firstly, I'm a huge fan of the project and as a 100% amateur astronomy/astrophysics nerd, many many kudos for making something that is incredibly complex very easy to understand :grin: !

I know that way back in #73 the option was added to change the download directory for iokit::Loader ... and the documentation for iokit::Loader is very clear about having the option to set the download/cache directory.

I'm wondering if it might not be a good idea to change the default behavior to use a standard cache directory instead of the current directory? The issue, of course, is that - by default - Skyfield will download ephemeris files again in every directory it is run from. This isn't terrible for de421.bsp at only 16MiB ... but definitely gets a little more out of control with some of the larger ones (de440 is ~115MiB).

I'm always reluctant to suggest adding dependencies, but the platformdirs package is cross-platform, compatible with multiple versions of Python, and (from my own use) follows the platform-specific "rules" for finding the appropriate paths to store user data. (i.e. It follows the XDG spec on *NIX platforms, and the equivalents on Windows/macOS). It's really quite convenient/nice.

That way, the default could be set to a repeatable/universal path - and would avoid re-downloading if someone has, say, multiple Jupyter notebooks in different directories. (Yes, one could argue that in those cases, the user should specify a directory using the Loader() class ... but, being realistic, how likely is that to happen?)
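
To make the idea of a "repeatable/universal path" concrete, here is a minimal stdlib-only sketch that approximates the kind of per-user cache directory platformdirs would choose. The helper name `default_cache_dir` and the app name `skyfield` are hypothetical, the Windows and macOS branches are simplified, and a real change would more likely call platformdirs directly rather than reimplement its rules.

```python
import os
import sys
from pathlib import Path


def default_cache_dir(appname="skyfield", platform=sys.platform, env=os.environ):
    """Approximate a per-user cache directory (hypothetical helper).

    `platform` is a sys.platform-style string and `env` is a mapping
    shaped like os.environ; both are injectable to ease testing.
    """
    home = Path(env.get("HOME") or Path.home())
    if platform == "win32":
        # Simplified: platformdirs also inserts an app-author component.
        base = Path(env.get("LOCALAPPDATA") or home / "AppData" / "Local")
        return base / appname / "Cache"
    if platform == "darwin":
        return home / "Library" / "Caches" / appname
    # XDG Base Directory spec: $XDG_CACHE_HOME, falling back to ~/.cache
    base = Path(env.get("XDG_CACHE_HOME") or home / ".cache")
    return base / appname
```

The resulting path could then be handed to Loader as its directory, so that notebooks in different folders share one copy of each ephemeris file.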

I don't mind working on this and putting a pull request in -- but didn't want to submit a pull request out-of-nowhere :wink: ...

Thoughts? :thinking:

brandon-rhodes commented 1 year ago

I don't think at this point that we can change the default, as that would surprise existing users and mean that Skyfield behavior changed as they updated Skyfield or rolled it back to an earlier version.

But I could imagine Skyfield supporting a simple config file. Maybe named skyfield.ini and looking like:

[load]
directory = ~/skyfield-data/
override = true

Where override specifies whether Skyfield should use the given directory even when a call to load() provides a path of its own.

That way users would have to opt-in to Skyfield changing behavior, and wouldn't be surprised when their old scripts suddenly did something different. Would that solve your use-case?
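
A minimal sketch of how such an opt-in file might be interpreted, using the standard-library configparser. Nothing here is existing Skyfield behavior: the function `resolve_directory` and the precedence rules it encodes are assumptions based only on the `directory` and `override` keys proposed above.

```python
import configparser
import os

SAMPLE = """\
[load]
directory = ~/skyfield-data/
override = true
"""


def resolve_directory(config_text, requested=None):
    """Pick a data directory from config text and an optional caller path.

    Hypothetical precedence: with no config, behave as today (caller's
    path, else the current directory); with a config, the caller's path
    still wins unless `override` is true.
    """
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    directory = parser.get("load", "directory", fallback=None)
    override = parser.getboolean("load", "override", fallback=False)
    if directory is None:
        return requested or "."
    if requested is not None and not override:
        return requested
    return os.path.expanduser(directory)
```

With `SAMPLE` as shown, `resolve_directory(SAMPLE, "/data")` returns the expanded `~/skyfield-data/`; flipping `override` to `false` makes the same call return `/data`.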

danieldjewell commented 1 year ago

> I don't think at this point that we can change the default, as that would surprise existing users and mean that Skyfield behavior changed as they updated Skyfield or rolled it back to an earlier version.

Yeah....... I see your point. :face_exhaling: Unfortunately, at this stage, changing the default would probably create more problems...

> But I could imagine Skyfield supporting a simple config file.

This could work - perhaps with multiple paths to check? Something along the lines of how other software checks for config files (in order):

  1. The current directory (would naming the file as "hidden" be beneficial? i.e. .skyfield.ini in the current directory?)
  2. A user-specific config directory (~/.config/skyfield/skyfield.ini)
  3. A system-wide config directory (dunno about a good path here - something maybe in /etc?)
  4. A default config
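
The lookup order above can be sketched in a few lines. The function names are hypothetical, and the system-wide directory is deliberately left as a parameter because no path has been settled on for it.

```python
from pathlib import Path


def candidate_config_paths(cwd, home, system_dir):
    """List config locations in priority order (hypothetical layout)."""
    return [
        Path(cwd) / ".skyfield.ini",                           # 1. current directory
        Path(home) / ".config" / "skyfield" / "skyfield.ini",  # 2. per-user
        Path(system_dir) / "skyfield.ini",                     # 3. system-wide
    ]


def find_config(candidates):
    """Return the first candidate that exists as a file, else None.

    None means "no file found", so the caller uses built-in defaults (4).
    """
    for path in candidates:
        if path.is_file():
            return path
    return None
```

A caller might pass `Path.cwd()`, `Path.home()`, and whatever system directory is eventually chosen, then load only the first file found instead of merging several.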

For a config file, the question then becomes whether to go with an INI-style syntax (presumably parsed with configparser) or with a JSON file (parsed with json) -- both are built-in Python modules. One notable advantage of using configparser is that it supports comments (which would be nice for a self-documenting config file...).

Also, what's your preference on writing a default config file when a config file isn't found? (One frustration I've had over the years is either having to hunt for a "sample" config file or having to create one completely by hand... An example of doing this well might be PostgreSQL, PHP, or JupyterLab ... they either install or make it easy to create a "default" self-documenting config file with all [or nearly all] of the options listed and documented...)

Perhaps something default that mimics the current default configuration like:


#Skyfield Configuration File

#Section - load
#This section contains configuration options for where Skyfield will look for ephemerides, etc. 
[load]
#directory: [path] Skyfield will search this path for ephemerides before attempting to download from JPL/NAIF; Skyfield will also store downloaded ephemerides in this path
directory = . 
#override: [boolean - true/false] Set this to "true" in order to override using the directory where Skyfield runs to search/store ephemerides 
override = false
#disable_download: [boolean - true/false] Set this to "true" to disable Skyfield from automatically downloading ephemerides files 
disable_download = false
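
One way the "write a default file when none is found" idea could look, as a sketch only: `ensure_config` and the shortened `DEFAULT_CONFIG` text are hypothetical. It relies on configparser ignoring full-line `#` comments, which is what lets the file document itself.

```python
import configparser
from pathlib import Path

# Abbreviated version of the self-documenting file proposed above.
DEFAULT_CONFIG = """\
# Skyfield Configuration File
[load]
# directory: [path] where ephemerides are searched for and stored
directory = .
# override: [boolean] use `directory` instead of the run directory
override = false
# disable_download: [boolean] never download ephemerides automatically
disable_download = false
"""


def ensure_config(path):
    """Create the default config if it is missing, then parse and return it."""
    path = Path(path)
    if not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(DEFAULT_CONFIG, encoding="utf-8")
    parser = configparser.ConfigParser()
    parser.read(path, encoding="utf-8")
    return parser
```

An existing file is never overwritten, so a user's edits survive later runs.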

EDIT: Side note, might not be too difficult to allow multiple paths to be specified - just like the $PATH environment variable...

EDIT2: Speaking of Environment Variables - I suppose that could also be an option, although I (personally) would prefer for that to be an -additional- option and not the only one ... Setting environment variables can be a little tricky.... Here I go again with scope creep :exploding_head:
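
Both side notes (multiple paths, and an environment variable as an extra option) can be sketched together. The variable name `SKYFIELD_DATA_PATH` is purely an assumption for illustration; the splitting uses `os.pathsep`, the same separator `$PATH` uses on each platform.

```python
import os


def search_dirs(env, var="SKYFIELD_DATA_PATH"):
    """Split a $PATH-style variable into directories (hypothetical name)."""
    raw = env.get(var, "")
    return [os.path.expanduser(p) for p in raw.split(os.pathsep) if p]


def locate(filename, directories):
    """Return the first existing copy of `filename`, or None if absent."""
    for directory in directories:
        candidate = os.path.join(directory, filename)
        if os.path.isfile(candidate):
            return candidate
    return None
```

For example, `locate("de421.bsp", search_dirs(os.environ))` would return the first directory's copy of the file, and `None` would signal that a download (or an error) is needed.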

The main motivation is/was to (a) improve efficiency (save disk space, bandwidth) by not re-downloading [potentially huge] ephemerides and (b) improve overall performance. Having it as the default would be nice ... only because it makes the configuration one less step... Although, as long as the configuration file option(s) are clearly documented, that seems like it would work? (I have run into far too much software that either (1) doesn't document stuff like this or (2) buries the config documentation... But including/creating a self-documenting config file as mentioned above would definitely solve this I think.)

An additional benefit of the config-file approach would be having the ability to specify a shared directory that is a local mirror of the JPL ephemerides - thinking about people who might use Skyfield in an academic setting on a big cluster (or who have limited/intermittent network connectivity)... They could pre-download the BSP kernels they need/want and then configure Skyfield to look in that path. [Note/Thought - Should probably add a "fallback" and/or "no download" option for the case where the path provided is not writable... i.e. if someone tries to open de440.bsp and it's not available, either fall back to writing to the current directory or abort.]
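
The fallback / no-download note could be sketched like this. `choose_download_dir` is a hypothetical helper, and the policy it encodes (try the configured directory, then a fallback, then abort) is only one reading of the options listed above.

```python
import os


def choose_download_dir(preferred, fallback=".", disable_download=False):
    """Pick where a missing file may be downloaded to.

    Returns None when downloads are disabled, the first writable
    directory otherwise, and raises if neither is writable (abort).
    """
    if disable_download:
        return None
    for directory in (preferred, fallback):
        if os.path.isdir(directory) and os.access(directory, os.W_OK):
            return directory
    raise PermissionError(
        f"no writable directory among {preferred!r} and {fallback!r}"
    )
```

On a cluster with a read-only shared mirror, this would leave the mirror untouched and write any missing kernel to the fallback instead.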

Now that I think of it (and at the risk of scope creep - something I fight against all the time), having the ability to specify an HTTP(S) mirror server (local or otherwise) could be an additional item that could be added later? The config file idea would provide a nice place to store that... (But that's an issue/topic for another day.)