Closed brunobord closed 5 years ago
Here are some initial thoughts:
I have recently suspected that it was a poor choice for me to include auto-download in the load() function. Maybe I should have just had folks grab the files on their own? The loading code does try to use file locks to prevent n threads from pulling n copies of the file, but, really, the best approach is to make sure the file is present before calling Skyfield.
Maybe I should work to re-expose Skyfield's logic as two pieces: one that would turn "I want DE421" into a URL, and then a second piece that loads the file from disk, and encourage people to do the download of the URL their own way?
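To make the two-piece idea concrete, here is a minimal sketch. It is not Skyfield's API: the names url_for and open_local, the lookup table, and the example.org URL are all placeholders invented for illustration. The only point is that the first piece never touches the disk and the second never touches the network.

```python
from pathlib import Path

# Hypothetical lookup table; example.org is a placeholder host, not the
# server the real files are published on.
EPHEMERIS_URLS = {
    "de421.bsp": "https://example.org/ephemeris/de421.bsp",
}


def url_for(name):
    """Piece 1: turn a file name such as 'de421.bsp' into a download URL."""
    try:
        return EPHEMERIS_URLS[name]
    except KeyError:
        raise ValueError("no known URL for {!r}".format(name))


def open_local(path):
    """Piece 2: open a file that is already on disk; never use the network."""
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError("{} is missing; download it first".format(path))
    return path.open("rb")
```

With a split like this, whoever calls url_for() is free to fetch the file with whatever tool suits their environment before handing the resulting path to open_local().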
In any case, my guess is that you'll want users to get the file themselves, or have a little script you offer them to grab it. None of your options is likely to be a good fit: #1 and #3 because they impose costs on PyPI, and #2 because, as you note, downloading and storing a file somewhere on disk isn't the job of the setup script.
I'd give the application writer the job of getting the file. Give them the URL, and have them call your init function where they activate your library with a path to where they downloaded the file. It's generally trouble to try to guess how the top-level app will prefer to have I/O take place (for example: the network might not be available during setup.py, depending on whether they're in a container, and what its network settings are).
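On the library side, that could look roughly like the sketch below. The names init and get_ephemeris_path are hypothetical, not an existing workalendar or Skyfield interface. The library only records a path the application gives it and fails loudly if the file is not there.

```python
import os

# Library-level setting, filled in once by the application at start-up.
_settings = {"ephemeris_path": None}


def init(ephemeris_path):
    """Record where the application stored the ephemeris file."""
    if not os.path.isfile(ephemeris_path):
        raise FileNotFoundError(
            "ephemeris file not found: {}".format(ephemeris_path)
        )
    _settings["ephemeris_path"] = os.path.abspath(ephemeris_path)


def get_ephemeris_path():
    """Return the configured path, or fail loudly if init() was never called."""
    if _settings["ephemeris_path"] is None:
        raise RuntimeError("call init(path) before using astronomical features")
    return _settings["ephemeris_path"]
```

The library performs no I/O decisions of its own here; how and when the file got onto disk stays entirely with the application.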
I would like to add some remarks as well:
As I am building a larger application on Skyfield, I need more than the de421.bsp file: in particular the time-related files such as Leap_Second and the deltat files and, since I would like to show some stars, the Hipparcos catalogue as well. Finally, my intention in using the Skyfield framework was to show satellite information, and that needs additional data too. So in total a lot of files need to be downloaded.
Some of the data goes out of date over time: deltaT and leap seconds after some months, the satellite data after some days. From my point of view there is no real way around downloading them in the application.
Here is my path: the download is organized when the application is first started. This takes some time. After that, I show the user which data is going out of date. He can decide whether he wants expiry on or off, meaning you need connectivity for the first install, but can then also work offline for as long as that is acceptable to you.
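A rough sketch of that kind of check is below. It is an illustration only, not the code of my application: the function names are made up, and it assumes the file's modification time is a good enough stand-in for when the data was downloaded.

```python
import os
import time

SECONDS_PER_DAY = 86400.0


def file_age_days(path, now=None):
    """Age of a file in days, judged by its modification time."""
    now = time.time() if now is None else now
    return (now - os.path.getmtime(path)) / SECONDS_PER_DAY


def is_stale(path, max_age_days, now=None):
    """True when the file is older than the allowed age."""
    return file_age_days(path, now) > max_age_days


def should_refresh(path, max_age_days, expire_enabled, now=None):
    """Download when the file is missing, or stale with expiry switched on."""
    if not os.path.exists(path):
        return True
    return expire_enabled and is_stale(path, max_age_days, now)
```

With expiry switched off, should_refresh() only asks for a download when the file is missing, while is_stale() can still be used to warn the user that the data is old.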
I'm thinking of including a first set of files in the application. This increases the size, but it is about 50MB anyway, so still acceptable. I distribute it under releases on GitHub, which currently supports that size. Before I start distributing the files, the license and other aspects have to be sorted out.
Michel
PyPI is not historically fond of Python packages with embedded binary data; it's not what their infrastructure is set up for, and is costly to them.
Granted, yes. But I've also been looking into this, and I've found that the package file size limit is allegedly 60MB. What workalendar would need in order to work is only about 17~18MB (de421.bsp is sufficient, IMO), so the package would weigh well below the standard limit.
Oh, they have a size limit now? Then it makes perfect sense for you to distribute binary data as long as you stay within their guidelines. Take a look at load_bundled_npy() in skyfield/functions.py if you want to see how Skyfield itself accesses a distributed data file.
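For reference, the general standard-library pattern for reading a file that ships inside an installed package looks something like the sketch below. This is not a copy of Skyfield's function; load_bundled_bytes is a made-up name, and it assumes the data file was actually included in the package when it was built.

```python
import pkgutil


def load_bundled_bytes(package, relative_path):
    """Return the raw bytes of a data file shipped inside a package."""
    # pkgutil.get_data() asks the package's own loader for the file, so it
    # does not depend on where the package happens to be installed.
    data = pkgutil.get_data(package, relative_path)
    if data is None:
        raise FileNotFoundError(
            "{}/{} is not available".format(package, relative_path)
        )
    return data
```

The returned bytes can be wrapped in io.BytesIO when the consumer expects a file-like object.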
So it sounds like Option #1 will be simplest? Or are you going to try Option #3?
@brunobord In ticket #123 I gave a rundown of how to make a subset of de421.bsp, which may help.
I've started to work on a skyfield-data package, but I've been held up by astronomical calculations.
In workalendar, some Asian calendars use solar terms to determine holidays, and (py)ephem does a great job with them.
Alas, with skyfield, I haven't been able to write an iso-functional solar term function (and yes, I've struggled a lot).
I think I'll open a separate issue here, but if I can't replace ephem, I'm afraid it'll slow down my "data" initiative.
@brunobord I am not familiar with solar terms, but if you would like to share sample PyEphem code that computes the value you need, I might be able to suggest how to do the same in Skyfield.
well... You'll see that the code I'm using is... yours :o)
Introducing: Skyfield Data!!!!
See the usage in the README file (and the project page on PyPI). The next improvement(s) will be:
.bsp file and extract its expiration date, as for the others.
But... it looks like it's working on my side, so that's great.
Closing... :o)
Thanks for the update!
Rationale
As a workalendar maintainer, I'd like to drop ephem from our requirements in favor of skyfield.
Workalendar is a Python library and toolkit to compute holidays for more than 230 calendars around the world. To determine some of these holidays, we need to know the date of the Spring Equinox, for example.
But my main issue here is that the following snippet will take from several seconds to a few minutes at the first run, depending on your network speed:
Fortunately, on subsequent runs the corresponding files will be found on disk and won't be downloaded, so it'll run much faster... until the files expire, and then we're back to the download dance again.
And you're prone to various problems when relying on an online data source: the network can break, remote servers can be inaccessible, etc. When you're in the middle of running a program, this might be an issue.
I'm also wondering what would happen in a multi-user environment (a web application?): what happens if several threads request the same files? Wouldn't they be downloaded several times, once for each thread?
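To illustrate the concern, here is a minimal sketch of how an application could make sure only one thread performs the download. It is not Skyfield's implementation: ensure_file and the fetch callback are hypothetical names, and a threading.Lock only protects threads inside a single process (separate processes would need a lock file or similar).

```python
import os
import threading

_download_lock = threading.Lock()


def ensure_file(path, fetch):
    """Call fetch(tmp_path) at most once, even if many threads ask at once."""
    if os.path.exists(path):            # fast path: nothing to do
        return path
    with _download_lock:
        if not os.path.exists(path):    # re-check: another thread may have won
            tmp_path = path + ".part"
            fetch(tmp_path)             # caller-supplied download function
            os.replace(tmp_path, path)  # atomic rename: no half-written file
    return path
```

Writing to a temporary name and renaming afterwards means that a thread taking the fast path never sees a partially downloaded file.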
Ideas / solutions
1 - Include the files in workalendar's package
This has major drawbacks:
- With the expire=False flag in the Loader, those files would expire, and an "old" install would compute astronomical events with a potential error margin.
- With the expire=True flag in the Loader class, obsolete files would be re-downloaded at runtime, causing a big latency, and probably a timeout for web applications.
2 - Download the files at the setup step
Pros: the package archive would keep its modest size.
Cons: It would mean that our setup.py would include scripting. This is not necessarily a blocker here, but it's apparently not where the Python community is heading, since "modern" projects prefer to use a declarative setup.cfg.
3 - Build a workalendar-data package
Cons:
Pros:
- The workalendar package would still keep its modest size,
- workalendar, they would just upgrade the workalendar-data in their install.
- workalendar-data would generate false computations, it would mean that in workalendar's setup.py file we would have to put something like workalendar-data>=2022.1.1 to ensure that they're "safe". We would have complete control over the release cycle and their eventual (in)compatibilities.
Related issues
I have no idea (yet) what the license of these files is, or whether I'm authorized to embed them in a Python package and publish them on PyPI.
If it's possible, I think that it could be an interesting opportunity to add this "data package" to the skyfielders organization, and instead of publishing a workalendar-data, why not have a skyfield-data package? I reckon it might be tedious, because there are a ton of USNO files, and it would be complicated to know where to stop adding data files.
Conclusion
I'd like to know the skyfield maintainer's (@brandon-rhodes) opinion on this subject. As far as I can tell, the 3rd solution seems the best, but it raises issues, so it is not that simple.
Clear skies!