skyfielders / python-skyfield

Elegant astronomy for Python
MIT License

On the fly downloading from NASA servers on an "as needed" basis #74

Closed sarang-gupta closed 6 years ago

sarang-gupta commented 8 years ago

Downloading entire ephemerides takes a lot of time and space. As it turns out, NASA's servers support byte range requests: HTTP/1.1 defines them, and curl's -r option requests the same kind of range over FTP. For example:

curl -o test.txt -r 120000000-130000000 ftp://ssd.jpl.nasa.gov/pub/eph/planets/ascii/de431/ascm06000.431

yields 10,000,001 bytes from the specified position in ascm06000.431.
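
The same range can be expressed in Python. This is a minimal sketch, assuming the server honors the standard `Range` header over HTTP(S). The helper names `range_header`, `range_length`, and `build_range_request` are hypothetical, and the URL is only an example of a large file on the NAIF server. The request is built but never sent.

```python
import urllib.request


def range_header(start, end):
    """Return the Range header value for bytes start..end, inclusive."""
    if start < 0 or end < start:
        raise ValueError("need 0 <= start <= end")
    return f"bytes={start}-{end}"


def range_length(start, end):
    """Number of bytes in an inclusive byte range."""
    return end - start + 1


def build_range_request(url, start, end):
    """Build (but do not send) a request for part of a remote file."""
    return urllib.request.Request(url, headers={"Range": range_header(start, end)})


# Example URL only; any HTTP server that supports Range requests would do.
URL = "https://naif.jpl.nasa.gov/pub/naif/generic_kernels/spk/satellites/jup310.bsp"
request = build_range_request(URL, 120_000_000, 130_000_000)
print(request.get_header("Range"), range_length(120_000_000, 130_000_000))
```

Passing the request to `urllib.request.urlopen` would perform the download. A server that honors the range replies with status 206 (Partial Content) rather than 200.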

Since these files (and the xsp and bsp files) are very well structured, it should be possible to pull (and cache) data on an as-needed basis, instead of having to download entire ephemerides.

This would be particularly useful when computing non-Terran satellite positions, whose ephemerides can be very large.

wenzul commented 8 years ago

A question without much background: how do we know which byte range we have to download? Could we know that without any knowledge of the file's content, or is there an index-like header?

sarang-gupta commented 8 years ago

@wenzul The files are very well structured, so, yes, it's possible to know what byte ranges to download without needing the entire file. Shipping skyfield with small index files is a good idea. The CSPICE libraries (and others) use file seeking to access the large ephemeris files, so using byte ranges (the HTTP equivalent of file seeking) should work well too.
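
To illustrate the analogy: on a local file, an inclusive byte range is read with a seek followed by a read. This is a minimal sketch; `read_range` is a hypothetical helper, and the in-memory buffer merely stands in for a large ephemeris file.

```python
import io


def read_range(fileobj, start, end):
    """Read bytes start..end (inclusive) from a seekable binary file object."""
    fileobj.seek(start)
    return fileobj.read(end - start + 1)


# Stand-in for a large ephemeris file: 1000 bytes of repeating values.
fake_file = io.BytesIO(bytes(i % 256 for i in range(1000)))
chunk = read_range(fake_file, 300, 309)
print(len(chunk), chunk[0])
```

A remote version would keep the same `(start, end)` signature and send those numbers in a `Range` header instead of calling `seek`.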

tritium21 commented 8 years ago

I have a concern about this. Without the OK of someone in IT over at JPL, distributing a library that does random access to someone else's servers MIGHT not be the most neighborly thing in the world to do. This is different from just downloading a huge binary blob once; they are obviously OK with that, since they published it. What they might not be OK with is the death by a thousand papercuts of being hit in a fairly hard-to-cache way every time a user of some library restarts their application.

So barring someone with a jpl.nasa.gov email address chiming in, if this is something to be done, ensure it uses aggressive local caching at least.

wenzul commented 8 years ago

I don't know your use case, such as the time ranges you calculate over. But if there is going to be such a solution, there should be some kind of pre-load and permanent caching of time ranges at the beginning. Otherwise you will end up as an unintended DoS originator. Does skyfield need all parts of the ephemeris at once? If yes, I think something like a 300MB file part for a specific time frame is a good tradeoff between usage and complexity. I realize I lack expertise. Now I'm an excited listener. :)

tritium21 commented 8 years ago

My concern is only a moral one.

Does my code run infrequently (once a day) and use only a few hundred bytes of data from JPL per run? Then I have no problem fetching that data from JPL every time my code runs.

Does my code run on 1000 computers, frequently, and each run uses a few thousand bytes from JPL? If my storage requirements mean I cannot have the entire ephemeris on hand, you better believe that I would want to cache those bytes locally.

I am proposing that the assumption made when adding this feature be the latter, not the former. Default to being a good neighbor ... it's not like this data changes frequently (or at all).
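
One way to picture "cache by default" is a reader that stores every fetched range on disk and only contacts the server on a miss. This is a minimal sketch, not an existing Skyfield or jplephem feature: `CachedRangeReader` and `fake_fetch` are hypothetical names, and the fetch function is a stand-in that never touches the network.

```python
import hashlib
import os
import tempfile


class CachedRangeReader:
    """Cache byte ranges on disk so each range is fetched at most once."""

    def __init__(self, fetch, cache_dir):
        self.fetch = fetch  # callable(url, start, end) -> bytes
        self.cache_dir = cache_dir
        os.makedirs(cache_dir, exist_ok=True)

    def _path(self, url, start, end):
        # One cache file per (url, start, end) triple.
        key = hashlib.sha256(f"{url}:{start}-{end}".encode()).hexdigest()
        return os.path.join(self.cache_dir, key)

    def read(self, url, start, end):
        path = self._path(url, start, end)
        if os.path.exists(path):
            # Cache hit: no request is made.
            with open(path, "rb") as f:
                return f.read()
        data = self.fetch(url, start, end)
        with open(path, "wb") as f:
            f.write(data)
        return data


# Demonstration with a fake fetcher that records how often it is called.
calls = []


def fake_fetch(url, start, end):
    calls.append((start, end))
    return bytes(end - start + 1)


reader = CachedRangeReader(fake_fetch, tempfile.mkdtemp())
first = reader.read("https://example.invalid/big.bsp", 0, 1023)
second = reader.read("https://example.invalid/big.bsp", 0, 1023)
print(len(calls))
```

Because the data is expected to change rarely if at all, this sketch never expires cache entries; a real implementation would still need to decide what to do if the remote file were replaced.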

brandon-rhodes commented 8 years ago

Thank you for your comments and ideas!

The SPICE library and its tools, according to the documentation, do include code for taking a large ephemeris and excerpting it by stating up front that the user only needs observations between certain limited dates.

So I think that we should add to Skyfield a tool that would take a large .bsp ephemeris and produce a smaller one that was also a self-contained .bsp file but that covered a more limited range of dates. The output would be a new file, not a live ephemeris object that the user could use, so the temptation would hopefully not be there to run it more than once and keep asking the server for the same range over and over again.
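
The core of such an excerpting tool is deciding which stored records overlap the requested dates. The sketch below assumes a segment whose coefficient records each cover a fixed-length time interval and are stored back to back as 8-byte words, as in the Chebyshev-based SPK segment types. All function and parameter names are hypothetical, the numbers are made up, and this is not the code of any existing tool.

```python
def records_for_span(init, intlen, n, t0, t1):
    """Return (first, last) indexes of the records covering times t0..t1.

    init   -- start time of the first record, in seconds
    intlen -- length of time covered by each record, in seconds
    n      -- number of records in the segment
    """
    if t1 < t0:
        raise ValueError("t1 must not precede t0")
    first = int((t0 - init) // intlen)
    last = int((t1 - init) // intlen)
    # Clip to the records that actually exist in the segment.
    first = max(0, min(first, n - 1))
    last = max(0, min(last, n - 1))
    return first, last


def byte_range(start_word, rsize, first, last, word_size=8):
    """Inclusive byte range occupied by records first..last.

    start_word -- 1-based index of the segment's first word in the file
    rsize      -- number of words per record
    """
    begin = (start_word - 1 + first * rsize) * word_size
    end = (start_word - 1 + (last + 1) * rsize) * word_size - 1
    return begin, end


# Made-up segment: 100 records of 32 days each, starting at t = 0.
DAY = 86400.0
first, last = records_for_span(0.0, 32 * DAY, 100, 40 * DAY, 100 * DAY)
print(first, last, byte_range(641, 41, first, last))
```

The resulting byte range is what would be read from a local file or requested from a server, after which the selected records can be written out as a smaller, self-contained file.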

@sarang-gupta — would the ability to build a more limited ephemeris file that worked over a more limited range of dates satisfy your use case?

ckuethe commented 8 years ago

Maybe use de432s? It's got a limited date range and is fairly compact. https://naif.jpl.nasa.gov/pub/naif/generic_kernels/spk/planets/

sarang-gupta commented 6 years ago

@brandon-rhodes

Apologies for the long delay in answering. I don't actually have a particular use case in mind; my concern was the bandwidth required to download the files needed to compute, say, Ganymede's position. jup310.bsp is nearly a gigabyte in size, for example.

I noticed that, because SPK files are so well structured, you can read them with random access, not just sequential access, so I thought I'd found a clever workaround: download only the data you need and cache it locally.

I think the caching is what you have in mind when you say:

"a tool that would take a large .bsp ephemeris and produce a smaller one that was also a self-contained .bsp file but that covered a more limited range of dates. The output would be a new file, not a live ephemeris object that the user could use"

That's what I meant. I didn't mean using the live ephemeris for every computation, but I didn't specify caching in my original request.

However, per the other comments above, this appears to be a bad idea. It's probably better to download the entire file instead of trying to be overly clever, especially since bandwidth is getting cheaper and faster all the time.

brandon-rhodes commented 6 years ago

I think it makes sense to provide a way to download only part of the file — and if we write the code to pull an excerpt from a bsp file in a generic enough way, then it will work equally well over HTTP, FTP, and against a local file. I'll keep this issue on my list for when I get time!

brandon-rhodes commented 6 years ago

I have just released jplephem 2.7 which supports creating excerpts of large ephemerides. You can either invoke its command line tool on a local file:

python -m jplephem excerpt 2018/1/1 2018/4/1 de421.bsp outjup.bsp

— or you can provide a URL and jplephem will use HTTP Range: requests to fetch only the blocks of the ephemeris that cover the date range:

python -m jplephem excerpt 2018/1/1 2018/4/1 https://naif.jpl.nasa.gov/pub/naif/generic_kernels/spk/satellites/jup310.bsp excerpt.bsp
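
The same invocation can be scripted. This is a minimal sketch that only assembles the argument list shown above; `excerpt_command` is a hypothetical helper, and actually running the command requires jplephem 2.7 or later to be installed.

```python
import sys


def excerpt_command(start_date, end_date, source, output):
    """Build the argument list for jplephem's excerpt command.

    source may be a local path or a URL, as in the examples above.
    """
    return [sys.executable, "-m", "jplephem", "excerpt",
            start_date, end_date, source, output]


SOURCE = "https://naif.jpl.nasa.gov/pub/naif/generic_kernels/spk/satellites/jup310.bsp"
cmd = excerpt_command("2018/1/1", "2018/4/1", SOURCE, "excerpt.bsp")
print(" ".join(cmd))
```

Passing `cmd` to `subprocess.run(cmd, check=True)` would execute it and raise an error if the tool exits with a failure status.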

Enjoy!

astrojuanlu commented 6 years ago

This is awesome, thanks! 😍