skyfielders / python-skyfield

Elegant astronomy for Python
MIT License
1.41k stars 211 forks source link

different number of comets from mpc with Loader vs load #407

Closed davidmikolas closed 4 years ago

davidmikolas commented 4 years ago

When I load comets from mpc like this:

from skyfield.api import load
from skyfield.data import mpc
with load.open(mpc.COMET_URL) as f:
    comets = mpc.load_comets_dataframe(f)
comets = comets.set_index('designation', drop=False)
print(len(comets))

I get 917 comets. But when use this loader:

from skyfield.api import Loader
from skyfield.data import mpc
load = Loader('~/Documents/fishing/SkyData')  # avoids multiple copies of large files
with load.open(mpc.COMET_URL) as f:
    comets = mpc.load_comets_dataframe(f)
comets = comets.set_index('designation', drop=False)
print(len(comets))

I get 864 comets!

Either way

row = comets.loc['C/2020 F3 (NEOWISE)']

is successful.

I'm not a Pandas user and so I don't know how to check the difference between these two, presumably there are more comets in one than the other, but I don't know why these methods produce different sets of comets to begin with.

Is there something wrong with using a Loader object to access mpc?

Here's my ugly script: https://astronomy.stackexchange.com/a/36901

brandon-rhodes commented 4 years ago

If you open the two copies of the comets file with an editor, do they have different numbers of lines? And: are the files identical or have any differences?

davidmikolas commented 4 years ago

I see, the text file created using skyfield.api.load() with 917 comets does have 917 lines, and the one using skyfield.api.Loader() with 864 does have 864 lines. The second, shorter text file was created July 10th though it's been modified (according to MacOS Finder) just now. The longer text file was created yesterday.

When I delete both text files run these two snippets again, both now have 917 comets.

For some reason the .Loader() method changes the modified date of the text file but does not add newer comets to it?

Could there have been 50 new comets added since July 10? Not likely.

Anyway after deleting the text files and re-running both now yield 917 comets. I'll try this exercise again in a week and see what happens.

I'll attach the two text files

CometEls 864 lines using Loader.txt CometEls 917 lines using load.txt

brandon-rhodes commented 4 years ago

Could there have been 50 new comets added since July 10? Not likely.

How did you estimate that likelihood? My guess is the opposite: that 50 comets have just been added to the file. :)

davidmikolas commented 4 years ago

I used the following: 50 out of 900 is roughly 5%, if the list grew by 5% in a day or two then it must only be a month or two old, but astronomers have been aggressively cataloging comets for decades.

Which implicit assumptions are most wrong here?

But I think the larger issue is why the existing shorter file was modified and yet not correctly updated when I ran the two snippets! Right now it seems one must delete the existing comet file before updating. Is this a bug?

I'll continue to keep an eye on this... and look for any further discrepancies between the two.

brandon-rhodes commented 4 years ago

Which implicit assumptions are most wrong here?

The assumption that CometEls.txt would grow linearly with time. I am instead assuming it grows in fits and spurts.

But I think the larger issue is why the existing shorter file was modified and yet not correctly updated when I ran the two snippets! Right now it seems one must delete the existing comet file before updating. Is this a bug?

The comets file does not have an explicit expiration date, so no automatic re-download is ever performed. It is, I suppose, in a gray area: it doesn’t have version numbers like NASA ephemerides, which lets a programmer use, say, DE421 and know that it’s never going to change or need to be updated; but at the same time the comets file is not like a leap-seconds file with an explicit expiration date, after which a new file needs to be fetched in case a new leap second has been scheduled.

In general automatic re-downloading in Skyfield has been a very poor user experience; it’s hard to use Skyfield to run a website, or on a spacecraft, if a script that has worked successfully a thousand times suddenly tries to download a file and dies with an exception on its thousand-and-first run because the leap second file is now too old.

So all the documentation now tries to encourage use of the built-in leap second data, even if it might get stale, because a missed leap second is typically a smaller disaster than a script suddenly and entirely breaking.

Given that I'm now actively discouraging Skyfield's one example of an auto-updated file, I am therefore not likely to consider adding new classes of auto-updated file — like the comets file. I think that having folks arrange to make their own decisions about when to refresh files makes more sense.

Note that deleting the file should not be necessary, as the loader routines provide a reload= parameter that can be set to True.

I am going to close this issue for now on the assumption that the comets database did get some new comets added, but feel free to continue discussion here anyway, and of course re-open if you discover that there was indeed some problem with Skyfield! Thanks.

davidmikolas commented 4 years ago

Okay I think I have a fuller appreciation of all of those. load.open(mpc.COMET_URL) gave me the impression that it would really go to a url, rather than only do it as a last resort.

I wonder if there is room for a short note in the Comets documentation mentioning that it will check first and only download if an existing file is absent?

Anyway I'm good now, thank you for your time and patience!

brandon-rhodes commented 4 years ago

I wonder if there is room for a short note in the Comets documentation mentioning that it will check first and only download if an existing file is absent?

Yes, there's room in the docs for that, and it's probably a suitable reminder to print wherever a user might be reading about load() for the first time. I'll see about expanding the docs for that later this week!