Closed bj0 closed 6 years ago
Thanks for the report! I will try to get to this before PyCon, but it might instead be after the conference that I have time to tackle it (though, of course, anyone else is free to try to create a fix in the meantime). I assume that I can reproduce this error using the default hip_main.dat.gz file whose URL appears at the top of hipparcos.py?
Probably. I didn't change anything, so it's just downloading the default file. I deleted it and let it download again, but that didn't change anything.
I have downloaded the file, and there was no error when Skyfield parsed it. Is it possible, since the link to the file isn't an https link but a plain-text protocol with no error detection, that a bad byte crept into your copy of the file as it was downloaded? If you run the following code, you can find out whether our two copies of the file are exactly the same:
from hashlib import sha256
content = open('hip_main.dat.gz', 'rb').read()
print(sha256(content).hexdigest())
My version prints:
96fe641a7112f41893f6632d91762179c7f3cf072a16d2beff62e37d5b3db65c
I lost the files I had been using before, but I re-created the script and it re-downloaded the hip_main file. Its hash matches yours, and the script still produces the error on star 421.
Here's the script that causes it:
from skyfield.data.hipparcos import load

stars = []
for star in load(lambda x: True):
    # print(star)
    stars.append(star)
print(stars)
oops, hit wrong button
I see — you are trying to load every star in the database, even though there are stars without valid fields defined for decimal RA and Dec. I had only ever tried to load specific stars, never all of them at once.
If an attempt is made to load a star without decimal RA and Dec, I wonder what should happen?
I'm not sure. An error saying what the actual problem is might be useful, but if the RA/Dec are there in HMS/DMS, I don't know why they aren't there in decimal degrees. The conversion is pretty simple, and why would the stars be in the catalog if their coordinates are invalid?
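For reference, the conversion really is simple; here is a minimal sketch (the function names are my own, not Skyfield's), using Sirius's approximate coordinates as sample values:

```python
def hms_to_degrees(hours, minutes, seconds):
    """Convert right ascension in HMS to decimal degrees (1 hour = 15 degrees)."""
    return 15.0 * (hours + minutes / 60.0 + seconds / 3600.0)

def dms_to_degrees(sign, degrees, minutes, seconds):
    """Convert declination in signed DMS to decimal degrees."""
    return sign * (degrees + minutes / 60.0 + seconds / 3600.0)

# Sirius: RA 6h 45m 09s, Dec -16d 42' 58"
ra = hms_to_degrees(6, 45, 9.0)          # about 101.2875
dec = dms_to_degrees(-1, 16, 42, 58.0)   # about -16.7161
```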
In a fit of whimsy, I ran

perl -nle '$n++; print "$n ",substr($_,87,8)' hip_main.dat

(after uncompressing hip_main.dat.gz) and found what I expected: star 421 is the first star for which that field, ra_mas_per_year, is blank. This code casts the field to float: https://github.com/skyfielders/python-skyfield/blob/master/skyfield/data/hipparcos.py#L16

Probable fix: check every field for the empty string and set it to NULL as needed.
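A sketch of what that fix might look like: a parser that maps a blank fixed-width field to NaN instead of letting float() raise a ValueError (this helper is hypothetical, not Skyfield's actual code):

```python
import math

def parse_field(text):
    """Parse one fixed-width catalog field, returning NaN when it is blank."""
    text = text.strip()
    return float(text) if text else math.nan

print(parse_field(' -54.69 '))   # a normal numeric field
print(parse_field('        '))   # blank field, as on star 421 -> nan
```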
It won't come out in a Skyfield release for a few more days, but for those of you working from master, check out the load_dataframe() method in hipparcos.py. It needs a bit of cleanup (it still prints the number of seconds required to load the catalog, for example!), but it loads the entire Hipparcos catalog for me in about 1 second.
As you'll see from my heavily in-progress work in charting.py, the combination of Pandas and NumPy means that you can compute the coordinates of the entire sky full of stars with:
c = catalog                                  # the Hipparcos dataframe
c = c[c['magnitude'] <= limiting_magnitude]  # keep only bright-enough stars
c = c.sort_values('magnitude', ascending=False)
s = Star(ra_hours=c.ra_hours, dec_degrees=c.dec_degrees)
spos = o.observe(s)                          # `o` is an observer position
While I hesitate to ever ask folks to install additional dependencies, I think that Pandas is such a huge improvement over earlier approaches involving separate Star objects for handling thousands of stars that I might consider making it standard. Also, it's so much more fun to let Pandas do the parsing of the text file instead of doing it by hand! All I had to type was:
names, colspecs = zip(
    ('hip', (2, 14)),
    ('magnitude', (41, 46)),
    ('ra_degrees', (51, 63)),
    ('dec_degrees', (64, 76)),
)
—and Pandas did everything else!
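To make "Pandas did everything else" concrete, here is a sketch of how those names and colspecs can feed into pandas.read_fwf, run against a single fabricated fixed-width line (the sample values are Sirius's, right-justified at the given column offsets; the real loader reads hip_main.dat instead):

```python
import io
import pandas as pd

names, colspecs = zip(
    ('hip', (2, 14)),
    ('magnitude', (41, 46)),
    ('ra_degrees', (51, 63)),
    ('dec_degrees', (64, 76)),
)

# Build one fake catalog line, placing each value in its column span.
line = [' '] * 80
for (start, end), value in zip(colspecs,
                               ('32349', '-1.44', '101.28715539', '-16.71611582')):
    line[start:end] = value.rjust(end - start)
sample = ''.join(line) + '\n'

df = pd.read_fwf(io.StringIO(sample), names=names,
                 colspecs=list(colspecs), header=None)
print(df)
```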
I'm going to close this for now, since it looks like the Pandas approach will work well — feel free to open further issues as you run into problems or new ideas, though!
In a few minutes I'll be releasing Skyfield 1.7 with the documentation on Stars rewritten to highlight the ability to load the Hipparcos library from a Pandas dataframe. Hopefully we'll get feedback on how the feature works for real users!
The default value for hipparcos.URL (ftp://cdsarc.u-strasbg.fr/cats/I/239/hip_main.dat.gz) appears to be offline. I've had good luck switching that to http://cdsarc.u-strasbg.fr/ftp/cats/I/239/hip_main.dat.gz
I just tried to import the stars using skyfield.data.hipparcos.load. It downloads the file and then loads the first 420 stars, but throws a ValueError on star 421. Looking at the data file, some of the fields are blank for that star, which Skyfield doesn't handle.