skyfielders / python-skyfield

Elegant astronomy for Python
MIT License
1.42k stars, 212 forks

Minor Planet Center “CometEls.txt” has a broken row with asterisks for the year #503

Closed by ajpmaclean 3 years ago

ajpmaclean commented 3 years ago

I am referring to comet_neowise_chart.py. I am using Python 3.9.1, and line 34 fails because the perihelion year is a string:

    comet = sun + mpc.comet_orbit(row, ts, GM_SUN)

May I suggest adding the following fix? It just converts perihelion_year from a string to an int.

    comets = (comets.sort_values('reference')
              .groupby('designation', as_index=False).last()
              .set_index('designation', drop=False))
    comets['perihelion_year'] = comets['perihelion_year'].astype(int)

Once this is done, everything is OK. By the way, there are no issues using Python 3.9, so I think you can update the bullet point “Supports Python 2.6–2.7 and Python 3.3–3.5.” in Skyfield to “Supports Python 2.6–2.7 and Python 3.3–3.9.”

brandon-rhodes commented 3 years ago

Thanks for noting that Skyfield was not advertising the current full range of Python versions; in the commit shown above, I have expanded its claims to 3.9.

I suspect the error you are getting is because the data file is broken, not because of a bug in Skyfield. I see these lines in the middle of the file — could you check your copy of the file for the same lines?

    PK20W010  2020 04  4.6014  5.287832  0.264917  264.8117  124.2561   10.7921  20201214  10.0  4.0  P/2020 W1 (Rankin)                                       MPEC 2020-WH0
    PK20X010  2020 08  1.3688  2.889447  0.359464  326.1842   56.0671   31.5493  20201214  13.0  4.0  P/2020 X1 (ATLAS)                                        MPEC 2020-X71
              **** 01  1       0.00      0.00        0         0         0                  9.0  4.0
    CK20X020  2020 11 15.8857  3.828518  0.766246  347.3564  105.4589   18.1916  20201214  10.6  4.0  C/2020 X2 (ATLAS)                                        MPEC 2020-XF6
0001P         1986 01 26.5836  0.604791  0.966160  111.3965   58.4586  162.2842  20201214   4.0  6.0  1P/Halley                                                 98, 1083

The Minor Planet Center’s documentation does not mention (that I have seen?) the possibility of a row with asterisks like that, and so no astronomy libraries would be prepared for them. I suggest emailing the MPC and asking if they can repair the file.

(Edit: if it's not a mistake, and they intend in the future to include asterisks there, then please ask if they could expand their documentation of the file format to explain what libraries like Skyfield should do with entries like that. Thanks!)
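For checking a local copy, a minimal sketch along these lines would list the suspect rows. It assumes the perihelion year occupies columns 15-18 of each row (byte offsets 14 to 17), which matches the sample rows quoted above; the function name `find_bad_year_rows` is hypothetical and not part of Skyfield, and the sample rows are truncated for brevity.

```python
def find_bad_year_rows(lines):
    """Return (line_number, text) pairs whose year field is not four digits.

    Assumption: the perihelion year sits at byte offsets 14..17
    (columns 15-18), as in the sample rows quoted in this thread.
    Blank or short lines are reported too, since they have no valid year.
    """
    bad = []
    for number, line in enumerate(lines, start=1):
        year = line[14:18]
        if not (len(year) == 4 and year.isdigit()):
            bad.append((number, line.rstrip(b'\r\n')))
    return bad


# Truncated copies of rows from the thread; the second one is the broken row.
sample = [
    b'    PK20X010  2020 08  1.3688  2.889447\n',
    b'              **** 01  1       0.00\n',
    b'0001P         1986 01 26.5836  0.604791\n',
]
print(find_bad_year_rows(sample))
```

To scan a downloaded copy, the same function accepts a file opened in binary mode, since iterating over such a file yields one bytes line at a time.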

brandon-rhodes commented 3 years ago

@ajpmaclean — Oh, and, if you want to perform emergency repairs on the file, you can do so in memory after loading it, using standard Python. Here's how:

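from io import BytesIO

from skyfield.api import load
from skyfield.data import mpc

# Read the MPC comet file into memory as raw bytes.
with load.open(mpc.COMET_URL) as f:
    data = f.read()

# Drop every row that contains the asterisk placeholder.
data = b''.join(line for line in data.splitlines(True)
                if b'****' not in line)

comets = mpc.load_comets_dataframe(BytesIO(data))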
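
The same filter can be tried offline on a few rows before pointing it at the live file. The sketch below is only an illustration: `drop_placeholder_rows` is a hypothetical helper, not a Skyfield function, and the sample rows are truncated copies of those quoted earlier in this thread.

```python
def drop_placeholder_rows(data, marker=b'****'):
    """Return (cleaned_bytes, dropped_count) with marker rows removed."""
    kept = []
    dropped = 0
    # splitlines(True) keeps each row's original line ending.
    for line in data.splitlines(True):
        if marker in line:
            dropped += 1
        else:
            kept.append(line)
    return b''.join(kept), dropped


# Truncated sample rows; the middle one is the broken row.
sample = (
    b'    PK20X010  2020 08  1.3688  2.889447\n'
    b'              **** 01  1       0.00\n'
    b'    CK20X020  2020 11 15.8857  3.828518\n'
)
cleaned, dropped = drop_placeholder_rows(sample)
print(dropped)
print(cleaned.decode('ascii'))
```

A non-zero count indicates that the file still contains a broken row, so the caller can tell whether the repair did anything.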

ajpmaclean commented 3 years ago

Thank you for looking into this. I have emailed the MPC with the following text:

In the most recent version of CometEls.txt, line 480 is:
              **** 01  1       0.00      0.00        0         0         0                  9.0  4.0
1) Is this line supposed to be there?
2) If so are the asterisks in the date field intentional?
Background:
The asterisks here cause the year data when loaded into Skyfield to be treated
as strings and not as integers.  Please see:
https://github.com/skyfielders/python-skyfield/issues/503#issuecomment-745273546
If it is not a mistake, then could you please expand your documentation of the
file format to explain what libraries like Skyfield should do with entries like
that.
Thanks
   Andrew

Judging from the MPC Status Page, they are aware of asterisk issues. However, I suspect this file may have been overlooked.

ajpmaclean commented 3 years ago

Thanks, a better approach than my suggestion!

ajpmaclean commented 3 years ago

I just received notification that this is fixed. I also verified it.

brandon-rhodes commented 3 years ago

Thank you for following up on that, on behalf of the rest of us! I'm glad to hear it's not a permanent change to the file format.