Closed Bernmeister closed 2 years ago
If I print(comets) your dataframe, it looks like it's indexed by integer:
perihelion_year ... reference
0 1997 ... MPC106342
1 2019 ... NK 1615
2 2018 ... MPC 75703
3 2019 ... NK 731
4 2025 ... MPC 75704
.. ... ... ...
946 2025 ... MPEC 2022-HC2
947 2022 ... MPEC 2022-GJ3
948 2022 ... MPEC 2022-L66
949 2017 ... MPC107687
950 2019 ... MPEC 2020-K19
The index of a Pandas dataframe is the un-labeled column over to the left of all other columns. So this can be indexed like (to choose an arbitrary index shown above):
>>> print(comets.loc[4])
perihelion_year 2025
perihelion_month 12
perihelion_day 23.6383
perihelion_distance_au 3.30022
eccentricity 0.210298
argument_of_perihelion_degrees 161.715
longitude_of_ascending_node_degrees 285.296
inclination_degrees 5.0297
magnitude_g 13.5
magnitude_k 2
designation P/1999 XN120 (Catalina)
reference MPC 75704
Name: 4, dtype: object
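To make the integer-index behaviour concrete, here is a minimal sketch using a tiny hand-built dataframe. The rows and the names `Comet A` to `Comet C` are made up for illustration and are not MPC data. It shows that an integer label works with `.loc`, while a comet name raises `KeyError` until the index is changed.

```python
import pandas as pd

# Toy stand-in for the real comets dataframe (made-up rows, not MPC data).
comets = pd.DataFrame({
    'designation': ['Comet A', 'Comet B', 'Comet C'],
    'perihelion_year': [1997, 2019, 2018],
})

# With no explicit index, pandas supplies a RangeIndex of integers.
print(comets.index)

# Label-based lookup with an integer label works.
row = comets.loc[2]
print(row['designation'])

# A comet name is not a label in this index, so .loc raises KeyError.
try:
    comets.loc['Comet B']
    name_lookup_failed = False
except KeyError:
    name_lookup_failed = True
print(name_lookup_failed)
```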
The comets documentation includes a formula for re-indexing the dataframe by comet name:
comets = (comets.sort_values('reference')
          .groupby('designation', as_index=False).last()
          .set_index('designation', drop=False))
print(comets.loc['1P/Halley'])
See whether that maneuver also, on your machine, results in a dataframe that when printed shows the comet name over at the left as the index value.
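For anyone who wants to see what that maneuver does without downloading the real file, here is a small sketch on made-up rows. The designations, reference strings and magnitudes are invented, and only the three chained pandas calls come from the documentation snippet. One comet appears twice, and the chain keeps the row whose reference sorts last, then indexes by name.

```python
import pandas as pd

# Made-up rows: 'Comet A' appears twice with different reference strings.
comets = pd.DataFrame({
    'designation': ['Comet A', 'Comet B', 'Comet A'],
    'reference': ['REF 2', 'REF 1', 'REF 3'],
    'magnitude_g': [10.0, 12.5, 11.0],
})

deduped = (comets.sort_values('reference')        # order rows by reference string
           .groupby('designation', as_index=False).last()  # keep last row per comet
           .set_index('designation', drop=False))  # index by name, keep the column too

print(deduped)
print(deduped.loc['Comet A', 'reference'])
```

If the printed dataframe shows the comet name at the far left, a lookup such as `deduped.loc['Comet A']` should succeed instead of raising `KeyError`.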
I completely missed the section in the examples:
comets = (comets.sort_values('reference')
          .groupby('designation', as_index=False).last()
          .set_index('designation', drop=False))
All good and thank you; humble apologies and hanging my head in shame for this blunder!
I suspect that the documentation buried the point of that (fairly tangled) line of code a bit too deeply. The next time I'm in that part of the docs, I'll see if I can reformat the comments around it to draw the eye a little more emphatically!
One thing which has bothered me is the difference in how the comets and minor planets dataframes are treated (prior to being calculated). This is entirely due to my inexperience with pandas/numpy, which I'd not met until I started my journey from PyEphem to Skyfield. Through sheer brute force, I am overly familiar with the Minor Planet Center file formats for both comets and minor planets, and have found issues with the data, some of which you've highlighted in the Skyfield documentation (for example #449).
In the documentation I'm comparing, rightly or wrongly, the comets and minor planets in terms of writing Python. I guess coming from PyEphem, I'm used to one function which takes the comet or minor planet data file and spits out the answer. I have screened (XEphem formatted) data to remove lines containing '****' (#503) and missing the absolute magnitude component.
If at some stage you put the Kepler orbit documentation page back on the blocks, some questions which you might consider are:
Why is print( len( comets ), 'comets loaded' ) different to printing minor planets, print( minor_planets.shape[ 0 ], 'minor planets loaded' )?
designation in minor_planets = minor_planets.set_index( 'designation', drop=False ) referred to mpc._MPCORB_COLUMNS. When I see these things now it is obvious; perhaps a one-liner (either a comment in the example or an explicit line of text) mentioning the reference to the column names and where to find them?
CometEls.txt) corresponded to one orbit and if that is the case, I've not seen more than one line per comet. Does (should) this also apply then for minor planets? Further to this point, in the code comets = ( comets.sort_values( 'reference' ).groupby( 'designation', as_index = False ).last().set_index( 'designation', drop = False ) ), why sort by reference rather than designation?
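On the first of those questions, as far as pandas itself is concerned the two spellings report the same row count, so the difference between the examples is stylistic. A minimal sketch, with a made-up three-row dataframe standing in for either file:

```python
import pandas as pd

# Made-up rows; any dataframe would do for this comparison.
df = pd.DataFrame({
    'designation': ['A', 'B', 'C'],
    'reference': ['R1', 'R2', 'R3'],
})

n_len = len(df)          # number of rows
n_shape = df.shape[0]    # shape is (rows, columns), so [0] is also the row count

print(n_len, 'rows via len()')
print(n_shape, 'rows via shape[0]')
```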
EDIT: I've since understood the logic behind "most recent orbit". Take for example comet 332P (Ikeya-Murakami). This comet comprises several fragments, each with its own entry in the data file. I can now see why you sought to eliminate all but the most recent "orbit", although I'm unsure why you used the 'reference' field to sort rather than say 'designation' and take the first (as it might/should be the brightest).
While your guess about the ‘most recent orbit’ logic is a good one, I think the actual trigger for that was a September 2020 episode in which comet NEOWISE had, temporarily, two entries in the comets data file, as mentioned in this comment:
https://github.com/skyfielders/python-skyfield/issues/449
Since we have never yet observed the MPC accidentally adding two entries for the same minor planet to the minor-planets file, the Skyfield example code doesn't have a workaround for it. Maybe someday we could move to a one-size-fits-all approach where all data files get the same treatment. But in the interests of simplest-possible (and thus fastest-possible) code, the approach so far has been to apply fixes only to the files that have needed them at least once in their history.
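If you ever want to check for yourself whether a loaded file contains repeated designations, a sketch along these lines might do. It is not part of the Skyfield examples, and the rows and the `value` column are made up. It uses the pandas `duplicated()` method to list every row whose designation occurs more than once.

```python
import pandas as pd

# Made-up rows standing in for a loaded dataframe; 'Object X' is repeated.
minor_planets = pd.DataFrame({
    'designation': ['Object X', 'Object Y', 'Object X'],
    'value': [3.3, 4.1, 3.4],
})

# keep=False marks every member of a duplicated group, not just the repeats.
dupes = minor_planets[minor_planets['designation'].duplicated(keep=False)]
has_duplicates = not dupes.empty

print(has_duplicates)
print(dupes)
```

An empty `dupes` dataframe would suggest that no de-duplication step is needed for that particular file.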
Unsure if this is related to #707, but I've not run my comet code since then and noticed today I'm getting a KeyError when referring to a comet name. Here is sample code:
I get KeyError: '1P/Halley' at the line when extracting the row.
I have run
pip3 install --upgrade jplephem numpy pandas skyfield
and those packages are at version:
Running on Ubuntu 20.04.