Open jurezakrajsek opened 6 months ago
I would guess that for P type orbit the packed designation should be nan but in this case it reads the fragment part.
Let's read down through the whole file to see how things look. This little command line:
cat CometEls.txt | awk '{print substr($0,1,24)}'
—shows that the file is cleanly split between lines with designations in one format, and a second section that uses another. Omitting long stretches of lines with the same format, here's what we see:
CJ95O010 1997 03 29
PJ96R020 2019 02 12
PJ98V24S 2018 01 19
...
CK22D020 2022 03 27
PK22E010 2022 10 10
CK22E020 2024 09 13
CK22E030 2023 01 12
CK22F010 2022 07 28
CK22F020 2022 03 23
CK22H010 2024 01 18
CK22J010 2022 02 16
0001P 1986 02 17
0002P 2023 10 20
0004P 2021 09 8
...
0050P 2024 05 12
0051P a 2022 10 3
0051P d 2022 10 1
0052P 2021 10 5
...
0441P 2025 09 7
0001I 2017 09 9
0002I 2019 12 8
The first thing we can note is that we can't, alas, use your suggestion of using the orbit type P
to decide between the two formats, because there are P
-type comets in both sections.
The second thing we note is that, as you correctly note, the same field is used sometimes for the (mostly blank) fragment number (I included comet 0051P specifically so we'd get to see an example of a case where the field wasn't blank), and sometimes it's used for a modern packed designation like the ones used for minor planets.
For reference, here's how the format is described on their website:
https://www.minorplanetcenter.net/iau/info/CometOrbitFormat.html
Columns F77 Use
1 - 4 i4 Periodic comet number
5 a1 Orbit type (generally `C', `P' or `D')
6 - 12 a7 Provisional designation (in [packed form](https://www.minorplanetcenter.net/iau/info/PackedDes.html))
Which makes it sound so simple! But over in their text description of the format, they describe two different ways the fields can be used, with the "packed form" designation instead being a fragment number:
https://www.minorplanetcenter.net/iau/info/PackedDes.html
So the problem here is that Pandas doesn't allow columns to have different names depending on their content. There's no way (at least that I know of?) to tell Pandas, "the third column of this file should be called 'fragment number' when there's a non-empty comet number in columns 1-4, but should be called 'packed designation' when columns 1-4 are blank."
So in my design for the column numbers, I tried to just stick with the "official" names of the columns given in the first link above.
Before wading into further details, let's step back a moment. When I wrote up the code to handle comments, I didn't pay attention to those first columns, because the lines of the file are now much longer than in the old days, and include the full text of the comet name way over to the right. In your example, we can just read the full name 323P-B/SOHO
from the big field way over to the right, without needing to wade into the fields on the left. I guess the issue you've encountered is that the full name does not include the fragment number, the way the MPC have designed that new full-name field?
So what you want is a way to find the fragment number. Hmm.
Maybe the Skyfield documentation on comets could include a little snippet showing how to get the fragment number? It might work something like this:
import io
from skyfield.data import mpc
for comet_orbit in [
# Three example lines from CometEls.txt of May 10 2022:
' CJ95O010 1997 03 29.6937 0.892238 0.994969 130.4267 282.8976 89.2092 20220510 -2.0 4.0 C/1995 O1 (Hale-Bopp) MPC106342',
' PJ98V24S 2018 01 19.1688 3.427137 0.242855 244.6040 159.0401 5.0266 20220510 13.0 2.0 P/1998 VS24 (LINEAR) MPC 75703',
'0343P 2017 01 29.8032 2.266196 0.584845 137.4257 257.1514 5.5878 20220510 14.0 4.0 343P/NEAT-LONEOS MPC109148',
# Your example line:
'0323P b 2025 12 16.3240 0.040025 0.986146 353.9756 323.4582 5.4670 20240331 26.0 4.0 323P-B/SOHO MPEC 2024-F21',
]:
orbit = io.BytesIO(bytes(comet_orbit, 'utf-8'))
c = mpc.load_comets_dataframe_slow(io.BufferedReader(orbit))
c = (c.groupby('designation', as_index=False).last()
.set_index('designation', drop=False))
row = c.iloc[0]
number = row.number
if number > 0:
fragment_number = row.designation_packed
else:
fragment_number = row.designation_packed[6].rstrip('0')
print('Orbit type {!r} Designation {!r} Fragment number {!r}'
.format(row.orbit_type, row.designation, fragment_number))
Hmm. There's a complication here. Because we're only giving one line at a time, Pandas doesn't have the full CometEls.txt
to examine, and has to guess about column data types based on only one example line. Thus the output of my little example:
Orbit type 'C' Designation 'C/1995 O1 (Hale-Bopp)' Fragment number ''
Orbit type 'P' Designation 'P/1998 VS24 (LINEAR)' Fragment number 'S'
Orbit type 'P' Designation '343P/NEAT-LONEOS' Fragment number nan
Orbit type 'P' Designation '323P-B/SOHO' Fragment number 'b'
— gets a fragment number of nan
, a floating point value, where we really want ''
the empty string, because Pandas is guessing that the blank column is a float rather than a string. I guess we could spruce up the import a bit to know more about types.
But, first, I'll stop here to hear your comments, to learn whether getting the fragment value like this would answer your purpose — or whether there's some other underlying goal?
Hi Brandon
Yes the format is really anoying and complicated :( The packed designation is very useful to use with astroquery.mpc to get ephemeris from the MPC website.
My idea was that for the numbered comets packed designation is constructed as f'{number}{orbit_tyxpe}' In case that a fragment is found in the designation_packed f'{number}{orbit_tyxpe}-{fragment_letter}'
For other comets the fragment is as I see always the last letter in the designation_packed if it is lower case
I have constructed a function that can be used and applied on the mpc.load_comets_dataframe_slow pandas dataframe. It overwrites the designation_packed for numbered comets and also adds a new column for fragment_letter might be useful in some cases. This way the designation_packed can always be used with astroquery.mpc
import io
import pandas as pd
import numpy as np
from skyfield.data import mpc
def parse_fragment_letter(df):
if df.number > 0:
if pd.isna(df.designation_packed):
df['fragment_letter'] = np.nan
df['designation_packed'] = f'{df.number:.0f}{df.orbit_type}'
else:
fragment_letter = df.designation_packed
df['fragment_letter'] = fragment_letter
df['designation_packed'] = f'{df.number:.0f}{df.orbit_type}-{fragment_letter.upper()}'
else:
if df.designation_packed[6].islower():
df['fragment_letter'] = df.designation_packed[6]
else:
df['fragment_letter'] = np.nan
return df
comet_df = pd.DataFrame()
for comet_orbit in [
' CJ95O010 1997 03 29.6937 0.892238 0.994969 130.4267 282.8976 89.2092 20220510 -2.0 4.0 C/1995 O1 (Hale-Bopp) MPC106342',
' PJ98V24S 2018 01 19.1688 3.427137 0.242855 244.6040 159.0401 5.0266 20220510 13.0 2.0 P/1998 VS24 (LINEAR) MPC 75703',
'0343P 2017 01 29.8032 2.266196 0.584845 137.4257 257.1514 5.5878 20220510 14.0 4.0 343P/NEAT-LONEOS MPC109148',
'0323P b 2025 12 16.3240 0.040025 0.986146 353.9756 323.4582 5.4670 20240331 26.0 4.0 323P-B/SOHO MPEC 2024-F21',
'0332P 2021 08 18.2949 1.576824 0.489303 152.3122 3.7703 9.3804 20230225 18.0 4.0 332P/Ikeya-Murakami MPEC 2016-K18',
'0332P a 2021 08 18.4161 1.577135 0.489483 152.3403 3.7861 9.3800 20210814 9.0 4.0 332P-A/Ikeya-Murakami MPEC 2016-K18',
'0332P b 2016 03 17.1650 1.572891 0.489707 152.3711 3.8028 9.3805 19.0 4.0 332P-B/Ikeya-Murakami MPEC',
' CK20P04a 2020 08 8.4682 0.090986 1.002140 177.5194 175.1827 26.3726 20230225 5.3 6.0 C/2020 P4-A (SOHO) MPEC 2020-QN1',
' CK20P04b 2020 08 8.3493 0.098737 1.000534 173.4621 171.4740 27.3278 20230225 0.0 6.0 C/2020 P4-B (SOHO) MPEC 2020-QN1'
]:
orbit = io.BytesIO(bytes(comet_orbit, 'utf-8'))
c = mpc.load_comets_dataframe_slow(io.BufferedReader(orbit))
c = (c.groupby('designation', as_index=False).last()
.set_index('designation', drop=False))
comet_df = pd.concat([comet_df, c])
comet_df = comet_df.apply(parse_fragment_letter, axis=1)
print(comet_df)
designation number orbit_type designation_packed perihelion_year ... perturbed_epoch_day magnitude_g magnitude_k reference fragment_letter
designation ...
C/1995 O1 (Hale-Bopp) C/1995 O1 (Hale-Bopp) NaN C J95O010 1997 ... 10.0 -2.0 4.0 MPC106342 NaN
P/1998 VS24 (LINEAR) P/1998 VS24 (LINEAR) NaN P J98V24S 2018 ... 10.0 13.0 2.0 MPC 75703 NaN
343P/NEAT-LONEOS 343P/NEAT-LONEOS 343.0 P 343P 2017 ... 10.0 14.0 4.0 MPC109148 NaN
323P-B/SOHO 323P-B/SOHO 323.0 P 323P-B 2025 ... 31.0 26.0 4.0 MPEC 2024 b
332P/Ikeya-Murakami 332P/Ikeya-Murakami 332.0 P 332P 2021 ... 25.0 18.0 4.0 MPEC 2016 NaN
332P-A/Ikeya-Murakami 332P-A/Ikeya-Murakami 332.0 P 332P-A 2021 ... 14.0 9.0 4.0 MPEC 2016 a
332P-B/Ikeya-Murakami 332P-B/Ikeya-Murakami 332.0 P 332P-B 2016 ... NaN 19.0 4.0 MPEC b
C/2020 P4-A (SOHO) C/2020 P4-A (SOHO) NaN C K20P04a 2020 ... 25.0 5.3 6.0 MPEC 2020 a
C/2020 P4-B (SOHO) C/2020 P4-B (SOHO) NaN C K20P04b 2020 ... 25.0 0.0 6.0 MPEC 2020 b
[9 rows x 19 columns]
Thanks for providing more example code!
I agree that it will be nice, either in the documentation or a provided function, for Skyfield to show folks how to handle the inconsistent way that MPC represents comet fragments. But I suspect I should finish out a few items that I'm already working on for Skyfield first, and then once a new release is out, return to this topic. I'll keep it open, though, and hopefully other folks who need your code can cut-and-paste it from here in the meantime. Thanks!
@brandon-rhodes Is there a way that I can help you with this function? We need to zero-padd the number before making the MPC packed designation for numbered comets. The comet number is stored zero-padded right-justified in columns 1-4. Column 5 usually contains "P"
Something like that:
df['designation_packed'] = f'{int(df.number):04d}{df.orbit_type}'
So packed designation for numbered comets would be something like: '0343P' or '0332P-A'
For astroquery a packed designation for numbered comets must contain the object type (example: '0343P'). If you send a query without object type (example: '0343') as object ID you get the ephemeris for asteroid 343 instead of comet 343P. For other comets both options work (example: 'J98V24S' or 'PJ98V24S').
I noticed that for some comets the packed designation is parsed wrong This is the MPC 1-line orbital elements beeing parsed
0323P b 2025 12 16.3240 0.040025 0.986146 353.9756 323.4582 5.4670 20240331 26.0 4.0 323P-B/SOHO MPEC 2024-F21
And the output of the mpc.load_comets_dataframe_slow(io.BufferedReader(orbit))
I would guess that for P type orbit the packed designation should be nan but in this case it reads the fragment part.
Here is a short script to reproduce: