skyfielders / python-skyfield

Elegant astronomy for Python
MIT License
1.41k stars 211 forks source link

Handling EphemerisRangeError when observing a minor planet with missing data #449

Closed Bernmeister closed 3 years ago

Bernmeister commented 4 years ago

On running the test below:

#!/usr/bin/env python3

import datetime, io, skyfield.api, skyfield.constants, skyfield.data.mpc

print( "Skyfield:", skyfield.__version__ )

# Download and save  
# https://minorplanetcenter.net/iau/Ephemerides/Distant/Soft00Distant.txt
# and adjust the date to the download date.
now = datetime.datetime.strptime( "2020-09-16", "%Y-%m-%d" )

latitude = -33
longitude = 151
timeScale = skyfield.api.load.timescale( builtin = True )
t = timeScale.utc( now.year, now.month, now.day, now.hour, now.minute, now.second )
topos = skyfield.api.Topos( latitude_degrees = latitude, longitude_degrees = longitude )
ephemeris = skyfield.api.load( "de421.bsp" )
sun = ephemeris[ "sun" ]
earth = ephemeris[ "earth" ]
with open( "Soft00Distant.txt" ) as f:
   for line in f:
       with io.BytesIO( line.encode() ) as f:
           dataframe = skyfield.data.mpc.load_mpcorb_dataframe( f ).set_index( "designation", drop = False )
           name = line[ 166 : 194 ].strip()
           if "2012 DR30" in name:
               body = sun + skyfield.data.mpc.mpcorb_orbit( dataframe.loc[ name ], timeScale, skyfield.constants.GM_SUN_Pitjeva_2005_km3_s2 )
               ra, dec, sunBodyDistance = sun.at( t ).observe( body ).radec()
               ra, dec, earthBodyDistance = ( earth + topos ).at( t ).observe( body ).radec()

I get the following:

skyfield.errors.EphemerisRangeError: ephemeris segment only covers dates 1899-07-28 23:59:18Z through 2053-10-08 23:58:51Z UT

which occurs during both the sun and earth observation.

On looking at the downloaded data, there is a piece of data missing from one field when compared to lines before/after. I presume this missing field is causing the exception. Is this is something that should/could be handled within Skyfield, or rather I just wrap a try/except around the code?

brandon-rhodes commented 4 years ago

Try printing t.utc_jpl() so that we can find out the date on which you are trying to generate a Sun and Earth position. Thanks!

Bernmeister commented 4 years ago

Adding the above print results in:

A.D. 2020-Sep-16 00:00:00.0000 UT

brandon-rhodes commented 4 years ago

Thank you, that eliminates the first possibility (that it was so far in the past or future as to exceed the date range).

Adding:

               barycentric = body.at(t)
               print(barycentric.distance())

— just before the fatal call, I see:

nan au

So for the moment you could protect your code by checking whether the position was a not-a-number floating point value before proceeding.

And we can in the future improve Skyfield’s behavior here, I think. The routine that tries to back-date positions for light travel time, instead of reaching impossibly far back in time in the case of a nan, should just try producing a nan as its output. Let’s leave this issue open until we can make that adjustment!

brandon-rhodes commented 4 years ago

Well, how interesting. Because integers at the machine level cannot represent nan, it looks like when nan gets converted deep inside jplephem into an integer index into the .bsp file, it turns into the integer (drum roll—)

-9223372036854775808

Which then indexes outside of the bounds of the file and produces the error.

Looking at _correct_for_light_travel_time(), it will have to become more complicated if it’s to handle nan values that might appear in the middle of an array that's otherwise in good shape.

I do want Skyfield to become transparent with respect to nan's, so that a big vector with a few nan's inside doesn't cause any problems. But I'll have to think about this one a bit, and see if there's a way to tackle it in the light-backdating case that won't cause too much expense for the 99.99% case where none of the values are nan.

Bernmeister commented 4 years ago

Dunno if this adds to the diagnosis or not, but noticed these messages along with the exception:

/usr/local/lib/python3.6/dist-packages/jplephem/spk.py:227: RuntimeWarning: invalid value encountered in double_scalars index2, offset2 = divmod(tdb2 * S_PER_DAY, intlen)

/usr/local/lib/python3.6/dist-packages/jplephem/spk.py:228: RuntimeWarning: invalid value encountered in divmod index3, offset = divmod(offset1 + offset2, intlen)
brandon-rhodes commented 4 years ago

Dunno if this adds to the diagnosis or not, but noticed these messages along with the exception…

I see only one mention (!) of that on the Web, and it seems to be in exactly the same situation, where divmod() is given a nan:

https://github.com/numba/numba/pull/5619#issuecomment-658091806

So I think those messages are in line with our hypothesis that nan is among the values the SPK routine is being asked to deal with.

Is this is something that should/could be handled within Skyfield, or rather I just wrap a try/except around the code?

Here are two ways of detecting the situation for now. First, after generating the position, you could check whether it has nan values inside:

               from numpy import isnan
               print(isnan(barycentric.position.au).any())

This prints True for the problem position, and could help you detect it.

An even better approach might be to simply ignore orbits like this one with no semimajor_axis_au specified, as without any data on the size of the orbit, there is no way that a position can be computed. If you print(dataframe.loc[ name ]) in your code you will see the not-a-number value for that field. What if you loaded the file first, and removed rows without a semimajor axis?

with open("Soft00Distant.txt", 'rb') as f:
    dataframe = skyfield.data.mpc.load_mpcorb_dataframe( f ).set_index( "designation", drop = False )

dataframe = dataframe[~dataframe.semimajor_axis_au.isnull()]

for name, row in dataframe.iterrows():
    orbit = skyfield.data.mpc.mpcorb_orbit(row, timeScale, skyfield.constants.GM_SUN_Pitjeva_2005_km3_s2)
    # then, add to `sun` and so forth

Note that in your code as it stands, I'm not sure there's any purpose in doing a set_index() on the dataframe, as your dataframe only ever appears to have one row? Your script will already probably feel slow because you are running load_mpcorb_dataframe() on each line instead of one the whole file, and stopping to generate a one-item index for that one row will slow it down further, in a situation where you probably want performance because of the number of lines in the file. Try it out and let me know if it helps!

Bernmeister commented 4 years ago

I didn't realise that the file could be loaded into a dataframe in one go. Yes, I've found that comets and minor planets run an order of magnitude or two slower than PyEphem (and was going to post about a little later on). My use case is in two parts: first I download the MPC data files (one for comets and four for minor planets). For each file, I remove a comet/minor planet if it exceeds (apparent) magnitude of 15. I then write out a binary blob file containing what's left. Secondly I use the binary blob on each update of my software loop to read in this much smaller subset.

I presume what you have written above in the code sample (open the text file and load into dataframe) is what you refer to speeding things up, then I'll have a go at using that. Much appreciated!

davidmikolas commented 3 years ago

note: I'm not a developer

As of today at least, two comets cause different kinds problems

    row_neo = comets.loc['C/2020 F3 (NEOWISE)']
    row_oum = comets.loc['1I/`Oumuamua']

The reasons might be that:

Both of these show problems internally at mpc.comet_orbit() but depending on what you do, you find out that there is a problem in all kinds of different ways, sometimes a lot further down the road

Question: Would a thorough checking within mpc.comet_orbit() even if it just prints a warning, save a lot of time and energy?

--

1791 comets loaded
skyfield.__version__:  1.23

NEOWISE test 1 (this must be trapped, otherwise crash)
(<class 'ValueError'>, ValueError('The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().'), <traceback object at 0x11af1bc48>)

NEOWISE test 2 this is interesting
(<class 'ValueError'>, ValueError('The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().'), <traceback object at 0x11af1bb48>)

NEOWISE test 3 because the warning is different
/Users/david/anaconda3/lib/python3.7/site-packages/skyfield/keplerlib.py:256: RuntimeWarning: invalid value encountered in double_scalars
  return 2 * arctan(((1+e)/(1-e))**.5 * tan(E/2))

Oumuamua test 1 seems no problem

Oumuamua test 2 but there is
(<class 'AttributeError'>, AttributeError("'_KeplerOrbit' object has no attribute '_mu_km_s'"), <traceback object at 0x11af315c8>)

Oumuamua test 3 no sign of trouble here though we know there is...
Sum of 2 vectors:
 + Segment 'de421.bsp' 0 SOLAR SYSTEM BARYCENTER -> 10 SUN
 + <Geometric ICRS position and velocity at date t center=10 target=1I/`Oumuamua>

Oumuamua test 4 and it shows up with a different name
(<class 'AttributeError'>, AttributeError("'VectorSum' object has no attribute 'distance'"), <traceback object at 0x11a868ec8>)

comes from the following script

from skyfield.api import Loader
from skyfield.data import mpc
from skyfield.constants import GM_SUN_Pitjeva_2005_km3_s2 as GM_SUN
from skyfield.api import load
import skyfield, sys

loaddata = Loader('~/Documents/fishing/SkyData')  # avoids multiple copies of large files
ts = loaddata.timescale() # include builtin=True if you want to use older files (you may miss some leap-seconds)

eph = loaddata('de421.bsp')

earth, sun, moon = [eph[x] for x in ('earth', 'sun', 'moon')]

with load.open(mpc.COMET_URL) as f:
    comets = mpc.load_comets_dataframe(f)

comets = comets.set_index('designation', drop=False)

print(len(comets), 'comets loaded')

print('skyfield.__version__: ', skyfield.__version__)

row_neo = comets.loc['C/2020 F3 (NEOWISE)']
row_oum = comets.loc['1I/`Oumuamua']

print('\nNEOWISE test 1 (this must be trapped, otherwise crash)')
try:
    test = mpc.comet_orbit(row_neo, ts, GM_SUN)
except:
    print(sys.exc_info())

print('\nNEOWISE test 2 this is interesting')
try:
    comet = sun + mpc.comet_orbit(row_neo, ts, GM_SUN)
except:
    print(sys.exc_info())

print('\nNEOWISE test 3 because the warning is different')
comet = sun + mpc.comet_orbit(row_oum, ts, GM_SUN)

print('\nOumuamua test 1 seems no problem')
try:
    test = mpc.comet_orbit(row_oum, ts, GM_SUN)
except:
    print(sys.exc_info())

print('\nOumuamua test 2 but there is')
try:
    test = mpc.comet_orbit(row_oum, ts, GM_SUN)
    print(test)
except:
    print(sys.exc_info())

print('\nOumuamua test 3 no sign of trouble here though we know there is...')
try:
    bary = sun + mpc.comet_orbit(row_oum, ts, GM_SUN).at(ts.now())
    print(bary)
except:
    print(sys.exc_info())

print('\nOumuamua test 4 and it shows up with a different name')
try:
    bary = sun + mpc.comet_orbit(row_oum, ts, GM_SUN).at(ts.now())
    print(bary.distance())
except:
    print(sys.exc_info())
Bernmeister commented 3 years ago

@brandon-rhodes To give an update, your suggestion to use

dataframe = dataframe[ ~dataframe.semimajor_axis_au.isnull() ]

does indeed prevent the exception

skyfield.errors.EphemerisRangeError: ephemeris segment only covers dates 1899-07-28 23:59:18Z through 2053-10-08 23:58:51Z UT

from occurring. Happy for this ticket to be closed, unless you intended to weave some magic within Skyfield.

Also I did try your suggestion of loading the dataframe in once, but it didn't make that much difference to speed (but I'll tackle that in another ticket so that this can be closed at your discretion).

brandon-rhodes commented 3 years ago

@Bernmeister — Thanks for following up! I have updated the docs to describe filtering the dataframe to remove invalid orbits, and hopefully any users who miss that advice will be led here or to the docs when they search for the exception name.

Oh, and, for the record, I believe that this line — which is missing its semimajor axis — was the culprit from Soft00Distant.txt:

K12D30R  7.1   0.15 K194R   0.04515  195.57163  341.47709   77.98645  0.9909325  0.00001530                MPO434970   209   8 2000-2014 0.26 M-v 38h MPC        0000         2012 DR30