Speed difference between Skyfield and PyEphem for comets / minor planets

Bernmeister commented 3 years ago

When computing comets (and minor planets) with Skyfield, the time to do so is astronomically (pardon the pun) greater than that for PyEphem. I understand PyEphem uses a C backend and so will always be faster, but this appears to be orders of magnitude faster. Hoping this is a quick fix and I've made yet another school boy error!

The test script below computes comets and minor planets for each of Skyfield and PyEphem:

#!/usr/bin/env python3

# Download and save 
#     https://www.minorplanetcenter.net/iau/Ephemerides/Comets/Soft00Cmt.txt
#     https://www.minorplanetcenter.net/iau/Ephemerides/Comets/Soft03Cmt.txt
#     https://www.minorplanetcenter.net/iau/Ephemerides/Bright/2018/Soft00Bright.txt
#     https://www.minorplanetcenter.net/iau/Ephemerides/Bright/2018/Soft03Bright.txt

import datetime, ephem, importlib, io, skyfield.api, skyfield.constants, skyfield.data.mpc

def testSkyfield( orbitalElementData, utcNow, latitude, longitude, elevation, isComet ):
    timeScale = skyfield.api.load.timescale( builtin = True )
    t = timeScale.utc( utcNow.year, utcNow.month, utcNow.day, utcNow.hour, utcNow.minute, utcNow.second )
    ephemeris = skyfield.api.load( "de421.bsp" )
    sun = ephemeris[ "sun" ]
    earth = ephemeris[ "earth" ]
    topos = skyfield.api.Topos( latitude_degrees = latitude, longitude_degrees = longitude, elevation_m = elevation )
    alt, az, earthSunDistance = ( earth + topos ).at( t ).observe( sun ).apparent().altaz()

    with io.BytesIO() as f:
        for orbitalElement in orbitalElementData:
            f.write( ( orbitalElement + '\n' ).encode() )

        f.seek( 0 )

        if isComet:
            dataframe = skyfield.data.mpc.load_comets_dataframe( f ).set_index( "designation", drop = False )
            orbitCalculationFunction = "comet_orbit"

        else:
            dataframe = skyfield.data.mpc.load_mpcorb_dataframe( f ).set_index( "designation", drop = False )
            orbitCalculationFunction = "mpcorb_orbit"

            # Remove bad data: https://github.com/skyfielders/python-skyfield/issues/449#issuecomment-694159517
            dataframe = dataframe[ ~dataframe.semimajor_axis_au.isnull() ]

    for name, row in dataframe.iterrows():
        body = sun + getattr( importlib.import_module( "skyfield.data.mpc" ), orbitCalculationFunction )( row, timeScale, skyfield.constants.GM_SUN_Pitjeva_2005_km3_s2 )
        ra, dec, earthBodyDistance = ( earth + topos ).at( t ).observe( body ).radec()
        ra, dec, sunBodyDistance = sun.at( t ).observe( body ).radec()

def testPyEphem( orbitalElementData, utcNow, latitude, longitude, elevation ):
    for line in orbitalElementData:
        if not line.startswith( "#" ):
            observer = ephem.Observer() # Not sure if the observer should be reset on each run or not...
            observer.date = ephem.Date( utcNow )
            observer.lat = str( latitude )
            observer.lon = str( longitude )
            observer.elev = elevation
            comet = ephem.readdb( line )
            comet.compute( observer )

utcNow = datetime.datetime.strptime( "2020-11-24", "%Y-%m-%d" ) # Set to the date of the data files.

latitude = -33
longitude = 151
elevation = 100

# Skyfield COMET
t = datetime.datetime.utcnow()

with open( "Soft00Cmt.txt" ) as f:
    orbitalElementData = f.readlines()

testSkyfield( orbitalElementData, utcNow, latitude, longitude, elevation, isComet = True )
print( "Skyfield COMET:", skyfield.__version__, '\n', "Duration:", datetime.datetime.utcnow() - t, '\n' )

# PyEphem COMET
t = datetime.datetime.utcnow()

with open( "Soft03Cmt.txt" ) as f:
    orbitalElementData = f.readlines()

testPyEphem( orbitalElementData, utcNow, latitude, longitude, elevation )
print( "PyEphem COMET:", ephem.__version__, '\n', "Duration:", datetime.datetime.utcnow() - t, '\n' )

# Skyfield MINOR PLANET
t = datetime.datetime.utcnow()

with open( "Soft00Bright.txt" ) as f:
    orbitalElementData = f.readlines()

testSkyfield( orbitalElementData, utcNow, latitude, longitude, elevation, isComet = False )
print( "Skyfield MINOR PLANET:", skyfield.__version__, '\n', "Duration:", datetime.datetime.utcnow() - t, '\n' )

# PyEphem MINOR PLANET
t = datetime.datetime.utcnow()

with open( "Soft03Bright.txt" ) as f:
    orbitalElementData = f.readlines()

testPyEphem( orbitalElementData, utcNow, latitude, longitude, elevation )
print( "PyEphem MINOR PLANET:", ephem.__version__, '\n', "Duration:", datetime.datetime.utcnow() - t, '\n' )

The results:

Skyfield COMET: 1.33 
 Duration: 0:02:21.199370 

PyEphem COMET: 3.7.6.0 
 Duration: 0:00:00.003490 

Skyfield MINOR PLANET: 1.33 
 Duration: 0:00:07.016223 

PyEphem MINOR PLANET: 3.7.6.0 
 Duration: 0:00:00.000338

Some points to note:

The bizarre way in which I load the data for Skyfield from file only to write back out to a memory file (or whatever it is called) mimics my final application as I need to currently support PyEphem and Skyfield and at the same time, cache the data files so as to not annoy the Minor Planet Center. Regardless, I've timed the file processing code versus the computation part and it is the computation part which is expensive.
Using importlib.import_module to dynamically invoke the correct function call again (seemingly) incurs little to no overhead to the overall timing.
Computing the observer on each iteration for PyEphem makes little impact to timing (but PyEphem is not the slow horse in this race).
I chose the smallest sample of data file Bright for the Minor Planet test. Ultimately I want to use data files for each of Critical, Distant and Unusual.

brandon-rhodes commented 3 years ago

Thanks very much for the very complete example script! It worked the first time I tried it, and let me dive in immediately.

You have discovered a secret about PyEphem: it does no actual calculations when you call compute(). Originally it did, but it turns out most users only ask for one or two facts about an object, not for everything, so to speed things up it just caches the (a) observer and (b) date, then access to actual attributes trigger computation. So try adding a line comet.a_ra, comet.a_dec, comet.hlon, comet.hlat right after the compute() call. It’s still pretty quick, but not as quick as when PyEphem does no work.
I tried profiling the code with cProfile with:

python -m cProfile -o _profile.out tmp.py

And then dumped the resulting profile to the screen:

python -c '__import__("pstats").Stats("_profile.out").strip_dirs().sort_stats("tottime").print_stats()'

It looks like much of the time is spent in the functions stumpff() and propagate():

Wed Nov 25 17:03:03 2020    _profile.out

         1196835 function calls (1186925 primitive calls) in 4.390 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     8095    1.179    0.000    2.372    0.000 keplerlib.py:401(stumpff)
      148    0.495    0.003    3.445    0.023 keplerlib.py:445(propagate)
    74441    0.438    0.000    0.438    0.000 {method 'reduce' of 'numpy.ufunc' 
objects}
     7947    0.211    0.000    2.540    0.000 keplerlib.py:507(kepler)
    24285    0.203    0.000    0.399    0.000 shape_base.py:1069(tile)
...

@JoshPaterson — If I recall, you are the expert on those routines. Do you have any ideas about how they spend most of their time? Or about what directions we might explore to see if they could be increased in speed?

For comparison, I think PyEphem does some trigonometry instead of trying to propagate directly with vectors:

https://github.com/brandon-rhodes/pyephem/blob/master/libastro-3.7.7/twobody.c

mkbrewer commented 3 years ago

Hah! I'm glad I looked at this issue. I had an application a couple of weeks ago that needed the previous antitransit for the Sun. We have an ephemeris class that wraps PyEphem. This class has a public method get_object('Sun') that calls a private method that returns ephem.Sun(). After doing this it calls compute(). I looked at the code for PyEphem and saw that previous_antitransit() calls compute() again. I searched high and low to see what exactly compute() did to no avail, so I decided to access the private method to get ephem.Sun() in order to avoid calling compute() twice. Now I know that was unnecessary.

JoshPaterson commented 3 years ago

I'm a little surprised at how much time is spent in stumpff. It gets called so often because there's a root finding algorithm in propagate and stumpff is used in computing the objective function.

I think the best chance of getting orders of magnitude improvement in speed would be to vectorize propagate so that it can take a vector of objects and a vector of times and return a two dimensional array of positions.

I'll take a look at what can be optimized in stumpff and propagate (I already see some things), and I will also try to vectorize propagate.

Bernmeister commented 3 years ago

@JoshPaterson @brandon-rhodes Wondering (as politely as possible) how this is progressing? I'm somewhat stuck in PyEphem (which has served me well for many years), but am keen to finally move over to Skyfield.

As I understand, the solution is to take a vector of object and process in that manner? If so, would that then apply to stars and planets?

Bernmeister commented 2 years ago

I noticed over the last couple of weeks the Minor Planet Center seems to have dropped support for data files for minor planets for formats other than their native MPC format. That means users of PyEphem can no longer use MPC data (except for comets). I guess this makes Skyfield the front-runner as it were as Skyfield uses the MPC format. Any update please on how this work is proceeding to (hopefully) speed up processing of large numbers of comets / minor planets?

I did toy with the idea of taking data files in MPC format for minor planets and convert to PyEphem/XEphem format...but I don't believe all the data is present in the MPC format which is required in the XEphem format to do a successful conversion. Would have been a neat workaround...oh well.

ephilalethes commented 5 months ago

Any progress on this front? I'm trying to calculate some daily positions with comets and minor planets over long periods of time, and as it stands it's still astronomically (good pun indeed) slow.

skyfielders / python-skyfield

Speed difference between Skyfield and PyEphem for comets / minor planets #490