Open virtuald opened 8 years ago
Given how much time is spent importing the hal, I suspect a lot of time is spent in ctypes overhead. Creating a Python C module shim .so might speed this up considerably.
If one were to go that route, I'd want to use cython to generate that I think.
Is it possible that cffi would be be faster than ctypes for hal?
Wouldn't cffi require compilation of some kind?
The 2017 HAL + Python 3.6 is significantly faster:
admin@roboRIO-2423-FRC:~# time python3 -c 'import hal'
real 0m2.089s
user 0m0.940s
sys 0m0.170s
admin@roboRIO-2423-FRC:~# time python3 -c 'import wpilib'
real 0m4.119s
user 0m3.570s
sys 0m0.220s
But pip is bigger:
admin@roboRIO-2423-FRC:~# time python3 -c 'pass'
real 0m0.196s
user 0m0.190s
sys 0m0.000s
admin@roboRIO-2423-FRC:~# time python3 -c 'import pip'
real 0m4.159s
user 0m3.900s
sys 0m0.240s
Well. HAL seems to be faster this year. But wpilib is about the same... I imagine it has to be due to all of the imports we're doing. We could probably do better without importing everything... maybe some kind of lazy importing?
admin@roboRIO-2423-FRC:~# time python3 -c 'import hal'
real 0m1.202s
user 0m0.860s
sys 0m0.310s
admin@roboRIO-2423-FRC:~# time python3 -c 'import wpilib'
real 0m4.028s
user 0m3.600s
sys 0m0.390s
Also, I pushed a compiled version of cffi to the -dev repo, if someone wants to experiment with that.
I'd be interested to see exactly how much of the HAL is actually just ctypes overhead and how much is initialising the HAL.
I might try my hand on lazy-loading wpilib.
Update: I gave up on lazy-loading wpilib in robotpy/robotpy-wpilib#468. If anyone else has good ideas for that…
I've had some luck using cffi for HAL bindings after running HAL.h through the C preprocessor. Someone should actually do it and benchmark it.
Faster this year still:
lvuser@roboRIO-4774-FRC:~$ time python3 -c 'import hal'
real 0m0.953s
user 0m0.800s
sys 0m0.140s
lvuser@roboRIO-4774-FRC:~$ time python3 -c 'import wpilib'
real 0m3.016s
user 0m2.760s
sys 0m0.230s
lvuser@roboRIO-4774-FRC:~$ time python3 -c 'pass'
real 0m0.154s
user 0m0.120s
sys 0m0.030s
lvuser@roboRIO-4774-FRC:~$ time python3 -m pip > /dev/null
real 0m3.334s
user 0m3.090s
sys 0m0.220s
But no pyc files are compiled with -O, so we get this:
lvuser@roboRIO-4774-FRC:~$ time python3 -O -c 'import wpilib'
real 0m4.261s
user 0m3.920s
sys 0m0.310s
Here's HAL for this year, looks like it's a bit slower:
time python3 -c 'import hal'
real 0m0.978s
user 0m0.637s
sys 0m0.259s
Full results:
lvuser@roboRIO-2423-FRC:~$ time python3 -c 'import hal'
real 0m0.981s
user 0m0.637s
sys 0m0.262s
lvuser@roboRIO-2423-FRC:~$ time python3 -c 'import wpilib'
real 0m2.606s
user 0m2.213s
sys 0m0.504s
lvuser@roboRIO-2423-FRC:~$ time python3 -c 'pass'
real 0m0.154s
user 0m0.121s
sys 0m0.032s
lvuser@roboRIO-2423-FRC:~$ time python3 -m pip > /dev/null
real 0m3.927s
user 0m3.625s
sys 0m0.281s
It has occurred to me that a large chunk of the import time is probably pybind11 creating docstrings. Would probably be good if that could get deferred until they're actually accessed.
(On that note, do we actually need to have pybind11 bindings to the HAL functions?)
Looks like we're in the same ballpark this year:
lvuser@roboRIO-2423-FRC:~$ time python3 -c 'import hal'
real 0m1.041s
user 0m0.708s
sys 0m0.227s
lvuser@roboRIO-2423-FRC:~$ time python3 -c 'import wpilib'
real 0m2.874s
user 0m2.482s
sys 0m0.531s
lvuser@roboRIO-2423-FRC:~$ time python3 -c 'pass'
real 0m0.160s
user 0m0.148s
sys 0m0.011s
lvuser@roboRIO-2423-FRC:~$ time python3 -m pip > /dev/null
real 0m2.115s
user 0m1.953s
sys 0m0.150s
It has occurred to me that a large chunk of the import time is probably pybind11 creating docstrings. Would probably be good if that could get deferred until they're actually accessed.
It turns out that nanobind defers docstring rendering, where pybind11 doesn't: https://nanobind.readthedocs.io/en/latest/why.html#major-additions
Unfortunately I think we're stuck with pybind11 whilst we still need multiple inheritance: https://nanobind.readthedocs.io/en/latest/porting.html#removed-features
On my RoboRIO with a 2016 image, I get the following (mostly consistent timing between runs, no DS connected):
HAL library loading may be a big chunk of this:
As a baseline:
It would be good to reduce the startup time somehow.