The first minor release is going to be solely focused on performance improvements. As well as implementing patches that improve performance, providing benchmarks alongside the documentation will also be very useful.
We aim to end up with a benchmarking test suite that can be utilised on an ongoing basis to ensure new features don't have an impact on performance.
[x] Pipe() is currently defined as an object. This is extremely hot code, and defining these as functions will likely yield a significant speed-up.
[x] Pipeline().run could be optimised. Potentially look at using __slots__ to only expose input_pipes etc., and have a simple function capable of running pipelines.
[x] marshal_extra_outputs is slow and is called around 4 times per pipeline.
[x] get_mapper is called a huge number of times. As this does a lookup for the mapper by name from the _REGISTRY, it is ripe for improvement.
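The object-vs-function point above can be illustrated with a microbenchmark. This is a sketch only: PipeObject and pipe_function are hypothetical stand-ins, not the library's actual Pipe internals.

```python
import timeit

class PipeObject:
    """Stand-in for the current style: each pipe is an object invoked via __call__."""
    def __call__(self, session):
        return session

def pipe_function(session):
    """Stand-in for the proposed style: a plain function avoids
    instance creation and __call__ dispatch overhead."""
    return session

obj_pipe = PipeObject()
session = {"data": 1}

# Compare per-call cost of the two styles on identical work.
obj_time = timeit.timeit(lambda: obj_pipe(session), number=1_000_000)
fn_time = timeit.timeit(lambda: pipe_function(session), number=1_000_000)
print(f"object __call__: {obj_time:.3f}s  plain function: {fn_time:.3f}s")
```

On CPython the plain-function variant typically wins because it skips the bound-method lookup that `__call__` dispatch requires.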
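The __slots__ idea for Pipeline could look something like the sketch below. The attribute names (input_pipes, output_pipes) and the runner are assumptions for illustration, not the library's real API.

```python
class Pipeline:
    """Sketch: __slots__ removes the per-instance __dict__, cutting
    memory use and speeding up attribute access slightly."""
    __slots__ = ("input_pipes", "output_pipes")

    def __init__(self, input_pipes, output_pipes):
        self.input_pipes = input_pipes
        self.output_pipes = output_pipes

def run_pipeline(pipeline, data):
    """A simple function-based runner: thread data through each pipe in order."""
    for pipe in pipeline.input_pipes:
        data = pipe(data)
    for pipe in pipeline.output_pipes:
        data = pipe(data)
    return data

result = run_pipeline(Pipeline([str.strip], [str.upper]), "  hello ")
# result == "HELLO"
```

The trade-off is that slotted classes cannot gain arbitrary attributes at runtime, so this only works if Pipeline's attribute set is fixed.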
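For the get_mapper hot path, one obvious improvement is memoising the registry lookup. A minimal sketch, assuming get_mapper takes a name string (the toy _REGISTRY contents here are placeholders):

```python
from functools import lru_cache

# Toy registry for illustration; the real _REGISTRY maps mapper
# names to mapper classes.
_REGISTRY = {"user": dict, "post": list}

@lru_cache(maxsize=None)
def get_mapper(name):
    """Cached lookup: repeated calls with the same name are served
    from the lru_cache after the first hit."""
    try:
        return _REGISTRY[name]
    except KeyError:
        raise KeyError(f"No mapper registered under {name!r}")

mapper = get_mapper("user")
```

One caveat: if mappers can be registered after first use, the cache must be invalidated with get_mapper.cache_clear() whenever _REGISTRY changes.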
A user on Hacker News ran some benchmarks here: https://voidfiles.github.io/python-serialization-benchmark/. They seem like a solid foundation for implementing our own suite.
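A benchmarking suite along those lines could start from a small timeit-based harness like the sketch below. The serialize function here is a hypothetical placeholder for whichever library call ends up under test.

```python
import timeit

def serialize(obj):
    """Placeholder for the real serialization call under test."""
    return str(obj)

def benchmark(func, *args, repeats=5, number=10_000):
    """Return the best per-call time in seconds across several repeats.

    Taking the minimum of repeated runs reduces noise from other
    processes, which matters when comparing results between commits.
    """
    timer = timeit.Timer(lambda: func(*args))
    return min(timer.repeat(repeat=repeats, number=number)) / number

best = benchmark(serialize, {"id": 1, "name": "test"})
print(f"serialize: {best * 1e6:.2f}µs per call")
```

Recording these numbers per release would give the regression signal described above: a new feature that slows a hot path shows up as a jump in the per-call time.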
Todo