Open almaslov opened 1 year ago
Hi @almaslov , thanks for doing this effort and contribution and taking the time to explain your solution/feature to this issue. Would it be possible for you to create a PR so we can review it a little bit in more detail and possibly merge a solution upstream?
I was too optimistic about
public interface of message is not changed
It is not changed indeed, but costs me a questionable workaround. There are two inconsistent points:
__dict__
field should be generated), so using __slots__
field is a reasonable solution.__slots__
to preserve public api.Workaround is described in comment to metaclass new method.
In addition to the above I patched current implementation instead of making side one.
I've tried the same approach for serialization and unexpectedly performance boost is quite large too. In case of the same message TestArrayComplex it is 16,5 seconds vs 1,4 seconds for 50K serializations.
Patch is in PR already.
@almaslov thanks for pushing this! I'll be taking a look 👍
Feature request
Hello! As already mentioned in other discussions python implementation has issues in serialization and deserialization performance(e.g. here). I focus on deserialization here. I've tried some fixes on
galactic
branch and they look very promising for me: depending on message structure and size it's about 3-20 times faster. x3 for simple messages, up to x20 for complex types with high nesting level and array fields.Feature description
The main idea is to avoid pure python code in message deserialization. To achieve it, every message is split into C extension base class and pure python derived class. C extension class contains all message fields, so deserialization method initializes C structure members without calling python methods. On the other hand pure python class contains all dunder methods and assertions.
Implementation considerations
Here is an example of how generated files look like for simple message file Example.msg:
_example_base.c extension module defines base class containing members of native types in lines 10-25.
Python module _example.py is the same as in upstream except for base class(line 62) and microoptimization of assertion in
__init__
method(line 95).Support extension module _example_s.c now uses specific subtype of PyObject(line 147), calls
__new__
instead of__init__
to prevent pure python calls(line 170), directly initializes C typed fields of message(lines 188, 191) and creates nested collections (line 193-212).Benchmark for this message which runs deserialization routine 1 million times gives
28 seconds
vs6,5 seconds
In case of more complex message like
TestArrayComplex
(from this issue) with 100 nestedTestElement
elements, the same benchmark gives57,6 seconds
vs5,5 seconds
for 50K deserializations.I run benchmark with python3.8, ROS galactic, ubuntu 20.04, AMD 2,85GHz.
Pros:
Cons:
__init__
and*_convert_to_py
methods.I'm interested if this solution worth porting from my local
galactic
branch to upstream. Or may be it violates some ROS architecture principles.