Open wrobell opened 4 years ago
string vs bytes, hex vs bin, I know this day will come :smiley:
I will think about this. Perhaps this would be a major change for pyModeS 3.x in the future, since many dependencies may break...
@xoolive, any thought?
Indeed, it would be great to clarify the use of data structures as it would be beneficial at the performance level as well.
We tried some performance improvements with Cython in the speedup branch and saw that choosing the most relevant structures to support was the central question.
The lessons to take from this attempt, imho, are:
There are a lot of projects using Cython (see uvloop or asyncpg for reference). I am not aware of any unsupported architectures. Personally, I use it in a lot of my projects, on an x86_64 laptop and multiple ARM-based Raspberry Pis (A, Zero, 3, 4). IMHO, I would stick to Cython for just certain functions and avoid code duplication.
(however, if you decide to provide two implementations, please let me know. I might have some optimization for cprNL in common.py)
For the moment, I merged only the Cython code for the common package (from the speedup branch). Those are the functions that are called all the time. It is still more than twice as fast as pure Python in the benchmark.
In the current pyModeS version 2.x, even with the optional Cython c_common module, all the inputs are still strings.
provide a vanilla Python API to the core decoding functions supporting bytes, str, hex, binascii, just anything... no questions asked.
I think this might be the most logical way to move forward. Maybe using a decorator function to check the inputs? I am not sure how much slower this is going to be...
On compatibility and supported inputs: looking at the API, the first parameter of every function is a message. Therefore, singledispatch could be used to overload on the first parameter, i.e.
```python
In [1]: from binascii import unhexlify

In [2]: from functools import singledispatch

In [3]: @singledispatch
   ...: def decode(msg):
   ...:     raise ValueError('Message type {} not supported'.format(type(msg)))
   ...:

In [4]: @decode.register(bytes)
   ...: @decode.register(bytearray)
   ...: def _decode_bin(msg):
   ...:     print('decoding type {}'.format(type(msg)))
   ...:

In [5]: @decode.register(str)
   ...: def _decode_str(msg):
   ...:     decode(unhexlify(msg))
   ...:

In [6]: decode(b'abcd')
decoding type <class 'bytes'>

In [7]: decode('abcd')  # will call _decode_str, then _decode_bin
decoding type <class 'bytes'>

In [8]: decode(bytearray(b'abcd'))
decoding type <class 'bytearray'>
```
This should allow backward compatibility as well.
Indeed, singledispatch would be a reasonable way to address this, though I guess the biggest chunk of the work would be to clarify the types in the core decoding functions.
Also, I believe it is always reasonable to keep a pure Python implementation as a fallback option. Even if it is always possible to compile the whole package for any architecture/platform/Python version, a person in charge of an open source side project (i.e. with a real day job and not so much time to devote to support) may feel iffy about opening Pandora's box and trying to provide a compiled version of the library for every possible combination.
Even now, the API can support both hex strings and bytes. Please see https://github.com/junzis/pyModeS/pull/67.
If possible, I would like to restart the discussion about this topic.
Another argument in favour of binary message processing:
```python
# ensure the optimized version of the `df` function is imported
from pyModeS.c_common import df as pymodes_df
from binascii import unhexlify

def df_bin(msg):
    return min(24, (msg[0] & 0xf8) >> 3)

# string and binary versions of the same message
msg = '904ca3a33219741c85465ae1b4c3'
msg_b = unhexlify(msg)
```

```
%timeit pymodes_df(msg)
718 ns ± 5.98 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit df_bin(msg_b)
202 ns ± 2.79 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
```
Over 3 times faster.
I am approaching 1 billion messages in my database. This would be almost 12 minutes vs 3.3 minutes to extract downlink format information.
@wrobell don't give up! I agree too... 🤣
This has to be added as a decorator to all current functions if we are going down this path.
I wonder whether the unhexlify call would slow down the execution?
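If we go the decorator route, a minimal sketch could look like this (the `accept_hex` name and the standalone `df` function below are my own illustration, not pyModeS API); the per-call overhead is essentially one `isinstance` check plus, for string inputs, one `unhexlify`:

```python
from binascii import unhexlify
from functools import wraps

def accept_hex(func):
    # Hypothetical decorator: let a bytes-native decoder also accept
    # hex strings by converting them with unhexlify first.
    @wraps(func)
    def wrapper(msg, *args, **kwargs):
        if isinstance(msg, str):
            msg = unhexlify(msg)
        return func(msg, *args, **kwargs)
    return wrapper

@accept_hex
def df(msg):
    # downlink format: top five bits of the first byte, capped at 24
    return min(24, (msg[0] & 0xF8) >> 3)

# both spellings of the same message now work
df('904ca3a33219741c85465ae1b4c3')             # hex string
df(unhexlify('904ca3a33219741c85465ae1b4c3'))  # raw bytes
```

Measuring the wrapper with `%timeit` against the bare bytes-native function would answer the overhead question directly.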
Another comment:
min(24, (msg[0] & 0xf8) >> 3)
is very hard to understand for people not familiar with bitwise operations.
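For readability, the expression could be unpacked into named steps with comments (a sketch of the same logic, not current pyModeS code; the cap at 24 mirrors pyModeS's behaviour of reporting all values above 24 as DF 24):

```python
def df(msg: bytes) -> int:
    """Extract the downlink format (DF) from a raw Mode S message."""
    first = msg[0]               # first byte of the message, e.g. 0x8D
    top5 = (first & 0xF8) >> 3   # keep the 5 most significant bits, shift them down
    # values above 24 are not separate formats, so they are reported as DF 24
    return min(24, top5)

df(bytes.fromhex('8d4840d6202cc371c32ce0576098'))  # 17 (ADS-B extended squitter)
```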
This also means we have to change almost every function in the decoder. That's a very large project :fearful:
Thanks to function overloading we could always create optimized versions for different types of inputs - strings, bytes... and arrays of these, see below. Indeed, there is a lot of work, but we can always fall back on the slower implementation until a new version is provided. The real challenge is to design a nice API.
I was going through the pyModeS code and realized that the most efficient implementation should use numpy anyway. pyModeS depends on the library but does not utilize its potential, IMHO.
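As an illustration of the idea (my own sketch, not the VModeS or pyModeS API): storing N raw 14-byte messages as an `(N, 14)` uint8 array lets NumPy extract the downlink format of the whole batch in a single vectorised expression, instead of one Python call per message:

```python
import numpy as np

def df_vec(msgs: np.ndarray) -> np.ndarray:
    # msgs is an (N, 14) uint8 array of raw 112-bit Mode S messages;
    # take the top 5 bits of each first byte, capped at 24, for all rows at once
    return np.minimum(24, (msgs[:, 0] & 0xF8) >> 3)

raw = bytes.fromhex('8d4840d6202cc371c32ce0576098') \
    + bytes.fromhex('904ca3a33219741c85465ae1b4c3')
msgs = np.frombuffer(raw, dtype=np.uint8).reshape(-1, 14)

df_vec(msgs)  # -> [17, 18]
```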
I did some research, which led me to implement this:
https://gitlab.com/wrobell/vmodes/-/blob/master/vmodes/parser.py
The project includes a simplistic performance test script:
```
$ PYTHONPATH=. ./scripts/perf-test
processing 6.0 million messages
vmodes: df: 0.02 sec, 2.66 ns per message
vmodes: typecode: 0.04 sec, 7.03 ns per message
vmodes: icao: 0.22 sec, 36.71 ns per message
vmodes: category: 0.04 sec, 5.93 ns per message
vmodes: callsign: 2.20 sec, 365.95 ns per message
vmodes total: 2.51 sec, 418.33 ns per message
pyModeS: df: 4.52 sec, 752.87 ns per message
pyModeS: typecode: 7.83 sec, 1305.10 ns per message
pyModeS: icao: 11.42 sec, 1903.81 ns per message
pyModeS: category: 8.81 sec, 1468.20 ns per message
pyModeS: callsign: 15.47 sec, 2578.46 ns per message
pyModeS total: 48.05 sec, 8008.47 ns per message
```
The difference in processing real-life data is not as impressive, but VModeS is still 13 times faster (I will add the appropriate scripts to the above repo later).
I will close this ticket now. I am planning to explore ideas for high performance processing via VModeS project. If, at any stage, you decide that improving performance of pyModeS is an option, please let me know and I will try to contribute back (IMHO #67 is a starting point).
@wrobell, the performance looks quite promising. Would you be interested to work on a new pyModeS branch directly? This way, @xoolive and I can also help with integrating it with the rest of the code.
I will try to create an initial pull request in a few days.
I am still exploring a few options around NumPy's API.
When I receive an ADS-B message from dump1090, I convert it to binary form with `binascii.unhexlify`. This is much more efficient to store than saving the string version of the message.

To decode the messages, I read them from a database in binary form, convert them to strings using `binascii.hexlify`, and then use pyModeS. But, internally, pyModeS converts a message to binary form again (one way or another). It would be great if the pyModeS API allowed passing the binary version of a message.

Taking a quick look at the API, it seems that single dispatch with separate type registration for `str` and `bytes` could do the trick?