vgreg / MeatPy

Market Empirical Analysis Toolbox for Python
BSD 3-Clause "New" or "Revised" License

Improve ITCH parser and writer #8

Open vgreg opened 5 months ago

vgreg commented 5 months ago

See if we can speed up the parser.

vgreg commented 5 months ago

Potential ideas for speedup:

New improved output formats:

The Markdown output is for interactive work and for the CLI.

The JSON output is to simplify development and debugging.

The Arrow format is there so we can store historical files in Parquet format for easy searching and extraction. That way, when we want to look at a subset of stocks on a given day, we can easily query the messages related to those symbols and days and process them.

vgreg commented 5 months ago

The overall parser architecture should be overhauled. The current approach is highly inefficient because it forces all messages to be stored in memory.

The more modern way to read large files like this would be to use a generator that can do automatic filtering: https://realpython.com/introduction-to-python-generators/
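To make the generator idea concrete, here is a minimal sketch, not MeatPy's actual code. It assumes each message in the file is preceded by a two-byte big-endian length and that the first payload byte is the message type; the names `iter_messages`, `wanted_types` and `frame` are hypothetical.

```python
import io
import struct
from typing import BinaryIO, Iterator, Optional, Set


def iter_messages(
    stream: BinaryIO, wanted_types: Optional[Set[bytes]] = None
) -> Iterator[bytes]:
    """Yield raw message payloads one at a time, optionally filtered by type.

    Assumption: every message is framed as <2-byte big-endian length><payload>,
    and the first payload byte identifies the message type.
    """
    while True:
        header = stream.read(2)
        if len(header) < 2:
            return  # end of stream (or a truncated header)
        (length,) = struct.unpack(">H", header)
        payload = stream.read(length)
        if len(payload) < length:
            return  # truncated final message, stop reading
        if wanted_types is None or payload[:1] in wanted_types:
            yield payload


def frame(payload: bytes) -> bytes:
    """Helper for the demo: prefix a payload with its two-byte length."""
    return struct.pack(">H", len(payload)) + payload


# Tiny in-memory "file" with three fake messages of types A, S and A.
sample = io.BytesIO(frame(b"Ahello") + frame(b"Sx") + frame(b"Aworld"))
adds = list(iter_messages(sample, wanted_types={b"A"}))
```

Only one message is held in memory at a time, and the filter runs before anything downstream sees the message, so skipped messages cost nothing beyond the read itself.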

A generator-based design would also decouple two important aspects of the message parser: reading and writing. The "in-memory" representation is currently at the message level, but the code around it is very messy. We could have many readers, one for each file type: at minimum a binary ITCH reader, but potentially also Parquet, JSON, etc.

We could also have many writers, one for each file type.

The formatting logic could be defined at the message level.
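A rough sketch of what that split could look like, using only the standard library. Everything here is hypothetical: the `AddOrder` class, its fields, and the reader and writer function names are illustrative and are not the existing MeatPy API.

```python
import io
import json
from dataclasses import asdict, dataclass
from typing import Iterable, Iterator, TextIO


@dataclass
class AddOrder:
    """Hypothetical in-memory message; the fields are illustrative only."""

    timestamp: int
    symbol: str
    price: int
    shares: int

    # Formatting logic lives on the message itself.
    def to_json(self) -> str:
        return json.dumps({"type": "A", **asdict(self)})

    def to_markdown_row(self) -> str:
        return (
            f"| A | {self.timestamp} | {self.symbol} "
            f"| {self.price} | {self.shares} |"
        )


def read_json_lines(stream: TextIO) -> Iterator[AddOrder]:
    """Reader: lazily turn one JSON object per line into messages."""
    for line in stream:
        if line.strip():
            record = json.loads(line)
            record.pop("type", None)  # the type tag is not a dataclass field
            yield AddOrder(**record)


def write_json_lines(messages: Iterable[AddOrder], out: TextIO) -> int:
    """Writer: one JSON object per line. Returns the number written."""
    count = 0
    for msg in messages:
        out.write(msg.to_json() + "\n")
        count += 1
    return count


def write_markdown(messages: Iterable[AddOrder], out: TextIO) -> int:
    """Writer: a Markdown table. Returns the number of rows written."""
    out.write("| type | timestamp | symbol | price | shares |\n")
    out.write("| --- | --- | --- | --- | --- |\n")
    count = 0
    for msg in messages:
        out.write(msg.to_markdown_row() + "\n")
        count += 1
    return count


# Any reader can feed any writer because both only see message objects.
source = io.StringIO(
    '{"type": "A", "timestamp": 1, "symbol": "AAPL", '
    '"price": 1500000, "shares": 100}\n'
)
md = io.StringIO()
rows = write_markdown(read_json_lines(source), md)
```

Because writers accept any iterable of messages, a binary ITCH reader written as a generator could be plugged in the same way without the writers changing.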