streamingfast / firehose-core

Firehose Integrators Tool Kit (for `firehose-<chain>` maintainers)
Apache License 2.0

A lot of CPU is wasted zipping one-block files processing historical blocks #54

Open matthewdarwin opened 2 months ago

matthewdarwin commented 2 months ago

It seems like a lot of CPU is wasted zipping one-block files. If you're running the RPC poller (or even the instrumented binary) on historical blocks, the reader-node will create `.dbin.zst` one-block files, and within a second the merger will uncompress all of them.

In this scenario it would be better if the compress/uncompress step were simply skipped.

maoueh commented 2 months ago

This makes sense indeed.

matthewdarwin commented 2 months ago

Ok, I can ask @fschoell to work on it.

fschoell commented 2 months ago

@maoueh should be straightforward to add. I could add a --common-one-block-compression flag accepting either none or zstd (maybe also gzip?).
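A sketch of how such a flag's value could be validated before wiring it into the store setup. The flag name comes from the comment above; the helper itself and the accepted-value set (`gzip` is still an open question) are hypothetical, not existing firehose-core code:

```go
package main

import "fmt"

// parseOneBlockCompression validates a hypothetical
// --common-one-block-compression value. "none" is what would skip
// the wasteful compress/uncompress round trip during historical sync.
func parseOneBlockCompression(v string) (string, error) {
	switch v {
	case "none", "zstd", "gzip":
		return v, nil
	default:
		return "", fmt.Errorf("invalid one-block compression %q (accepted: none, zstd, gzip)", v)
	}
}

func main() {
	for _, v := range []string{"none", "zstd", "lz4"} {
		c, err := parseOneBlockCompression(v)
		if err != nil {
			fmt.Println(v, "rejected:", err)
			continue
		}
		fmt.Println(v, "accepted:", c)
	}
}
```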

Not sure about the ideal way to handle the file extension though, as there are a number of places that connect to the one-block storage. The easiest approach would be to have dstore add it automatically, after this: https://github.com/streamingfast/dstore/blob/91345d4a31f280d8c432e595ee5af4b18e76f664/stores.go#L92-L94

That way the file extension would also always match the actual compression format, which would probably be useful. This would break backwards compatibility, though, since we would be adding `.zst` automatically.
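The extension scheme being proposed can be sketched as a small helper: derive the stored name from the configured compression so the suffix always matches the actual format. This is an illustrative function, not dstore's API; the base-name pattern is just an example:

```go
package main

import "fmt"

// storedFileName sketches the proposed behavior: dstore appends the
// compression suffix itself, so the extension always reflects the
// format actually used. (Hypothetical helper, not dstore code.)
func storedFileName(base, compression string) string {
	switch compression {
	case "zstd":
		return base + ".zst"
	case "gzip":
		return base + ".gz"
	default:
		// "none": no suffix, which is where the backwards-compatibility
		// break appears for consumers expecting ".dbin.zst" names.
		return base
	}
}

func main() {
	fmt.Println(storedFileName("0000000100-example.dbin", "zstd"))
	fmt.Println(storedFileName("0000000100-example.dbin", "none"))
}
```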