Decompressing compressed hypertable rows outside of timescaledb

l85m commented 4 years ago

We are using timescaledb both in the cloud and on edge devices. On the edge we are very constrained on device resources (CPU, disk, and memory) as well as bandwidth for devices which connect over LTE.

It would be very nice if TimescaleDB allowed us to select compressed rows on the device and pass them to an application that uploads them to the cloud, where a script could decompress the data before passing it on for further processing.

This provides maximum flexibility in how the data is sent (including diffing strategy and transmission protocols) as well as in how it is processed upon receipt. However, it also requires coordination so that changes to timescaledb's compression format do not break the decompression script.
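One way to picture that coordination is to tag every uploaded blob with a format version that the decompression script checks before touching the payload. The sketch below is only an illustration under assumptions: the envelope layout, the `format_version` field, and the `wrap` / `unwrap` helpers are hypothetical and not part of timescaledb.

```python
import base64
import json

# Hypothetical: the set of compression format versions this script understands.
SUPPORTED_FORMAT_VERSIONS = {1}


def wrap(blob: bytes, format_version: int) -> str:
    """Wrap an opaque compressed blob in a JSON envelope carrying a version tag."""
    return json.dumps({
        "format_version": format_version,
        "data": base64.b64encode(blob).decode("ascii"),
    })


def unwrap(message: str) -> bytes:
    """Return the raw blob, refusing versions the script does not know."""
    envelope = json.loads(message)
    version = envelope["format_version"]
    if version not in SUPPORTED_FORMAT_VERSIONS:
        raise ValueError(f"unsupported compression format version: {version}")
    return base64.b64decode(envelope["data"])


if __name__ == "__main__":
    message = wrap(b"\x00\x01opaque-compressed-bytes", format_version=1)
    print(unwrap(message))
```

With something like this, a format change on the device shows up as an explicit error on the receiving side rather than as silently corrupted data.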

As this solves a hard problem for us, we would be open to working on a script and contributing it back to the community. I am opening this ticket to discuss the viability of this concept and, if it is viable, to determine the best way to proceed.

mfreed commented 4 years ago

Hi @l85m, thanks for the request.

Just to clarify, are you looking for embedded functionality on Timescale Cloud to accept compressed rows, or is it just a method for the database running on your edge devices to return a compressed row to the application for uploading? (I believe the latter, but checking.)

And because of CPU constraints, you don't want to decompress in TimescaleDB then recompress on device before uploading via cellular connection?

Can you say more about how data is uploaded (e.g., if in batches)? As our compressed "rows" are actually like columnar arrays, there isn't a 1:1 correspondence between compressed rows and uncompressed rows. More like 1:1000 typically.
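To make the columnar point concrete, here is a minimal Python sketch in which roughly 1000 uncompressed rows are packed into a single column-oriented blob. It uses delta encoding plus zlib purely as a stand-in. It is not TimescaleDB's actual compression format, and `pack_batch` / `unpack_batch` are hypothetical names.

```python
import json
import zlib
from itertools import accumulate


def pack_batch(rows):
    """Pack a list of (timestamp, value) rows into one column-oriented blob."""
    timestamps = [t for t, _ in rows]
    values = [v for _, v in rows]
    # Delta-encode timestamps: the first value, then successive differences.
    deltas = timestamps[:1] + [b - a for a, b in zip(timestamps, timestamps[1:])]
    payload = json.dumps({"ts_deltas": deltas, "values": values}).encode()
    return zlib.compress(payload)


def unpack_batch(blob):
    """Rebuild the original (timestamp, value) rows from one blob."""
    cols = json.loads(zlib.decompress(blob))
    # A running sum over the deltas restores the absolute timestamps.
    timestamps = list(accumulate(cols["ts_deltas"]))
    return list(zip(timestamps, cols["values"]))


if __name__ == "__main__":
    # 1000 uncompressed rows end up in a single compressed "row".
    rows = [(1_600_000_000 + i, 20.0 + (i % 5)) for i in range(1000)]
    blob = pack_batch(rows)
    assert unpack_batch(blob) == rows
    print(len(rows), "rows packed into one blob of", len(blob), "bytes")
```

The consequence for uploads is that the smallest unit that can be selected and shipped is a whole batch, not an individual row.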

l85m commented 4 years ago

Hi @mfreed - Thanks for the quick reply! We talked to a few people on your awesome support/dev team about this tonight actually, so they may be able to provide additional context.

> Just to clarify, are you looking for embedded functionality on Timescale Cloud to accept compressed rows, or is it just a method for the database running on your edge devices to return a compressed row to the application for uploading? (I believe the latter, but checking.)

Correct, this is about the latter. However, we do run on Timescale Cloud, so ultimately these rows will end up in our Timescale Cloud instance as well as with other consumers in AWS.

> And because of CPU constraints, you don't want to decompress in TimescaleDB then recompress on device before uploading via cellular connection?

Exactly. Also, because we are disk-limited, we may not have enough space to decompress and recompress. Additionally, timescaledb compresses our data super well, so it would be really nice to just use that and only need to implement decompression.

> Can you say more about how data is uploaded (e.g., if in batches)? As our compressed "rows" are actually like columnar arrays, there isn't a 1:1 correspondence between compressed rows and uncompressed rows. More like 1:1000 typically.

We send two types of data to the cloud. One type is basically a state snapshot, which we send at a regular, frequent interval. It is not very big, so we don't care too much about compression, and it gives us a relatively "live" view of the device state.

The other type of data is what we call telemetry, and it is what we are focused on in this issue. Telemetry includes all changes to the state snapshot. It can get large fast (50-2000 data points per second depending on the application), and in our experience it compresses very well in timescaledb. We typically don't care if telemetry data is delayed, so we can also batch for minutes, hours, or days before sending if that reduces bandwidth consumption.
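For a rough sense of scale, the small Python sketch below estimates how many data points accumulate per batch window at the rates above, and how many compressed rows that would be. It assumes each data point is one uncompressed row and takes the approximate 1:1000 ratio mentioned earlier in the thread at face value. `batch_estimate` is a hypothetical helper.

```python
import math

# Assumption: about 1000 uncompressed rows per compressed row, as mentioned
# earlier in the thread; the real ratio will vary.
ROWS_PER_COMPRESSED_ROW = 1000


def batch_estimate(points_per_second, window_seconds):
    """Return (uncompressed points, estimated compressed rows) for one batch."""
    points = points_per_second * window_seconds
    compressed_rows = math.ceil(points / ROWS_PER_COMPRESSED_ROW)
    return points, compressed_rows


if __name__ == "__main__":
    windows = (("1 minute", 60), ("1 hour", 3600), ("1 day", 86400))
    for rate in (50, 2000):
        for label, seconds in windows:
            points, compressed = batch_estimate(rate, seconds)
            print(f"{rate:>4}/s over {label}: {points} points, ~{compressed} compressed rows")
```

At 50 points per second a one-minute batch is only about three compressed rows, so longer windows give the compression more to work with.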

timescale / timescaledb

Decompressing compressed hypertable rows outside of timescaledb #2071