syncthing / syncthing

Open Source Continuous File Synchronization
https://syncthing.net/
Mozilla Public License 2.0

Usage reporting data storage improvements #9212

Open calmh opened 12 months ago

calmh commented 12 months ago

We receive quite a few usage reports from Syncthing instances. This data is good and valuable. However, our way of handling it is naive and inefficient, a relic from when there were a few thousand Syncthing users and not upwards of a million.

Currently, reports are stored in JSON format in a PostgreSQL database. There are a few aggregation tables where per-day aggregated data is stored, but several queries run against the latest 24 hours of raw reports. There are several issues with this:

If I were doing it from scratch today, I would probably do something more like:

Thus, we would keep a tiny fraction of the data, and most queries could be answered by loading a single daily object or time series object. Those change only every 24 hours, so they can be cached in RAM. All the server instances would be essentially stateless, just putting or occasionally getting objects from an object store (e.g., S3).
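To make the "stateless server plus cached daily object" idea concrete, here is a minimal sketch in Go. Everything in it is an assumption for illustration, not the actual usage reporting server code: the `ObjectStore` interface, the `DailyAggregate` fields, the `aggregates/daily/<date>.json` key layout, and the in-memory `memStore` that stands in for S3 are all hypothetical. It also assumes that a daily object is only read once its day is complete, so a cached copy never goes stale.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"sync"
)

// ObjectStore is a hypothetical minimal interface over an object store
// such as S3. A real implementation would wrap an S3 client.
type ObjectStore interface {
	Put(key string, data []byte) error
	Get(key string) ([]byte, error)
}

// ErrNotFound is returned by memStore when a key does not exist.
var ErrNotFound = errors.New("object not found")

// memStore is an in-memory stand-in for the object store, used only so
// that this sketch runs without external services.
type memStore struct {
	mut  sync.Mutex
	objs map[string][]byte
	gets int // number of Get calls, to show the effect of caching
}

func newMemStore() *memStore {
	return &memStore{objs: make(map[string][]byte)}
}

func (s *memStore) Put(key string, data []byte) error {
	s.mut.Lock()
	defer s.mut.Unlock()
	s.objs[key] = append([]byte(nil), data...)
	return nil
}

func (s *memStore) Get(key string) ([]byte, error) {
	s.mut.Lock()
	defer s.mut.Unlock()
	s.gets++
	data, ok := s.objs[key]
	if !ok {
		return nil, ErrNotFound
	}
	return append([]byte(nil), data...), nil
}

// DailyAggregate is a hypothetical per-day summary object. The real set
// of fields would depend on which queries need answering.
type DailyAggregate struct {
	Date     string         `json:"date"`
	Reports  int            `json:"reports"`
	Versions map[string]int `json:"versions"`
}

// dailyKey returns the (assumed) object key for a given day.
func dailyKey(date string) string {
	return "aggregates/daily/" + date + ".json"
}

// PutDaily serializes one day's aggregate and writes it to the store.
func PutDaily(store ObjectStore, agg DailyAggregate) error {
	data, err := json.Marshal(agg)
	if err != nil {
		return err
	}
	return store.Put(dailyKey(agg.Date), data)
}

// AggregateCache keeps already loaded daily objects in RAM, so each
// day's object is fetched from the store at most once per instance.
type AggregateCache struct {
	store ObjectStore
	mut   sync.Mutex
	days  map[string]*DailyAggregate
}

func NewAggregateCache(store ObjectStore) *AggregateCache {
	return &AggregateCache{store: store, days: make(map[string]*DailyAggregate)}
}

// Daily returns the aggregate for the given day, loading it on first use.
func (c *AggregateCache) Daily(date string) (*DailyAggregate, error) {
	c.mut.Lock()
	defer c.mut.Unlock()
	if agg, ok := c.days[date]; ok {
		return agg, nil
	}
	data, err := c.store.Get(dailyKey(date))
	if err != nil {
		return nil, err
	}
	var agg DailyAggregate
	if err := json.Unmarshal(data, &agg); err != nil {
		return nil, err
	}
	c.days[date] = &agg
	return &agg, nil
}

func main() {
	store := newMemStore()
	// Sample data, made up for the example.
	_ = PutDaily(store, DailyAggregate{Date: "2023-11-01", Reports: 2, Versions: map[string]int{"v1.26.0": 2}})

	cache := NewAggregateCache(store)
	for i := 0; i < 3; i++ {
		agg, err := cache.Daily("2023-11-01")
		fmt.Println(agg.Reports, err)
	}
	fmt.Println("store reads:", store.gets)
}
```

The instance holds no state beyond the cache, which can be dropped and rebuilt from the object store at any time. A long-running server would probably also want to evict old days, which this sketch leaves out.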

bt90 commented 12 months ago

TimescaleDB might solve some of these problems without having to rebuild everything from scratch.