timescale / timescaledb

An open-source time-series SQL database optimized for fast ingest and complex queries. Packaged as a PostgreSQL extension.
https://www.timescale.com/

Continuous Benchmarks #12

Closed sirinath closed 7 years ago

sirinath commented 7 years ago

Can you release benchmarks against:

As part of your testing and CI, so regressions are known and addressed sooner.

mfreed commented 7 years ago

Hi Sirinath,

We have released benchmarks against Postgres and run them internally as part of CI testing; you can find these results in our whitepaper. We are working on other benchmarks that make sense, which itself requires some careful design.

For example, the TPC benchmarks were not designed with time-series specifically in mind, while the benchmarks used by many columnar databases only emphasize the queries they are optimized for (e.g., simple roll-ups), not the richer queries for which Timescale is designed. Some of the above systems are also in-memory databases; those probably won't be in our initial comparison set.

Regards, --mike

sirinath commented 7 years ago

Maybe you can try to optimise so that performance is greater than an in-memory DB when all the data fits into memory and there are few updates. But to make sense in practical use, you have to support a high volume of inserts.

Also related: https://github.com/timescale/timescaledb/issues/11

mfreed commented 7 years ago

Timescale is already optimized to keep newly inserted data in memory, through its chunk-based architecture.

That's why and how it continues to achieve constant insert performance even as the DB grows (i.e., ~140K rows of 10 metrics/sec, or 1.4M metrics/sec), while Postgres hits a performance cliff at 10s-100s of millions of rows.

[Graph: insert rate vs. dataset size, TimescaleDB vs. PostgreSQL]

For more information and explanation of this graph, see the whitepaper, particularly pages 5-6.
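The chunk-based architecture mentioned above is exposed through TimescaleDB's hypertable API. A minimal sketch (table and column names are illustrative, not from this thread):

```sql
-- A regular Postgres table becomes a hypertable; TimescaleDB then
-- partitions it into time-based chunks behind the scenes.
CREATE TABLE conditions (
  time        TIMESTAMPTZ       NOT NULL,
  device_id   TEXT,
  temperature DOUBLE PRECISION
);

-- Chunking keeps the most recent chunk (and its indexes)
-- memory-resident, which is what keeps insert rates flat
-- as total table size grows.
SELECT create_hypertable('conditions', 'time');
```

Because inserts land in the newest chunk, index updates touch only a small, hot working set rather than one ever-growing B-tree.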

sirinath commented 7 years ago

Would it be realistic to expect 15M ops/sec in the near future?

In any case, the biggest consumer of a time-series DB would be financial services (FS), which is obsessed with speed and latency.

Also, the ability to have in-database data connectors might help speed things up: reading, processing, and storage would all happen inside the DB, in the same process.

mfreed commented 7 years ago

The above benchmarks are from a single machine. Timescale's architecture is designed to scale out linearly with the number of servers, so we should be able to handle much higher aggregate throughput across the system than what a single node supports.

Now, regarding financial applications: There are different tools for different jobs. There are companies that push high-frequency trading into FPGAs so they can achieve even sub-microsecond latencies. TimescaleDB (or even any in-memory user-space DB) is not designed for those applications. Further, if all a company is doing is real-time stream processing, an in-memory stream processing framework is probably more appropriate (Apache Apex, Storm, Flink, Spark Streaming, PipelineDB, etc.).

Timescale more generally supports time-oriented queries against both new (real-time) and historical data, but in exchange for this generality (and for building on a storage engine that provides both durability and indexing), it gives up some performance against systems designed only for real-time, one-pass stream processing.

Regarding data connectors: You can connect to TimescaleDB like any Postgres database (JDBC, ODBC, native Postgres clients), so if your system speaks Postgres, it can speak to TimescaleDB. We imagine that some users wanting stream processing might run their data through a stream processing engine first, then output the results to TimescaleDB.
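Since queries go over the standard Postgres wire protocol, existing tooling works unchanged. A sketch of plain SQL against a hypertable (the `conditions` table name is assumed for illustration; `time_bucket` is TimescaleDB's time-grouping function):

```sql
-- Inserts use ordinary SQL, from any Postgres driver.
INSERT INTO conditions VALUES (now(), 'dev-1', 21.5);

-- Time-oriented roll-up: hourly average temperature.
SELECT time_bucket('1 hour', time) AS hour,
       avg(temperature)            AS avg_temp
FROM conditions
GROUP BY hour
ORDER BY hour;
```

This is the kind of "richer query" the maintainer contrasts with the simple roll-ups that columnar benchmarks emphasize.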