timescale / timescaledb

An open-source time-series SQL database optimized for fast ingest and complex queries. Packaged as a PostgreSQL extension.
https://www.timescale.com/

Insert performance slow with single-row inserts #1216

Closed thongpv87 closed 3 years ago

thongpv87 commented 5 years ago

I have this table:

  CREATE TABLE log(
      seq     SERIAL NOT NULL,
      ts      TIMESTAMPTZ NOT NULL,
      type    VARCHAR(5) NOT NULL,
      field1  VARCHAR(32) NOT NULL,
      field2  VARCHAR(32) NOT NULL,
      field3  VARCHAR(18) NOT NULL,
      field4  SMALLINT NOT NULL,
      field5  BIGINT NOT NULL,
      field6  BIGINT NOT NULL,
      field7  VARCHAR(3),
      field8  VARCHAR(3),
      field9  DOUBLE PRECISION,
      field10 TEXT,

      PRIMARY KEY(seq, ts)
  );
  SELECT create_hypertable('log', 'ts');

and data is inserted from a C++ client (using taopq - https://github.com/taocpp/taopq - as the Postgres library):

fmt::format(R"--(INSERT INTO {}(ts, type, field1, field2, field3, field4,  field5,  field6,  field7,  field8,  field9,  field10)
                        VALUES ('{}', '{}', '{}', '{}', '{}', {}, {}, {}, '{}', '{}', {}, '{}'))--",
                        sourceId,
                    e.ts, type, e. field1, e. field2, e. field3, e. field4,
                    e. field5, e. field6, e. field7, e. field8, e. field9, e. field10);

There are no other indexes, triggers, etc.

I tried to insert 300,000 records parsed from a text file (the parsing itself is really fast, it only takes 1-3 s). The insert performance is only about 300 records/s.

I used the Docker image latest-pg11 to run the TimescaleDB server with the default config.

What am I doing wrong? This is the first time I've tried TimescaleDB.

AymanElarian commented 5 years ago

Any update? I'm facing the same problem.

mfreed commented 5 years ago

Are you inserting a single row per INSERT, or batching? You'll see a >10x faster ingest rate with some batching (100s/1000s of rows per batch). That said, 300 records/s is painfully slow; we typically still see 10K+ rows per second on a single node. What type of indexes did you specify? Anything more about your setup?
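For illustration, a multi-row INSERT against the log table above might look like the sketch below; the timestamps and column values are made up, not from this thread:

  -- Batched insert: many rows per INSERT statement (values are illustrative only)
  INSERT INTO log (ts, type, field1, field2, field3, field4, field5,
                   field6, field7, field8, field9, field10)
  VALUES
      ('2019-05-01 00:00:00+00', 'A', 'x1', 'y1', 'z1', 1, 100, 200, 'aa', 'bb', 0.5, 'note 1'),
      ('2019-05-01 00:00:01+00', 'A', 'x2', 'y2', 'z2', 2, 101, 201, 'aa', 'bb', 0.6, 'note 2');
      -- ... typically hundreds or thousands of rows per statement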

thongpv87 commented 5 years ago
  1. Yes, I do a single row per INSERT.
  2. I didn't specify any index; I just created the table and then inserted.

Update: I changed my code to use batched inserts, and the insert performance is ~10x faster with 100 rows per INSERT and ~50x faster with 1000 rows per INSERT.

But when I run the same test against vanilla Postgres, the performance is the same. In other words, the problem was the lack of batching, and I can't see the advantage of TimescaleDB in this case.

mfreed commented 5 years ago

@thongpv87 Vanilla Postgres' insert performance will drop off as the table grows large; Timescale's won't. Additionally, there are a large number of query optimizations, analytical capabilities, automation features, time-series data management tools, etc. that are unique to Timescale:

See the graphs about larger inserts here, and our docs and blog for more detail about these capabilities: https://blog.timescale.com/timescale-microsoft-azure-team-up-to-power-iot-and-time-series-workloads/

erimatnor commented 3 years ago

For best performance, please use COPY.

steve-numeus commented 2 years ago

@erimatnor please could you elaborate on why and how to use COPY instead of INSERT?
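For anyone else landing here, a minimal sketch of what using COPY could look like; the file path /tmp/log.csv and the CSV layout are assumptions, not from this thread. COPY streams rows in bulk and avoids per-statement parsing and planning overhead, which is generally why it is the fastest way to load data:

  -- Server-side COPY: the file must be readable by the PostgreSQL server process
  COPY log (ts, type, field1, field2, field3, field4, field5,
            field6, field7, field8, field9, field10)
  FROM '/tmp/log.csv' WITH (FORMAT csv);

  -- Client-side equivalent via psql (the file is read on the client machine):
  -- \copy log (ts, type, field1, field2, field3, field4, field5, field6, field7, field8, field9, field10) FROM 'log.csv' WITH (FORMAT csv)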