Performance issues when using 10,000s of chunks

sspieser commented 6 years ago

Hello timescale!

As per our conf call, here are the problems we encounter regarding response time over partition increase. We are leveraging the opportunity of using Postgres + Timescale, as an alternative to Oracle Database for managing big data in an energy application (time series, energy business measures).

Methodology:

Create a new database with Timescale extension
Create a big table, insert data into it
Run queries
Add new empty partitions (or chunks)
Run the same queries

Note: for Portgres it is easy to create empty partition; with Timescale, it looks like we have to actually insert some data to create more chunks… so the new partitions are not exactly empty.

One of the query is:

BEGIN;
    insert into collfact (time, pointid, value, mesureid, version, insertiondate, iscurrent)
    values (timestamp '2018-04-11 00:00:00', 10, trunc(random() * 99999),
        1, 2, now(), '1');
    UPDATE collfact set iscurrent = '0'
    where time = timestamp '2018-04-11 00:00:00'
        and pointid = 10 and mesureid = 1 and version = 1;
COMMIT;

Which should INSERT 1 row, and UPDATE 1 single row…

Except for the first test data, the 3 others crashed:

ERROR:  out of memory
DÉTAIL : Failed on request of size 200.

In fact this is the message from Windows, under Linux we have something slightly different but seems to be due to the memory management / protection under Linux (see https://www.postgresql.org/docs/current/static/kernel-resources.html#LINUX-MEMORY-OVERCOMMIT) :

la connexion au serveur a été coupée de façon inattendue
        Le serveur s'est peut-être arrêté anormalement avant ou durant le
        traitement de la requête.
La connexion au serveur a été perdue. Tentative de réinitialisation : Échec.
Durée : 543010,711 ms (09:03,011)
!> COMMIT;
Vous n'êtes pas connecté à une base de données.
!>

But the issue (statement failure) is still the same even if the system does not react the same way.

Other unexpected observations: the same SQL SELECT query response time increses while adding chunks:

Chunks #      |    5840   |    7301   |    11319  |   14971   
Time (sec.)   |      13   |      57   |       70  |      76

See attached files for the SQL statements we used.

So our concern here is that we expected response time stability; the dataset and number of chunks here is quite small (or even really small), wheras Timescale is supposed to address these kind of requirements very well… So do we make something the wrong weay?

Thanks in advance for your help, Regards, Sebastien

PS1: tests are running on a Linux server, 8 GB, 4 CPU, RAID 5 disks; Postgres a been set using pg Tune: see attache file "PG_CONF.txt"

PS 2: file "PGTS_INITDATA.TXT" to build the database, table structure and test data file "PGTS_SOMEQUERIES.TXT" contains the queries we run file "PGTS_MOREDATA.TXT" to add some more chunks PGTS_INITDATA.TXT PGTS_SOMEQUERIES.TXT PGTS_MOREDATA.TXT PG_CONF.txt

gumshoes commented 6 years ago

What is the output of SELECT version();?

LeeHampton commented 6 years ago

@sspieser , could you provide a little more detail around your chunk setup? Specifically, if you run chunk_relation_size_pretty('collfact'), what chunk sizes are you seeing? If you were creating the chunks with default values they may be larger than available memory.

sspieser commented 6 years ago

sorry I 'm off and will reply more deeply tomorrow... I do not know how to adjust chunk setup anyway, I just used this statement: SELECT create_hypertable('collfact', 'time', 'pointid', 16, chunk_time_interval => interval '1 day'); stay tuned for more details tomorrow :-)

mfreed commented 6 years ago

One initial reaction is that you have way too few max_locks_per_transaction, which is 300. You roughly want 2 * num_chunks. See this for a discussion:

http://docs.timescale.com/v0.9/getting-started/configuring#locks

sspieser commented 6 years ago

@gumshoes

SELECT version();
                                                 version
---------------------------------------------------------------------------------------------------------
 PostgreSQL 10.3 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-16), 64-bit

@LeeHampton : it's a query returning 29216 rows ! Do you need the whole? Excerpt: (52590,"""_timescaledb_internal"".""_hyper_1_52590_chunk""","{time,pointid}","{""timestamp without time zone"",bigint}","{NULL,_timescaledb_internal.get_partition_hash}","{""['2017-12-29 01:00:00+01','2017- 12-30 01:00:00+01')"",""['1476394997','1610612724')""}","136 kB","232 kB",,"368 kB") (52591,"""_timescaledb_internal"".""_hyper_1_52591_chunk""","{time,pointid}","{""timestamp without time zone"",bigint}","{NULL,_timescaledb_internal.get_partition_hash}","{""['2017-12-30 01:00:00+01','2017- 12-31 01:00:00+01')"",""['1476394997','1610612724')""}","136 kB","232 kB",,"368 kB") (52592,"""_timescaledb_internal"".""_hyper_1_52592_chunk""","{time,pointid}","{""timestamp without time zone"",bigint}","{NULL,_timescaledb_internal.get_partition_hash}","{""['2017-12-31 01:00:00+01','2018- 01-01 01:00:00+01')"",""['1476394997','1610612724')""}","136 kB","232 kB",,"368 kB")

@mfreed I tried to play with this but then PG does not want to start.... in fact this always raises a FATAL error like this: FATAL: could not map anonymous shared memory: Cannot allocate memory HINT: This error usually means that PostgreSQL's request for a shared memory segment exceeded available memory...

Currently I have 29216 chunks; so 60000 max lock per trans... but seems to allocate hundreds of GB of memory !!

2000 is the max number that works for my server; at 3000 I get:

To reduce the request size (currently 13906550784 bytes), reduce PostgreSQL's shared memory usage, perhaps by reducing shared_buffers or max_connections Note: I reduced shared_buffers to 128k and max conn to 9, but this does not change the number requested. I also changed kernel SHMALL, but no impact: sysctl -w kernel.shmall=5616156672

How to increase that max_locks_per_transaction ?

gumshoes commented 6 years ago

I have ~11k chunks and have max_locks_per_transaction = 30000. Question for the devs: If there are 10k 1 hour chunks in a hypertable and a query with a time predicate that includes only 4 of those chunks is run does that result in 4 locks or 10k locks?

thanks

sspieser commented 6 years ago

@gumshoes how did u get to 30000: how much server memory, any OS adjustements? Thx!

gumshoes commented 6 years ago

80GB RAM shared_buffers = 16GB effective_cache_size = 48GB

grep Huge /proc/meminfo
AnonHugePages:         0 kB
HugePages_Total:    8600
HugePages_Free:     8593
HugePages_Rsvd:       64
HugePages_Surp:        0
Hugepagesize:       2048 kB

sspieser commented 6 years ago

So we upgraded the server from 8GB to 80GB.

For Postgres:

shared_buffers = 16GB
effective_cache_size = 48GB
max_locks_per_transaction = 10000
huge_pages = on

(_unable to start with more than 10000 max_lock_pertransaction)

For Linux:

vm.nr_hugepages=35000
vm.overcommit_memory=2
kernel.shmall = 18446744073692774399
kernel.shmmax = 18446744073692774399

# grep Huge /proc/meminfo
AnonHugePages:         0 kB
HugePages_Total:   35000
HugePages_Free:    35000
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB

But unfortunately, no change on my "simple" UPDATE of one row (among 140M in 29000 chunks !

test=# UPDATE collfact set iscurrent = '0'
test-# where time = timestamp '2018-04-11 00:00:00'
test-#

test-# and pointid = 10 and mesureid = 1 and version = 1;
ERROR:  out of memory
DÉTAIL : Failed on request of size 200.
Durée : 78503,267 ms (01:18,503)

Any help would be appreciated ;-)

gumshoes commented 6 years ago

To date I have only ever partitioned by timestamp and have never encountered such errors even with my 11k chunk 10TB DB. Have you tried partitioning only by timestamp? The docs do caution against unnecessary partitioning.

sspieser commented 6 years ago

@gumshoes remember I'm just evaluating Timescale, not trying to make an application with an overall of 140M rows of data; in my opinion, to address contention issues we may need to sub partition per location, as we may have millions of data per single day - up to 120M? (the first partition level)... I'm trying to check wether or not Timescale may be abble to manage partitions et sub partitions, a lot of, and not collapse while increasing partitions and/or data... So far I am not confident at all as the response times always increases when increasing the number of chunks (6k, 8k, 15k or 29k is not that much?)

We previously did the testing you suggest, with only a per day partitionning level; that was on Windows, we had always a response (no crash), but the response time was really bad, i.e., always increasing when adding days:

PG 10.3 Win64 + TimescaleDB 0.9             
                        365p.       1825p.      5840p.      9490p.

# of chunks                 365     1825        5840        9490    
insert / update 1 measure (sec.)        0,137185    1,394389    29,727372   97,508243
read 1 day for 1 point/type - 96 rows (sec.)    0,428493    0,571401    4,45753     7,591912
select current / previous single value (sec.)   0,617097    2,587037    9,57496     19,137179

I will try with my Linux RHEL7 server, to see if that work fine with the new parameters set today...

Again, any help or feedback is appreciated on my way of dealing with this subject :-)

gumshoes commented 6 years ago

Please provide a sample select query and the output of:

EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS)
<your select query>

sspieser commented 6 years ago

@gumshoes the UPDATE fails also:

test=#  EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS) UPDATE collfact set iscurrent = '0'
test-# where time = timestamp '2018-04-11 00:00:00'
test-#
test-# and pointid = 10 and mesureid = 1 and version = 1;
ERROR:  out of memory
DÉTAIL : Failed on request of size 200.
Durée : 66001,326 ms (01:06,001)

Some SELECT:

test=# EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS) select AVG(value) from collfact where pointid = 123 and iscurrent = '1'
test-#   and time < timestamp '2018-11-11 00:00:00'
test-#   and time >= timestamp '2018-11-10 00:00:00' ;
                                                                                                                                                          QUERY
PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=315.26..315.27 rows=1 width=8) (actual time=0.805..0.805 rows=1 loops=1)
   Output: avg(collfact.value)
   Buffers: shared hit=281
   ->  Append  (cost=0.00..315.25 rows=3 width=8) (actual time=0.222..0.733 rows=384 loops=1)
         Buffers: shared hit=281
         ->  Seq Scan on public.collfact  (cost=0.00..0.00 rows=1 width=8) (actual time=0.019..0.019 rows=0 loops=1)
               Output: collfact.value
               Filter: ((collfact."time" < '2018-11-11 00:00:00'::timestamp without time zone) AND (collfact."time" >= '2018-11-10 00:00:00'::timestamp without
time zone) AND (collfact.pointid = 123) AND (collfact.iscurrent = '1'::bpchar) AND (_timescaledb_internal.get_partition_hash(collfact.pointid) = 1942399285))
         ->  Bitmap Heap Scan on _timescaledb_internal._hyper_1_5016_chunk  (cost=13.09..315.25 rows=2 width=8) (actual time=0.202..0.670 rows=384 loops=1)
               Output: _hyper_1_5016_chunk.value
               Recheck Cond: ((_hyper_1_5016_chunk.pointid = 123) AND (_hyper_1_5016_chunk."time" < '2018-11-11 00:00:00'::timestamp without time zone) AND (_hy
per_1_5016_chunk."time" >= '2018-11-10 00:00:00'::timestamp without time zone))
               Filter: ((_hyper_1_5016_chunk.iscurrent = '1'::bpchar) AND (_timescaledb_internal.get_partition_hash(_hyper_1_5016_chunk.pointid) = 1942399285))
               Heap Blocks: exact=277
               Buffers: shared hit=281
               ->  Bitmap Index Scan on _hyper_1_5016_chunk_collfact_pointid_time_idx  (cost=0.00..13.09 rows=384 width=0) (actual time=0.137..0.137 rows=384 lo
ops=1)
                     Index Cond: ((_hyper_1_5016_chunk.pointid = 123) AND (_hyper_1_5016_chunk."time" < '2018-11-11 00:00:00'::timestamp without time zone) AND
(_hyper_1_5016_chunk."time" >= '2018-11-10 00:00:00'::timestamp without time zone))
                     Buffers: shared hit=4
 Planning time: 10044.872 ms
 Execution time: 2.127 ms
(19 lignes)

Durée : 10470,616 ms (00:10,471)

And another one: cf attached EXPLAIN2.txt

gumshoes commented 6 years ago

The devs have responded in slack about the locks. Right now if you have 10k chunks and even if your query is time predicated to include only 4 of those chunks 10k locks will be used. There is a fix for the above coming though. Conclusion on # chunks right now is to keep it low.

How much data will you be keeping on line? How many partitions would it take to do the above partitioning by timestamp only? How big will each partition be (broken down by data and indexes)? As for your questions I don't fully understand your desired end state but would recommend that you partition only by timestamp and figure out the resulting chunk size and then read the best practices section of https://docs.timescale.com/v0.9/using-timescaledb/hypertables.

For example one of my DBs grows by ~32GB day in 1 hour chunks partitioned by timestamp only.

sspieser commented 6 years ago

@gumshoes ok thanks for pointing me your question / answers...

My end state is not clearly defined right now as I am leveraging the use of these tool for a product; volumes will depend on customer... I have the opposite approach: how much can I put in my database, in order to make a proposal... not sure if that's much clear?

gumshoes commented 6 years ago

All good learning; you are almost there. Still think you are seriously handicapping yourself by partitioning by more than just timestamp and not figuring out and sizing your chunks as per the documented recommendations.

sspieser commented 6 years ago

@gumshoes ok thanks, I'll consider this option again :-)

mfreed commented 6 years ago

Thanks for all the great discussions. As @gumshoes pointed out, we have some optimizations coming down the pipe which change the way that Postgres performs constraint exclusion to significantly reduce the number of locks it needs (i.e., to only those on which the query will actually be run, as opposed to all chunks in order to lookup metadata for exclusion).

That said, I'm curious where you are coming from where 30,000 partitions is considered small. On native Postgres, for example, things really go south after a few hundred: https://blog.timescale.com/time-series-data-postgresql-10-vs-timescaledb-816ee808bac5#c4aa

sspieser commented 6 years ago

@mfreed sorry I'm not sure I understand your question

where you are coming from where 30,000 partitions is considered small

Sure we saw that Postgres partionning was not working well; we currently use Oracle, and have thousands of subpartitions (or tenth of thousand...): it works fine and is the only way to make our application work fine... that's why my first concern here is about a lot of chunks, and check if that sounds possible also on the open source world...

gumshoes commented 6 years ago

Just re-read this thread and IMO you may be fixating on # partitions instead of the broader question of if timescale can meet your requirements. Perhaps you don't need 10k+ partitions; migrating from one major SQL DB to another involves re-thinking schema/indexes/etc.

Better questions to ask and test:

rows inserted/interval
rows kept online
chunk size
% of chunk that is index
how many chunk indexes can fit into PostgreSQL RAM
can queries of older data be slower than those of newer data
can you buy 10x the hardware with the license savings from moving away from Oracle

sspieser commented 6 years ago

@gumshoes I'll definitely check your points against our "new goals"; anyway do you have an estimate date for the release of the optimization you mentionned above?

sspieser commented 6 years ago

@gumshoes found: the optimization is the issue #502 planned for the 0.10.0 release

sspieser commented 6 years ago

Hello all, As suggested we removed the subpartitions and just use the time partition dimension. 1 day == 115'188'480 rows... We planned to test 1 year of data, and then add almost empty partitions to test that the response time is the same even if we add partitions (that was not the case with subpartitions so that's our first concern).

So far, 13 days of data and the disk of 250 GB is full... so we have to add more space. Response time for that amount of data is fine, but we only have 1 year of data:

13 days with 115M rows per day (of chunk)
the rest of the year with only 384001rows

Stay tuned!

mfreed commented 6 years ago

Hi @sspieser, just following up: This optimization to better handle 10,000s of chunks was released in 0.10.0, which went out on June 27: https://github.com/timescale/timescaledb/releases/tag/0.10.0

Please let us know if you continue to have any problems, or I'll close out the issue. Thans!

sspieser commented 6 years ago

Hello!

Thanks for the notice, we will check later after the vacation season as I'm currently out for 3 weeks.

Thanks, Best regards, S

Le ven. 13 juil. 2018 15:08, Mike Freedman notifications@github.com a écrit :

Hi @sspieser https://github.com/sspieser, just following up: This optimization to better handle 10,000s of chunks was released in 0.10.0, which went out on June 27: https://github.com/timescale/timescaledb/releases/tag/0.10.0

Please let us know if you continue to have any problems, or I'll close out the issue. Thans!

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/timescale/timescaledb/issues/515#issuecomment-404814489, or mute the thread https://github.com/notifications/unsubscribe-auth/AORS-9kZG3U9M5r-TNCbyXXV_8S5lY58ks5uGI3JgaJpZM4TvD7W .

gumshoes commented 6 years ago

Upgraded my DB (9.6.8) from 0.9.1 to 0.10.1 and our read queries are now ~10x faster and take ~80% less CPU. DB has 13k partitions containing 15TB of data and was suffering read wise. Great work guys, thanks.

timescale / timescaledb

Performance issues when using 10,000s of chunks #515