yandex / odyssey

Scalable PostgreSQL connection pooler
BSD 3-Clause "New" or "Revised" License

Degradation of transfer speed in TLS mode when selecting BLOBs #483

Open skilyazhnev opened 1 year ago

skilyazhnev commented 1 year ago

Hi, we are facing a problem that affects query speed. To be precise: the query itself executes fast enough, but Odyssey doubles (or more) the time spent transporting the result through itself.

We use:
* Odyssey 1.3
* PostgreSQL 11
* 5.10.0-21-amd64 #1 SMP Debian 5.10.162-1 (2023-01-21) x86_64 GNU/Linux
* GCP hosts - Machine type: e2-medium (2 cores, 4 GB) or e2-standard-4 (4 cores, 16 GB)

Our infrastructure looks like this ( { } marks components on the same host ):

1) { | app | }<==> { | odyssey |  <==> | HAproxy | } <==> { | PostgreSQL | }

2) { | app | } <==> { | pgbouncer | <==> | HAproxy | <==>  | PostgreSQL | }

All connections use TLS. (HAProxy only forwards the port to the right host.)

At the same time, we have a table with BLOB objects (only 6 rows, but together they are about 650 MB as a text file).

When we select from this table with `select * from blob_table;`, we get these timings:

       
| att | type | ssl | time (total, in seconds) |
|-----|------|-----|--------------------------|
| 1 | direct connection to PostgreSQL | TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, bits: 256 | 28.08s user 3.64s system 45% cpu 1:09.43 total |
| 2 | direct, without TLS | no | 26.13s user 2.95s system 66% cpu 43.566 total |
| 3 | Odyssey without TLS | no | 25.16s user 3.26s system 53% cpu 52.682 total |
| 4 | Odyssey with TLS | TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, bits: 256 | 26.19s user 5.22s system 6% cpu 8:38.36 total |
| 5 | PgBouncer with TLS | TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, bits: 256 | 32.86s user 4.40s system 41% cpu 1:30.70 total |
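To put the table in perspective, here is a rough effective-throughput calculation (my own arithmetic, assuming the ~650 MB payload mentioned above and the wall-clock "total" times from the table; all figures are approximate):

```python
# Rough effective throughput per attempt, assuming the ~650 MB result
# set and the wall-clock "total" times from the table above.
PAYLOAD_MB = 650

totals_s = {
    "direct + TLS":    69.43,          # 1:09.43
    "direct, no TLS":  43.566,
    "Odyssey, no TLS": 52.682,
    "Odyssey + TLS":   8 * 60 + 38.36, # 8:38.36
    "PgBouncer + TLS": 90.70,          # 1:30.70
}

for name, t in totals_s.items():
    print(f"{name:16s} {PAYLOAD_MB / t:5.2f} MB/s")
```

By this estimate, Odyssey with TLS moves about 1.25 MB/s versus roughly 9.4 MB/s for a direct TLS connection, i.e. around 7x slower, even though user CPU time is nearly the same across attempts.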

And while data flows through Odyssey's backend, CPU usage peaks (100% per Odyssey backend), with a lot of both sys and user time.

(screenshot: CPU usage during the transfer, 2023-03-03)


During the data transfer, strace shows this (`strace -c -p <pid>`):

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 54.18    6.167812         294     20949           munmap
 15.80    1.798835          50     35603      4221 write
 12.72    1.448449          19     74861           epoll_wait
 12.36    1.406748          16     84471        78 read
  3.11    0.354529          16     20945           mmap
  0.80    0.091604          59      1541           mprotect
  0.39    0.044001           6      6547           epoll_ctl
  0.38    0.043215          14      3082           close
  0.25    0.028948           6      4686           getpeername
------ ----------- ----------- --------- --------- ----------------
100.00   11.384141          45    252685      4299 total

(and I checked: the munmap calls are distributed approximately evenly)

Most of the time the process just loops through this pattern:

epoll_wait(16, [{EPOLLIN, {u32=4161565704, u64=140243433714696}}, {EPOLLOUT, {u32=80600, u64=140243567196888}}], 5, 60000) = 2 <0.000012>
mmap(NULL, 51412992, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f8ce2e00000 <0.000016>
write(31, "\27\3\3@\21!\237\223\17\270\367\310|\240\35J\316\376\344\245\305\252r\222a\261BE7\357\353\313"..., 16406) = 16406 <0.000059>
munmap(0x7f8ce2e00000, 51412992) = 0 <0.003422>
epoll_wait(16, [{EPOLLIN, {u32=4161565704, u64=140243433714696}}, {EPOLLOUT, {u32=80600, u64=140243567196888}}], 5, 60000) = 2 <0.000012>
mmap(NULL, 51396608, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f8ce2e04000 <0.000014>
write(31, "\27\3\3@\21\26-@\310\17\37\317\221\264%=\225\347\213\203\247\376\251\260q\253\26^\251*E\355"..., 16406) = 16406 <0.000069>
munmap(0x7f8ce2e04000, 51396608) = 0 <0.003577>
epoll_wait(16, [{EPOLLIN, {u32=4161565704, u64=140243433714696}}, {EPOLLOUT, {u32=80600, u64=140243567196888}}], 5, 60000) = 2 <0.000011>
mmap(NULL, 51380224, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f8ce2e08000 <0.000014>
write(31, "\27\3\3@\219\342\261<\33\2273\370\324\312\264z\16\343\27VG\367\35\236H\231\344A\364\30m"..., 16406) = 16406 <0.000069>
munmap(0x7f8ce2e08000, 51380224) = 0 <0.003480>
epoll_wait(16, [{EPOLLIN, {u32=4161565704, u64=140243433714696}}, {EPOLLOUT, {u32=80600, u64=140243567196888}}], 5, 60000) = 2 <0.000012>
mmap(NULL, 51363840, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f8ce2e0c000 <0.000014>
write(31, "\27\3\3@\21;\253\7)\364\237\205\374V\30?\307\335*\320]\203K\355\236KP>c\22-\372"..., 16406) = 16406 <0.000127>
munmap(0x7f8ce2e0c000, 51363840) = 0 <0.004923>
epoll_wait(16, [{EPOLLIN, {u32=4161565704, u64=140243433714696}}, {EPOLLOUT, {u32=80600, u64=140243567196888}}], 5, 60000) = 2 <0.000013>
mmap(NULL, 51347456, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f8ce2e10000 <0.000017>
write(31, "\27\3\3@\21s\377\fS\313\230\327\360v\2653DS\317\3574\263\30\3lj\334\3271\26\1\373"..., 16406) = 16406 <0.000053>
munmap(0x7f8ce2e10000, 51347456^Cstrace: Process 1491565 detached
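A back-of-the-envelope estimate (my own arithmetic, using only numbers visible in the trace) of what the per-record munmap alone could cost: each write in the loop sends one ~16 KB TLS record, and each is bracketed by an mmap/munmap of a ~51 MB region, with munmap taking ~3.5 ms:

```python
# Cost of one munmap per TLS record, using the numbers from the trace
# above. Purely illustrative arithmetic.
payload_bytes  = 650 * 1024 * 1024  # ~650 MB result set
record_payload = 16384              # ~16 KB of data per TLS record
munmap_s       = 0.0035             # ~3.5 ms per munmap (from strace)

records = payload_bytes / record_payload
print(f"{records:.0f} TLS records")             # 41600
print(f"{records * munmap_s:.0f} s in munmap")  # ~146 s of pure munmap time
```

That would account for a sizable share of the extra wall-clock time in attempt 4, and fits a hypothesis that a very large buffer is allocated and immediately freed for every 16 KB record (glibc's malloc serves allocations above its mmap threshold directly via mmap, and frees them via munmap, which would produce exactly this syscall pattern).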

Things we tried that did not speed it up:
1) Turning off TLS on the Odyssey <==> PostgreSQL connection
2) Turning off THP (transparent huge pages) in Linux
3) Increasing the cache option in Odyssey
4) Increasing the number of CPU cores

And what I could not configure:

1) the tls_protocol option in Odyssey, because the docs give no answer on which values I can use.

The main question is: what kind of problem could this be?

We are considering Odyssey as a production solution, but so far this behavior is somewhat confusing.

Odyssey config:

daemonize no

workers "auto"
resolvers 1

client_max 10000

graceful_die_on_errors yes
enable_online_restart no
bindwith_reuseport yes

unix_socket_dir "/run/odyssey/main"
unix_socket_mode "0644"

locks_dir "/tmp/odyssey-main"

log_file "/var/log/odyssey/main.log"
log_format "%n,main,%p,%l,%i,%s,%c,%h,%r,%u,%d,%M\n"
log_to_stdout no
log_syslog no
log_syslog_ident "odyssey"
log_syslog_facility "daemon"
log_debug no
log_config no
log_session yes
log_query no
log_stats no

readahead 8192
cache_coroutine 0
coroutine_stack_size 4

nodelay no
keepalive 1
keepalive_keep_interval 2
keepalive_probes 9
keepalive_usr_timeout 0

promhttp_server_port 16432
log_general_stats_prom no
log_route_stats_prom no

listen {
    host "*"
    port 6432
    backlog 128
    tls "allow"
    tls_ca_file "/etc/ssl/dbms/ca.pem"
    tls_cert_file "/etc/ssl/dbms/odyssey/odyssey.pem"
    tls_key_file "/etc/ssl/dbms/odyssey/odyssey-key.pem"
}
listen {
    port 6432
}

=================== 
database "dataname" {
        user "postgres" {
                authentication "clear_text"
                password "password"
                storage "dataname.dataname"
                storage_db "postgres"
                storage_user "postgres"
                storage_password "password"
                pool "session"
                pool_size 60
                pool_timeout 0
                pool_ttl 60
                pool_cancel yes
                pool_rollback yes
                pool_discard yes
                client_fwd_error yes
                log_debug no
                quantiles "0.99,0.95,0.5"
        }
}

odyssey_sysctl.txt

x4m commented 1 year ago

Thanks for the detailed report! @reshke, can you please take a look too? As far as I know, Odyssey does not mmap/munmap on its own... This looks very much like "валенок на пульте" (a felt boot resting on the control panel, i.e. an accidental culprit pressing the buttons) somewhere within the TLS interaction. Also, we support BoringSSL; maybe it behaves better in this case?

skilyazhnev commented 1 year ago

@x4m Thanks for the answer! Using a different SSL implementation is part of my question too: I don't know how (the docs and code didn't help me :( ) to force a change of the tls_protocol parameter (to downgrade it).

x4m commented 1 year ago

There's a setting tls_protocols, but it does not seem to be wired into the actual TLS context creation. See https://github.com/yandex/odyssey/blob/master/third_party/machinarium/sources/tls.c#L190

As far as I understand, when TLS is in use we call read() in a busy loop even when there is no data from the network. Flamegraphs could prove or disprove this...
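The busy-loop hypothesis can be illustrated with a toy sketch (plain Python sockets, not Odyssey code): a non-blocking read with no data available fails immediately with EAGAIN, so a loop that retries the read instead of going back to epoll_wait burns a full CPU core while transferring nothing:

```python
# Toy illustration (not Odyssey code): a non-blocking recv() with no
# data available raises EAGAIN/EWOULDBLOCK immediately, so retrying it
# in a tight loop wastes syscalls and CPU instead of sleeping in epoll.
import socket

a, b = socket.socketpair()
a.setblocking(False)

wasted = 0
for _ in range(100_000):
    try:
        a.recv(16384)        # nothing has been sent on `b` yet
    except BlockingIOError:  # EAGAIN: no data; a busy loop retries at once
        wasted += 1

print(wasted)  # every call failed: 100000 wasted syscalls
```

A flamegraph of the Odyssey worker during the slow transfer would show whether read()/SSL_read dominates in the same way.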

x4m commented 1 year ago

@skilyazhnev Sorry for the long delay; I'm finally on it now. Did you use COPY or just SELECT?

skilyazhnev commented 1 year ago

@x4m Only SELECT (we didn't actually try COPY).