Open qvad opened 2 years ago
Jira Link: [DB-301]
SUT: AWS c5.xlarge, 3 nodes
Workload: slightly modified SqlDataLoad from sample-apps
"master_gflags": {
    "tablet_split_low_phase_size_threshold_bytes": "2097152",
    "tablet_split_limit_per_table": "8172",
    "enable_automatic_tablet_splitting": "false",
    "tablet_split_low_phase_shard_count_per_node": "134217728",
    "ysql_num_shards_per_tserver": "1"
},
"tserver_gflags": {
    "ysql_num_shards_per_tserver": "1",
    "memstore_size_mb": "1"
}
The scenario sets aggressive tablet-splitting flags and runs a simple workload. Nodes are also restarted in parallel during the run.
During the log-check stage, one of the nodes may become unavailable due to OOM:
Apr 26 18:36:46 localhost kernel: Out of memory: Kill process 6415 (postgres) score 45 or sacrifice child
Apr 26 18:36:46 localhost kernel: Killed process 6415 (postgres) total-vm:690868kB, anon-rss:347480kB, file-rss:0kB, shmem-rss:120kB
Apr 26 18:36:47 localhost kernel: postgres[6711]: segfault at 28 ip 00007fbcb11fa664 sp 00007ffec65ce230 error 4
Apr 26 18:36:47 localhost kernel: postgres[6734]: segfault at 28 ip 00007fbcb11fa664 sp 00007ffec65ce230 error 4
Apr 26 18:36:47 localhost kernel: in libpthread-2.23.so[7fbcb11f0000+17000]
Apr 26 18:36:47 localhost kernel:
Apr 26 18:36:47 localhost kernel: postgres[6635]: segfault at 28 ip 00007fbcb11fa664 sp 00007ffec65ce230 error 4
Apr 26 18:36:47 localhost kernel: in libpthread-2.23.so[7fbcb11f0000+17000]
Got the same behaviour with tablet splitting disabled; updated the description and fixed the text.
Are segfaults a direct effect of the OOM killer, or do they indicate an additional bug (e.g. an incorrect memory access)?
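To help triage that question, here is a rough Python sketch that pulls the fields out of one of the segfault lines above and computes where in libpthread the faulting instruction sits. The function names (`decode_error`, `summarize`) are made up for illustration, and the error-code decoding assumes the standard x86 page-fault bit layout (c5.xlarge is x86-64). It does not settle the root cause; it only makes the log easier to read.

```python
import re

# Kernel segfault line: "<comm>[<pid>]: segfault at <addr> ip <ip> sp <sp> error <code>"
SEGFAULT_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>\d+)"
)
# Mapping line: "in <object>[<base>+<size>]"
MAPPING_RE = re.compile(
    r"in (?P<obj>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)

# Two lines copied from the kernel log in this issue.
SEGFAULT_LINE = (
    "Apr 26 18:36:47 localhost kernel: postgres[6711]: segfault at 28 "
    "ip 00007fbcb11fa664 sp 00007ffec65ce230 error 4"
)
MAPPING_LINE = (
    "Apr 26 18:36:47 localhost kernel: in libpthread-2.23.so[7fbcb11f0000+17000]"
)


def decode_error(code):
    """Decode the low three bits of an x86 page-fault error code (assumed layout)."""
    return {
        "page_present": bool(code & 1),  # 0 means the page was not mapped
        "write": bool(code & 2),         # 0 means the access was a read
        "user_mode": bool(code & 4),     # 1 means the fault came from user space
    }


def summarize(segfault_line, mapping_line):
    """Combine a segfault line and its mapping line into one summary dict."""
    seg = SEGFAULT_RE.search(segfault_line)
    mapping = MAPPING_RE.search(mapping_line)
    if seg is None or mapping is None:
        raise ValueError("line does not match the expected kernel log format")
    ip = int(seg["ip"], 16)
    base = int(mapping["base"], 16)
    return {
        "process": seg["comm"],
        "pid": int(seg["pid"]),
        "fault_address": int(seg["addr"], 16),
        "object": mapping["obj"],
        "offset_in_object": ip - base,  # instruction offset inside the library
        "error": decode_error(int(seg["err"])),
    }


if __name__ == "__main__":
    info = summarize(SEGFAULT_LINE, MAPPING_LINE)
    print(info["object"], hex(info["offset_in_object"]), hex(info["fault_address"]))
    print(info["error"])
```

For the lines above this gives offset 0xa664 inside libpthread-2.23.so and a user-mode read of an unmapped page at address 0x28, which looks like a read through a NULL (or near-NULL) pointer plus a small field offset. With a matching libpthread build and debug symbols, something like `addr2line -e libpthread-2.23.so 0xa664` should point at the function involved.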