neondatabase / neon

Neon: Serverless Postgres. We separated storage and compute to offer autoscaling, code-like database branching, and scale to zero.
https://neon.tech
Apache License 2.0

Stack overflow in layer map #3711

Open knizhnik opened 1 year ago

knizhnik commented 1 year ago

Steps to reproduce

test_gc_dropped_relations.py:

import pytest
from fixtures.neon_fixtures import NeonEnvBuilder, PgBin

# Test gc_dropped_relations
#
# This test sets fail point at the end of GC, and checks that pageserver
# normally restarts after it. Also, there should be GC ERRORs in the log,
# but the fixture checks the log for any unexpected ERRORs after every
# test anyway, so it doesn't need any special attention here.
@pytest.mark.timeout(600)
def test_gc_cutoff(neon_env_builder: NeonEnvBuilder, pg_bin: PgBin):
    env = neon_env_builder.init_start()

    # These warnings are expected, when the pageserver is restarted abruptly
    env.pageserver.allowed_errors.append(".*found future image layer.*")
    env.pageserver.allowed_errors.append(".*found future delta layer.*")

    pageserver_http = env.pageserver.http_client()

    # Use aggressive GC and checkpoint settings, so that we also exercise GC during the test
    tenant_id, _ = env.neon_cli.create_tenant(
        conf={
            "gc_period": "10 s",
            "gc_horizon": f"{1024 ** 2}",
            "checkpoint_distance": f"{1024 ** 2}",
            "compaction_period": "5 s",
            # set PITR interval to be small, so we can do GC
            "pitr_interval": "1 s",
            "compaction_threshold": "3",
            "image_creation_threshold": "2",
        }
    )
    pg = env.postgres.create_start("main", tenant_id=tenant_id)
    connstr = pg.connstr(options="-csynchronous_commit=off")
    pg_bin.run_capture(["pgbench", "-i", "-s10", connstr])

    pageserver_http.configure_failpoints(("after-timeline-gc-removed-layers", "exit"))

    for _ in range(5):
        with pytest.raises(Exception):
            pg_bin.run_capture(["pgbench", "-P1", "-N", "-c5", "-T500", "-Mprepared", connstr])
        env.pageserver.stop()
        env.pageserver.start()
        pageserver_http.configure_failpoints(("after-timeline-gc-removed-layers", "exit"))

Expected result

Pageserver storage size increases; the test runs to completion without crashing.

Actual result

Crash at the second iteration with a stack overflow:

#6  0x0000556b850cadb1 in std::sys::unix::stack_overflow::imp::signal_handler () at library/std/src/sys/unix/stack_overflow.rs:93
#7  <signal handler called>
#8  rpds::map::red_black_tree_map::RangeIterPtr<i128, core::option::Option<(u64, alloc::sync::Arc<dyn pageserver::tenant::storage_layer::PersistentLayer>)>, core::ops::range::RangeToInclusive<i128>, i128, archery::shared_pointer::kind::arc::ArcK>::init_if_needed<i128, core::option::Option<(u64, alloc::sync::Arc<dyn pageserver::tenant::storage_layer::PersistentLayer>)>, i128, core::ops::range::RangeToInclusive<i128>, archery::shared_pointer::kind::arc::ArcK> (self=0x7fc96d36f028, backwards=true)
    at /home/knizhnik/.cargo/registry/src/github.com-1ecc6299db9ec823/rpds-0.12.0/src/map/red_black_tree_map/mod.rs:1388
#9  rpds::map::red_black_tree_map::{impl#24}::next_back<i128, core::option::Option<(u64, alloc::sync::Arc<dyn pageserver::tenant::storage_layer::PersistentLayer>)>, core::ops::range::RangeToInclusive<i128>, i128, archery::shared_pointer::kind::arc::ArcK> (self=0x7fc96d36f028)
    at /home/knizhnik/.cargo/registry/src/github.com-1ecc6299db9ec823/rpds-0.12.0/src/map/red_black_tree_map/mod.rs:1466
#10 0x0000556b845c2d86 in core::iter::adapters::map::{impl#3}::next_back<(&i128, &core::option::Option<(u64, alloc::sync::Arc<dyn pageserver::tenant::storage_layer::PersistentLayer>)>), rpds::map::red_black_tree_map::RangeIterPtr<i128, core::option::Option<(u64, alloc::sync::Arc<dyn pageserver::tenant::storage_layer::PersistentLayer>)>, core::ops::range::RangeToInclusive<i128>, i128, archery::shared_pointer::kind::arc::ArcK>, fn(&archery::shared_pointer::SharedPointer<rpds::map::entry::Entry<i128, core::option::Option<(u64, alloc::sync::Arc<dyn pageserver::tenant::storage_layer::PersistentLayer>)>>, archery::shared_pointer::kind::arc::ArcK>) -> (&i128, &core::option::Option<(u64, alloc::sync::Arc<dyn pageserver::tenant::storage_layer::PersistentLayer>)>)> () at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/core/src/iter/adapters/map.rs:145
#11 core::iter::adapters::rev::{impl#1}::next<core::iter::adapters::map::Map<rpds::map::red_black_tree_map::RangeIterPtr<i128, core::option::Option<(u64, alloc::sync::Arc<dyn pageserver::tenant::storage_layer::PersistentLayer>)>, core::ops::range::RangeToInclusive<i128>, i128, archery::shared_pointer::kind::arc::ArcK>, fn(&archery::shared_pointer::SharedPointer<rpds::map::entry::Entry<i128, core::option::Option<(u64, alloc::sync::Arc<dyn pageserver::tenant::storage_layer::PersistentLayer>)>>, archery::shared_pointer::kind::arc::ArcK>) -> (&i128, &core::option::Option<(u64, alloc::sync::Arc<dyn pageserver::tenant::storage_layer::PersistentLayer>)>)>> ()
    at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/core/src/iter/adapters/rev.rs:33
#12 pageserver::tenant::layer_map::layer_coverage::LayerCoverage<alloc::sync::Arc<dyn pageserver::tenant::storage_layer::PersistentLayer>>::query<alloc::sync::Arc<dyn pageserver::tenant::storage_layer::PersistentLayer>> (self=<optimized out>, key=140502142846864)
    at pageserver/src/tenant/layer_map/layer_coverage.rs:105
#13 0x0000556b8451690e in pageserver::tenant::layer_map::LayerMap<dyn pageserver::tenant::storage_layer::PersistentLayer>::count_deltas<dyn pageserver::tenant::storage_layer::PersistentLayer> (self=0x7fc91c22f768, key=0x7fc96d36f3b0, lsn=0x7fc96d36f450, limit=...)
    at pageserver/src/tenant/layer_map.rs:530
#14 0x0000556b84517242 in pageserver::tenant::layer_map::LayerMap<dyn pageserver::tenant::storage_layer::PersistentLayer>::count_deltas<dyn pageserver::tenant::storage_layer::PersistentLayer> (self=0x7fc91c22f768, key=0x7fc96d36f620, lsn=0x7fc96d36f6c0, limit=...)
    at pageserver/src/tenant/layer_map.rs:565
...
#3288 0x0000556b84517242 in pageserver::tenant::layer_map::LayerMap<dyn pageserver::tenant::storage_layer::PersistentLayer>::count_deltas<dyn pageserver::tenant::storage_layer::PersistentLayer> (self=0x7fc91c22f768, key=0x7fc96d5622f0, lsn=0x7fc96d5621e8, limit=...)
    at pageserver/src/tenant/layer_map.rs:565
#3289 0x0000556b8458da6e in pageserver::tenant::timeline::Timeline::time_for_new_image_layer (self=<optimized out>, partition=<optimized out>, 
    lsn=...) at pageserver/src/tenant/timeline.rs:2677
#3290 0x0000556b8461f881 in pageserver::tenant::timeline::{impl#11}::create_image_layers::{async_fn#0} () at pageserver/src/tenant/timeline.rs:2708
#3291 core::future::from_generator::{impl#1}::poll<pageserver::tenant::timeline::{impl#11}::create_image_layers::{async_fn_env#0}> (self=..., 
    cx=0x7fc96d569168) at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/core/src/future/mod.rs:91
#3292 0x0000556b846ddbf7 in pageserver::tenant::timeline::{impl#7}::compact_inner::{async_fn#0} () at pageserver/src/tenant/timeline.rs:786
#3293 core::future::from_generator::{impl#1}::poll<pageserver::tenant::timeline::{impl#7}::compact_inner::{async_fn_env#0}> (self=..., 
    cx=0x7fc96d569168) at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/core/src/future/mod.rs:91
#3294 pageserver::tenant::timeline::{impl#7}::compact::{async_fn#0} () at pageserver/src/tenant/timeline.rs:654
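Frames #14 through #3288 are all the same call: `LayerMap::count_deltas` recursing into itself at layer_map.rs:565, so the stack depth grows with the number of layers until the guard page is hit. As a sketch of the failure mode only (toy Python code, not the pageserver's actual algorithm): a count that peels one layer per recursive call overflows once the layer count exceeds the recursion budget, while an equivalent iterative form uses constant stack depth.

```python
# Toy model: each delta layer covers a key range [start, end).
# Count how many layers overlap the query range [lo, hi).

def count_overlaps_recursive(layers, lo, hi):
    """Depth grows with len(layers) -- the shape of the overflow above."""
    if not layers:
        return 0
    (start, end), rest = layers[0], layers[1:]
    hit = 1 if start < hi and end > lo else 0
    return hit + count_overlaps_recursive(rest, lo, hi)

def count_overlaps_iterative(layers, lo, hi):
    """Same result with O(1) stack depth, regardless of layer count."""
    total = 0
    for start, end in layers:
        if start < hi and end > lo:
            total += 1
    return total
```

With thousands of 1 MB layers (as the aggressive `checkpoint_distance` in the repro produces), the recursive form exhausts its stack while the iterative one keeps working.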

Environment

Logs, links

knizhnik commented 1 year ago

I am not sure that this is a real problem, because the test uses a small (1 MB) layer size and so produces a large number of small layers. But I am afraid the same situation can happen with normal-sized layers on a huge database.

shanyp commented 1 year ago

@knizhnik could you please supply more information: how many layers, the size of the DB, and whether this is a debug or release build?

knizhnik commented 1 year ago

It happens with both debug and release builds; the number of layers at the moment of the crash is ~7000. Actually, the crash can be easily reproduced with the attached test.
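Rough arithmetic makes this plausible: the backtrace shows on the order of 3288 − 14 ≈ 3274 recursive `count_deltas` frames before the overflow. Assuming an 8 MiB thread stack (a common default; the pageserver's actual thread stack size is an assumption here), that leaves only a few KiB per frame:

```python
# Back-of-the-envelope only; the 8 MiB stack size is an assumed
# default, not a value measured from the pageserver.
STACK_BYTES = 8 * 1024 * 1024
frames = 3288 - 14          # recursive frames visible in the backtrace
bytes_per_frame = STACK_BYTES / frames
print(f"~{bytes_per_frame:.0f} bytes/frame")  # prints "~2562 bytes/frame"
```

At roughly 2.5 KiB per frame, a recursion whose depth tracks the layer count cannot survive anywhere near ~7000 layers on a default-sized stack.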

jcsp commented 9 months ago

I don't see any changes in layer_coverage.rs that look like a fix for this, but the test doesn't fail: the test code in this issue description is exactly what is in test_gc_cutoff on main today (56171cbe8c2b81ba2b949a5ec39c11991fb5e47a), and it doesn't hit a stack overflow.

I'm going to leave this ticket open: until we have a test that explicitly creates huge numbers of tiny layers and then checks compaction/GC still work properly, the issue might still be here.