E1118 12:07:39.984506 263073 ts_tablet_manager.cc:2086] T b7386e5049624e75aafc390011eeac8b P fd339b15e1e7407aae1ee7587357f450: Tablet failed to bootstrap: Illegal state (yb/tablet/tablet_bootstrap.cc:1684): Failed log replay. Reason: WAL files missing, or committed op id is incorrect. Expected both term and index of prev_op_id to be greater than or equal to the corresponding components of committed_op_id. prev_op_id=0.0, committed_op_id=1.14
2024-11-18 12:00:42.247 UTC [86199] ERROR: Timed out waiting kResponseSent, state: kRequestSent
2024-11-18 12:00:42.247 UTC [86199] STATEMENT: CREATE TABLE public.noncolocateddbnoncolocatedtable1_p7000 PARTITION OF public.noncolocateddbnoncolocatedtable1
FOR VALUES FROM (7000) TO (8000)
PARTITION BY RANGE (age)
SPLIT INTO 2 TABLETS;
Test details:
1. Create a cluster with required g-flags
2. Schema creation:
a. Create 2 non-colocated DBs
b. Create 2 tables in the non-colocated DBs
(one partitioned by time and the other by an integer column)
c. Create materialized views on all the tables
d. Create indexes on all these tables
3. Create pg_partman extension with/without schema (TBD) and create pg_cron extension
4. Create PITR on all the databases and note down time T0
5. Try creating a partition set using create_parent(), passing 'partman' as p_type, and
verify it fails with the error below:
ERROR: partman is not a valid partitioning type for pg_partman
6. Try creating a partition set on a table which doesn't exist and verify it fails
with the expected error
7. Try creating a sub-partition set on a parent table which doesn't have a partition
set and verify it fails with:
ERROR: Cannot subpartition a table that is not managed by pg_partman already.
Given top parent table not found in public.part_config: public.t2
8. Try creating a partition set on a column which is not part of the partition key and
verify it fails with the expected error.
9. TBA: Create a partition set on a table which already has overlapping partitioned
tables and verify it fails with the expected error
10. Drop the extension and tables and re-create them
11. Create a partition set (create_parent()) for each table and validate that it works
by adding a few rows to each partition, keeping the interval as low as possible
(see the partition-set sketch after this list)
12. Create a sub-partition set (create_sub_parent()) for each partition set, validate that
old data is not present, and validate that new rows are added to both the parent
partition and the sub-partition
13. Schedule a pg_cron job which will insert rows continuously into all the tables.
14. Schedule a pg_cron job which will run run_maintenance()/run_maintenance_proc()
every 5 minutes (see the pg_cron sketch after this list)
15. Start a thread to verify that run_maintenance()/run_maintenance_proc() is dropping
old partition tables and creating new partition tables
16. Start a thread to update and delete data
17. Let this run for 20 minutes; this should verify the functionality
18. Stop the DML ops threads and cron jobs.
19. Note down the time (T1) and the data.
20. Resume the DML ops threads and cron jobs.
21. Sleep for 10 minutes
22. Stop the DML ops threads and cron jobs.
23. Restore to time T1 and validate the data
24. Refresh all the Materialized views and note down the row count for each table
25. Manually delete one of the existing partitions (say, P1)
26. Run partition_gap_fill() and verify P1 is re-created and has data, and verify whether
any data was deleted (see the gap-fill/retention sketch after this list)
27. Refresh all the Materialized views and validate row count is the same as step 24.
28. Alter a few tables and rename a column; verify the column name is also renamed
in the partitioned tables (ALTER, TRUNCATE, etc.)
29. Resume the DML ops threads and cron jobs.
30. Sleep for 10 minutes
31. Stop the DML ops threads and cron jobs.
32. Run drop_partition_time() and drop_partition_id() and verify the partitioned tables
are detached from the parent table (see the gap-fill/retention sketch after this list).
33. Create a backup of some database (COPY TO/COPY FROM, etc.)
34. Drop those databases
35. Restore all the databases and verify the schema and data are intact. ---------------------------------------->>>>>>>ISSUE OCCURRED HERE
36. Resume the DML ops threads and cron jobs.
37. Verify everything is working.
38. Stop the DML ops threads and cron jobs.
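For reference, here is a minimal partition-set sketch covering steps 3, 5, 11, and 12. It assumes pg_partman 5.x function signatures (p_parent_table, p_control, p_interval, p_type, p_declarative_check); the table name public.t1, its columns, and the intervals are illustrative placeholders, not the actual test schema.

CREATE SCHEMA IF NOT EXISTS partman;
CREATE EXTENSION IF NOT EXISTS pg_partman SCHEMA partman;
CREATE EXTENSION IF NOT EXISTS pg_cron;  -- assumes pg_cron is in shared_preload_libraries

-- Hypothetical integer-partitioned table standing in for the test tables.
CREATE TABLE public.t1 (
    id         int NOT NULL,
    created_at timestamptz NOT NULL DEFAULT now()
) PARTITION BY RANGE (id);

-- Step 5: 'partman' is not a valid p_type in pg_partman 5.x, so this call is expected to fail.
SELECT partman.create_parent(
    p_parent_table := 'public.t1',
    p_control      := 'id',
    p_interval     := '1000',
    p_type         := 'partman'
);

-- Step 11: register the partition set with the default declarative range type.
SELECT partman.create_parent(
    p_parent_table := 'public.t1',
    p_control      := 'id',
    p_interval     := '1000'
);

-- Step 12: sub-partition every child of the set by time; p_declarative_check
-- acknowledges that existing child data is destroyed when children become
-- partitioned parents (which is why old data should no longer be present).
SELECT partman.create_sub_parent(
    p_top_parent        := 'public.t1',
    p_control           := 'created_at',
    p_interval          := '1 hour',
    p_declarative_check := 'yes'
);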
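The pg_cron sketch for steps 13 and 14, plus the stop points in steps 18/22/31, assuming the cron.schedule(jobname, schedule, command) and cron.unschedule(jobname) forms; the job names and the insert statement are illustrative.

-- Step 14: run pg_partman maintenance every 5 minutes.
SELECT cron.schedule(
    'partman-maintenance',
    '*/5 * * * *',
    $$CALL partman.run_maintenance_proc()$$
);

-- Step 13: keep inserting rows; pg_cron granularity is one minute, so
-- "continuously" here means a batch insert into the hypothetical table every minute.
SELECT cron.schedule(
    'insert-load',
    '* * * * *',
    $$INSERT INTO public.t1 (id, created_at)
      SELECT i, now() FROM generate_series(1, 100) AS i$$
);

-- Steps 18/22/31: stop the cron jobs before noting T1 or restoring.
SELECT cron.unschedule('partman-maintenance');
SELECT cron.unschedule('insert-load');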
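And a gap-fill/retention sketch for steps 25-26 and 32, assuming pg_partman's partition_gap_fill(), drop_partition_time(), and drop_partition_id() functions driven by partman.part_config; the child partition name, the parent names (t1 for the hypothetical id-partitioned set, t_time for a hypothetical time-partitioned set), and the retention values are illustrative.

-- Steps 25/26: drop one child partition, then let pg_partman fill the gap.
DROP TABLE public.t1_p1000;                      -- hypothetical child partition "P1"
SELECT partman.partition_gap_fill('public.t1');  -- returns the number of partitions created

-- Step 32: set retention so old partitions are detached but kept, then run the
-- retention functions for the time-based and id-based partition sets.
UPDATE partman.part_config
SET    retention = '1 hour',
       retention_keep_table = true
WHERE  parent_table = 'public.t_time';
SELECT partman.drop_partition_time('public.t_time');

UPDATE partman.part_config
SET    retention = '5000',
       retention_keep_table = true
WHERE  parent_table = 'public.t1';
SELECT partman.drop_partition_id('public.t1');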
Issue Type
kind/bug
Warning: Please confirm that this issue does not contain any sensitive information
[X] I confirm this issue does not contain any sensitive information.
It is actually related to PITR + RBS + DeleteTable + a recently fixed bug, so multiple stars do need to align properly for this issue to happen :slightly_smiling_face:.
A PITR was performed on the database at: 1119 18:25:32
Subsequently the table was dropped, I am assuming as a result of the drop database done in order to perform a restore. The table was dropped at: I1119 18:27:00.931933
Unfortunately, this drop of the table/database ran into the RBS race from the bug, triggering an RBS on a tablet that was anyway being dropped as part of the drop table.
Once the RBS completed, there was only 1 WAL segment left to replay, and it contained the RESTORE_ON_TABLET op from the earlier PITR operation.
Applying this RESTORE operation failed an assertion check, because the op id obtained via RBS is ahead of the op id that RESTORE_ON_TABLET was attempting to write, causing the tserver to crash.
Jira Link: DB-14113
Description
Version: 2024.2.0.0-b127
Logs: Added in Jira comments
Encountered the following FATAL during the restore of a backup in a pg_partman test.
Also saw the following tserver error
Note: We also saw the coredump mentioned in https://github.com/yugabyte/yugabyte-db/issues/24929
YBC logs indicate the restore failed due to -
In the postgres logs we see that CREATE TABLE timed out, so it could be because of the master crash (same as https://github.com/yugabyte/yugabyte-db/issues/24929).