yugabyte / yugabyte-db

YugabyteDB - the cloud native distributed SQL database for mission-critical applications.
https://www.yugabyte.com

[DocDB] Fatal: Network error (yb/util/net/socket.cc:284): listen() error: Address already in use (system error 98) #21777

Closed shishir2001-yb closed 4 months ago

shishir2001-yb commented 6 months ago

Jira Link: DB-10652

Description

Version: 2.23.0.0-b91
Logs: https://drive.google.com/file/d/1OIdxa37M0ICUw3_kVgWF95mySVRWvbRl/view?usp=sharing
Encountered the following fatal error while running the cross DB DDL test with PITR and Backup/Restore.

F20240401 18:25:45 ../../src/yb/tserver/tablet_server_main_impl.cc:270] Network error (yb/util/net/socket.cc:284): listen() error: Address already in use (system error 98)
    @     0xaaaae398aafc  google::LogMessage::SendToLog()
    @     0xaaaae398b9a0  google::LogMessage::Flush()
    @     0xaaaae398c03c  google::LogMessageFatal::~LogMessageFatal()
    @     0xaaaae5102084  yb::tserver::TabletServerMain()
    @     0xaaaae393bf90  main
    @     0xffffb7d44384  __libc_start_main
    @     0xaaaae385e034  (unknown)
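For readers unfamiliar with this errno, the following is a minimal sketch of how "Address already in use" arises when a second socket tries to serve on a port that another socket is already listening on. It is not YugabyteDB code; `try_listen` is a hypothetical helper, and the loopback address and ephemeral port are assumptions chosen for illustration. In this sketch the conflict surfaces at `bind()`, whereas the log above reports it from `listen()`; the errno (`EADDRINUSE`) is the same in both cases.

```python
import errno
import socket


def try_listen(host, port):
    """Bind and listen on (host, port).

    Returns (socket, None) on success or (None, errno) on failure.
    Hypothetical helper for illustration only.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        sock.listen()
    except OSError as exc:
        sock.close()
        return None, exc.errno
    return sock, None


# The first listener takes an ephemeral port chosen by the OS (port 0).
first, first_err = try_listen("127.0.0.1", 0)
port = first.getsockname()[1]

# A second listener on the same address and port is rejected.
second, second_err = try_listen("127.0.0.1", port)
print("second listener failed with EADDRINUSE:", second_err == errno.EADDRINUSE)

first.close()
```

In a server restart scenario, the same failure can occur if the previous process (or another process) still holds the port when the new process starts.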

Test details:

Test Description:
        1. Create a cluster with the required g-flags
        2. Start the cross DB DDL workload, which executes DDLs and DMLs across databases concurrently (50 colocated
           databases and 100 non-colocated databases); run this for 20-30 mins
        3. Create a PITR schedule on 10 random databases
        4. Start a while loop and run it for 120 mins
          a. Note down the time for PITR(0)
          b. Create a backup of 1 random database
          c. Start the cross DB DDL workload and stop it after 10 mins
          d. Note down the time for PITR(1)
          e. Start the cross DB DDL workload and run it for 10 mins
          f. Execute PITR on all 10 databases at random times (between 1-9 sec ago)
          g. Restore to PITR(1)
          h. Validate data
          i. Restore to PITR(0) with a probability of 0.6 and validate data
          j. Delete the PITR schedule for the backup db
          k. Drop the database
          l. Restore the backup
          l. Restore the backup
          m. Create the snapshot schedule for this new DB
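
The loop body in step 4 can be sketched as a plan generator. This is only an illustration of the control flow described above, not the actual test code: `plan_iteration` and the step strings are hypothetical, and the real test would call cluster operations instead of returning labels. The only branching is step (i), which is taken with probability 0.6.

```python
import random


def plan_iteration(rng, restore_pitr0_probability=0.6):
    """Return the ordered step labels for one loop iteration (steps a-m).

    `rng` is any object with a random() method returning a float in [0, 1).
    """
    steps = [
        "note time PITR(0)",
        "back up 1 random database",
        "run cross DB DDL workload for 10 mins",
        "note time PITR(1)",
        "run cross DB DDL workload for 10 mins",
        "execute PITR on all 10 databases (random time, 1-9 sec ago)",
        "restore to PITR(1)",
        "validate data",
    ]
    # Step (i) is optional: taken with the stated probability of 0.6.
    if rng.random() < restore_pitr0_probability:
        steps.append("restore to PITR(0) and validate data")
    steps += [
        "delete PITR schedule for the backup db",
        "drop the database",
        "restore the backup",
        "create snapshot schedule for the new db",
    ]
    return steps


# Print the plan for one iteration with a seeded generator.
for step in plan_iteration(random.Random(0)):
    print(step)
```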

G-flags:

tserver_gflags={
    "ysql_enable_packed_row": "true",
    "ysql_enable_packed_row_for_colocated_table": "true",
    "enable_automatic_tablet_splitting": "true",
    "ysql_max_connections": "500",
    "client_read_write_timeout_ms": str(30 * 60 * 1000),
    "yb_client_admin_operation_timeout_sec": str(30 * 60),
    "consistent_restore": "true",
    "ysql_enable_db_catalog_version_mode": "true",
    "allowed_preview_flags_csv": "ysql_enable_db_catalog_version_mode",
    "tablet_replicas_per_gib_limit": 0,
    "ysql_pg_conf_csv": "yb_debug_report_error_stacktrace=true"
},
master_gflags={
    "ysql_enable_packed_row": "true",
    "ysql_enable_packed_row_for_colocated_table": "true",
    "enable_automatic_tablet_splitting": "true",
    "tablet_split_high_phase_shard_count_per_node": 20000,
    "tablet_split_high_phase_size_threshold_bytes": 2097152,  # 2 MB
    "tablet_split_low_phase_size_threshold_bytes": 102400,  # 100 KB
    "tablet_split_low_phase_shard_count_per_node": 10000,
    "consistent_restore": "true",
    "ysql_enable_db_catalog_version_mode": "true",
    "allowed_preview_flags_csv": "ysql_enable_db_catalog_version_mode",
    "tablet_replicas_per_gib_limit": 0,
    "ysql_pg_conf_csv": "yb_debug_report_error_stacktrace=true"
}
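
As a rough illustration of what the computed values above resolve to, the sketch below renders a flag dictionary as `--name=value` arguments, the usual gflags command-line form. `to_flag_args` is a hypothetical helper, not part of the test framework, and how the framework actually passes these flags to the servers is an assumption here. Only a subset of the tserver flags is shown.

```python
def to_flag_args(gflags):
    """Render a {name: value} dict as sorted '--name=value' strings.

    Hypothetical helper; values may be str or int and are formatted as-is.
    """
    return [f"--{name}={value}" for name, value in sorted(gflags.items())]


# Subset of the tserver flags from the report.
tserver_subset = {
    "client_read_write_timeout_ms": str(30 * 60 * 1000),  # 30 minutes in ms
    "yb_client_admin_operation_timeout_sec": str(30 * 60),  # 30 minutes in sec
    "tablet_replicas_per_gib_limit": 0,
}

flag_args = to_flag_args(tserver_subset)
for arg in flag_args:
    print(arg)
```

Both timeouts are therefore 30 minutes: 1800000 ms for client reads and writes, and 1800 sec for admin operations.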

Issue Type

kind/bug

Warning: Please confirm that this issue does not contain any sensitive information

rthallamko3 commented 6 months ago

Looks like a DUP of #16654. cc @shishir2001-yb , @shamanthchandra-yb

rthallamko3 commented 6 months ago

I am inclined to close this.