Closed bmatican closed 5 years ago
I have not seen this in development since. Closing.
I am getting this error on 2.1.2.0-b10
. After a master restart a very similiar error picture pops up, when running on-top of EMC Isilon NFS.
I0427 16:17:06.513187 22 event_logger.cc:67] T 00000000000000000000000000000000 P 21421e45d3a84723bbb298bf65c7b8f4 [R]: EVENT_LOG_v1 {"time_micros": 1588004226513170, "job": 1, "event": "recovery_finished"}
I0427 16:17:06.535516 22 db_impl.cc:1154] T 00000000000000000000000000000000 P 21421e45d3a84723bbb298bf65c7b8f4 [R]: [JOB 2] Delete /mnt/data0/yb-data/master/data/rocksdb/table-sys.catalog.uuid/tablet-00000000000000000000000000000000//MANIFEST-000018 type=4 #18 -- OK
I0427 16:17:06.557935 22 db_impl.cc:1154] T 00000000000000000000000000000000 P 21421e45d3a84723bbb298bf65c7b8f4 [R]: [JOB 2] Delete /mnt/data0/yb-data/master/data/rocksdb/table-sys.catalog.uuid/tablet-00000000000000000000000000000000//000019.log type=0 #19 -- OK
I0427 16:17:06.618950 22 tablet.cc:620] T 00000000000000000000000000000000 P 21421e45d3a84723bbb298bf65c7b8f4: Opening intents DB at: /mnt/data0/yb-data/master/data/rocksdb/table-sys.catalog.uuid/tablet-00000000000000000000000000000000.intents
I0427 16:17:06.643824 22 version_set.cc:2818] T 00000000000000000000000000000000 P 21421e45d3a84723bbb298bf65c7b8f4 [I]: Recovered from manifest file:/mnt/data0/yb-data/master/data/rocksdb/table-sys.catalog.uuid/tablet-00000000000000000000000000000000.intents/MANIFEST-000008 succeeded,manifest_file_number is 8, next_file_number is 10, last_sequence is 1125899906842624, log_number is 0,prev_log_number is 0,max_column_family is 0, flushed_values is 0x0000000001f96cc0 -> { op_id: 1.4 hybrid_time: { physical: 1588003561276358 logical: 5 } history_cutoff: <invalid> hybrid_time_filter: }
I0427 16:17:06.643860 22 version_set.cc:2826] T 00000000000000000000000000000000 P 21421e45d3a84723bbb298bf65c7b8f4 [I]: Column family [default] (ID 0), log number is 4
I0427 16:17:06.647893 22 event_logger.cc:67] T 00000000000000000000000000000000 P 21421e45d3a84723bbb298bf65c7b8f4 [I]: EVENT_LOG_v1 {"time_micros": 1588004226647881, "job": 1, "event": "recovery_started", "log_files": [9]}
I0427 16:17:06.651748 22 db_impl.cc:1501] T 00000000000000000000000000000000 P 21421e45d3a84723bbb298bf65c7b8f4 [I]: Recovering log #9 mode 0 skip-recovery 0
I0427 16:17:06.651837 22 version_set.cc:2259] T 00000000000000000000000000000000 P 21421e45d3a84723bbb298bf65c7b8f4 [I]: Creating manifest 11
I0427 16:17:06.659281 22 version_set.cc:3325] T 00000000000000000000000000000000 P 21421e45d3a84723bbb298bf65c7b8f4 [I]: Writing version edit: comparator: "leveldb.BytewiseComparator"
I0427 16:17:06.659440 22 version_set.cc:3325] T 00000000000000000000000000000000 P 21421e45d3a84723bbb298bf65c7b8f4 [I]: Writing version edit: log_number: 4
flushed_frontier {
[type.googleapis.com/yb.docdb.ConsensusFrontierPB] {
op_id {
term: 1
index: 4
}
hybrid_time: 6504462586987962373
history_cutoff: 18446744073709551614
}
}
I0427 16:17:06.743016 22 event_logger.cc:67] T 00000000000000000000000000000000 P 21421e45d3a84723bbb298bf65c7b8f4 [I]: EVENT_LOG_v1 {"time_micros": 1588004226742998, "job": 1, "event": "recovery_finished"}
I0427 16:17:06.766126 22 db_impl.cc:1154] T 00000000000000000000000000000000 P 21421e45d3a84723bbb298bf65c7b8f4 [I]: [JOB 2] Delete /mnt/data0/yb-data/master/data/rocksdb/table-sys.catalog.uuid/tablet-00000000000000000000000000000000.intents//MANIFEST-000008 type=4 #8 -- OK
I0427 16:17:06.788118 22 db_impl.cc:1154] T 00000000000000000000000000000000 P 21421e45d3a84723bbb298bf65c7b8f4 [I]: [JOB 2] Delete /mnt/data0/yb-data/master/data/rocksdb/table-sys.catalog.uuid/tablet-00000000000000000000000000000000.intents//000009.log type=0 #9 -- OK
I0427 16:17:06.897486 22 tablet.cc:664] T 00000000000000000000000000000000 P 21421e45d3a84723bbb298bf65c7b8f4: Successfully opened a RocksDB database at /mnt/data0/yb-data/master/data/rocksdb/table-sys.catalog.uuid/tablet-00000000000000000000000000000000, obj: 0x2060000
I0427 16:17:06.897553 22 tablet_bootstrap.cc:431] T 00000000000000000000000000000000 P 21421e45d3a84723bbb298bf65c7b8f4: Time spent opening tablet: real 0.505s user 0.002s sys 0.012s
I0427 16:17:06.897809 26 transaction_participant.cc:945] T 00000000000000000000000000000000 P 21421e45d3a84723bbb298bf65c7b8f4: LoadTransactions start
I0427 16:17:06.897876 26 transaction_participant.cc:984] T 00000000000000000000000000000000 P 21421e45d3a84723bbb298bf65c7b8f4: LoadTransactions done: loaded 0 transactions
I0427 16:17:06.905642 22 tablet_bootstrap.cc:497] T 00000000000000000000000000000000 P 21421e45d3a84723bbb298bf65c7b8f4: Will attempt to recover log segment /mnt/data0/yb-data/master/wals/table-sys.catalog.uuid/tablet-00000000000000000000000000000000/wal-000000001
I0427 16:17:06.905675 22 tablet_bootstrap.cc:497] T 00000000000000000000000000000000 P 21421e45d3a84723bbb298bf65c7b8f4: Will attempt to recover log segment /mnt/data0/yb-data/master/wals/table-sys.catalog.uuid/tablet-00000000000000000000000000000000/wal-000000002
I0427 16:17:06.905824 22 tablet_bootstrap.cc:833] T 00000000000000000000000000000000 P 21421e45d3a84723bbb298bf65c7b8f4: Max persistent index in RocksDB's SSTables before bootstrap: term: 1 index: 4/term: 1 index: 4
I0427 16:17:06.909987 22 log.cc:808] T 00000000000000000000000000000000 P 21421e45d3a84723bbb298bf65c7b8f4: Setting log wal retention time to 0 seconds
W0427 16:17:06.915786 22 log_util.cc:211] Could not read footer for segment: /mnt/data0/yb-data/master/wals/table-sys.catalog.uuid/tablet-00000000000000000000000000000000/wal-000000001: Not found (yb/consensus/log_util.cc:446): Footer not found. Footer magic doesn't match
W0427 16:17:06.915815 22 log_reader.cc:182] T 00000000000000000000000000000000 P 21421e45d3a84723bbb298bf65c7b8f4: Log segment /mnt/data0/yb-data/master/wals/table-sys.catalog.uuid/tablet-00000000000000000000000000000000/wal-000000001 was likely left in-progress after a previous crash. Will try to rebuild footer by scanning data.
I0427 16:17:06.921162 22 log_util.cc:657] Scanning /mnt/data0/yb-data/master/wals/table-sys.catalog.uuid/tablet-00000000000000000000000000000000/wal-000000001 for valid entry headers following offset 268191...
I0427 16:17:06.921231 22 log_util.cc:703] Found no log entry headers
I0427 16:17:06.921250 22 log_util.cc:526] Ignoring log segment corruption in /mnt/data0/yb-data/master/wals/table-sys.catalog.uuid/tablet-00000000000000000000000000000000/wal-000000001 because there are no log entries following the corrupted one. The server probably crashed in the middle of writing an entry to the write-ahead log or downloaded an active log via remote bootstrap. Error detail: Corruption (yb/consensus/log_util.cc:776): Invalid checksum in log entry head header: found=0, computed=2351477386: Log file corruption detected. Failed trying to read batch #1151 at offset 268191 for log segment /mnt/data0/yb-data/master/wals/table-sys.catalog.uuid/tablet-00000000000000000000000000000000/wal-000000001: Prior batch offsets: 267248 267276 268163 268191; Last log entries read: [REPLICATE (1.572)] [REPLICATE (1.573)] [REPLICATE (1.574)] [REPLICATE (1.575)]
I0427 16:17:06.921277 22 log_util.cc:263] Successfully rebuilt footer for segment: /mnt/data0/yb-data/master/wals/table-sys.catalog.uuid/tablet-00000000000000000000000000000000/wal-000000001 (valid entries through byte offset 268191)
W0427 16:17:06.925917 22 log.cc:1179] Error closing log: Illegal state (yb/consensus/log.cc:1012): Bad state for Close() 0
I0427 16:17:06.926023 22 transaction_participant.cc:141] T 00000000000000000000000000000000 P 21421e45d3a84723bbb298bf65c7b8f4: Shutdown
I0427 16:17:06.926138 22 db_impl.cc:641] T 00000000000000000000000000000000 P 21421e45d3a84723bbb298bf65c7b8f4 [I]: Shutting down RocksDB at: /mnt/data0/yb-data/master/data/rocksdb/table-sys.catalog.uuid/tablet-00000000000000000000000000000000.intents
I0427 16:17:06.946408 22 db_impl.cc:751] T 00000000000000000000000000000000 P 21421e45d3a84723bbb298bf65c7b8f4 [I]: Shutdown done
I0427 16:17:06.946509 22 db_impl.cc:641] T 00000000000000000000000000000000 P 21421e45d3a84723bbb298bf65c7b8f4 [R]: Shutting down RocksDB at: /mnt/data0/yb-data/master/data/rocksdb/table-sys.catalog.uuid/tablet-00000000000000000000000000000000
I0427 16:17:06.967371 22 db_impl.cc:751] T 00000000000000000000000000000000 P 21421e45d3a84723bbb298bf65c7b8f4 [R]: Shutdown done
E0427 16:17:06.967690 22 master.cc:241] Master@10.233.94.165:7100: Unable to init master catalog manager: IO error (yb/util/env_util.cc:156): Unable to initialize catalog manager: Failed to initialize sys tables async: Failed log replay. Reason: Failed to open new log: Unable to open readable log segment: /mnt/data0/yb-data/master/wals/table-sys.catalog.uuid/tablet-00000000000000000000000000000000/wal-000000002: Unable to initialize segment: EOF trying to read 12 bytes at offset 0
E0427 16:17:06.967831 1 main_util.cc:22] IO error (yb/util/env_util.cc:156): Unable to initialize catalog manager: Failed to initialize sys tables async: Failed log replay. Reason: Failed to open new log: Unable to open readable log segment: /mnt/data0/yb-data/master/wals/table-sys.catalog.uuid/tablet-00000000000000000000000000000000/wal-000000002: Unable to initialize segment: EOF trying to read 12 bytes at offset 0
I0427 16:17:06.967880 1 master.cc:284] Master@10.233.94.165:7100 shutting down...
E0427 16:17:06.970228 15 async_initializer.cc:83] Failed to initialize client: Aborted (yb/rpc/reactor.cc:85): Could not locate the leader master: Shutdown connection (system error 108)
I0427 16:17:06.980762 1 tablet_peer.cc:380] T 00000000000000000000000000000000 P 21421e45d3a84723bbb298bf65c7b8f4 [state=BOOTSTRAPPING]: Initiating TabletPeer shutdown
I0427 16:17:06.980800 1 tablet_peer.cc:397] T 00000000000000000000000000000000 P 21421e45d3a84723bbb298bf65c7b8f4 [state=QUIESCING]: Started shutdown from state: BOOTSTRAPPING
I0427 16:17:06.981094 1 master.cc:292] Master@10.233.94.165:7100 shutdown complete.
@Arttii Thanks for your interest in Yugabyte.
It seems we still have some kinks with running on NFS. See https://github.com/yugabyte/yugabyte-db/issues/3871
Can you confirm you're seeing something like this in your logs:
W0415 20:11:41.046124 19876 env_posix.cc:384] The filesystem does not support fallocate().
HI, somehow no. This is the entire log I have https://gist.github.com/Arttii/6c5d47aa7c569a6354b57a617d7f4071. I also tried to set durable_wal_write
to true
, but it did not really help. The file its referencing wal-000000002
is there, just empty.
EDIT: Actually as we are running his on k8s, the quick container restart was hiding another error picture:
I0428 11:40:34.975433 1 master_main.cc:93] NumCPUs determined to be: 2
I0428 11:40:34.977380 1 mem_tracker.cc:249] MemTracker: hard memory limit is 3.273251 GB
I0428 11:40:34.977401 1 mem_tracker.cc:251] MemTracker: soft memory limit is 2.782263 GB
I0428 11:40:34.979595 1 rpc_server.cc:86] yb::server::RpcServer created at 0x28ce4d0
I0428 11:40:34.979629 1 master.cc:143] yb::master::Master created at 0x7ffc126e69e0
I0428 11:40:34.979651 1 master.cc:144] yb::master::TSManager created at 0x28e1340
I0428 11:40:34.979655 1 master.cc:145] yb::master::CatalogManager created at 0x2c2c000
I0428 11:40:34.983089 1 master_main.cc:114] Initializing master server...
I0428 11:40:34.992725 1 fs_manager.cc:249] Opened local filesystem: /mnt/data0
uuid: "2920707d7f38428d969f1ff0ec35a922"
format_stamp: "Formatted at 2020-04-28 11:34:10 on yb-master-2"
I0428 11:40:34.993038 1 server_base.cc:231] Auto setting FLAGS_num_reactor_threads to 2
I0428 11:40:34.998124 1 master_main.cc:117] Starting Master server...
I0428 11:40:34.998668 1 service_pool.cc:148] yb.master.MasterBackupService: yb::rpc::ServicePoolImpl created at 0x28d2480
I0428 11:40:34.998800 16 async_initializer.cc:73] Starting to init ybclient
I0428 11:40:35.000600 1 service_pool.cc:148] yb.master.MasterService: yb::rpc::ServicePoolImpl created at 0x28d2b40
I0428 11:40:35.000885 1 service_pool.cc:148] yb.tserver.TabletServerService: yb::rpc::ServicePoolImpl created at 0x28d3680
I0428 11:40:35.001070 1 service_pool.cc:148] yb.consensus.ConsensusService: yb::rpc::ServicePoolImpl created at 0x28d3b00
I0428 11:40:35.004850 1 service_pool.cc:148] yb.tserver.RemoteBootstrapService: yb::rpc::ServicePoolImpl created at 0x28d3d40
I0428 11:40:35.004969 1 webserver.cc:148] Starting webserver on 0.0.0.0:7000
I0428 11:40:35.004978 1 webserver.cc:153] Document root: /home/yugabyte/www
I0428 11:40:35.005414 1 webserver.cc:240] Webserver started. Bound to: http://0.0.0.0:7000/
I0428 11:40:35.005573 1 service_pool.cc:148] yb.server.GenericService: yb::rpc::ServicePoolImpl created at 0x2ef0000
I0428 11:40:35.006211 1 rpc_server.cc:169] RPC server started. Bound to: 10.233.99.226:7100
I0428 11:40:35.006247 1 server_base.cc:487] Using private ip address yb-master-2.yb-masters.dap-base.svc.cluster.local
I0428 11:40:35.006856 22 sys_catalog.cc:194] Trying to load previous SysCatalogTable data from disk
I0428 11:40:35.043766 22 consensus_meta.cc:275] T 00000000000000000000000000000000 P 2920707d7f38428d969f1ff0ec35a922: Updating active role from UNKNOWN_ROLE to FOLLOWER. Consensus state: current_term: 1 leader_uuid: "" config { opid_index: -1 peers { permanent_uuid: "db2e76eb4dc7415c8e32a41389b5e8af" member_type: VOTER last_known_private_addr { host: "yb-master-0.yb-masters.dap-base.svc.cluster.local" port: 7100 } last_known_broadcast_addr { host: "yb-master-0.yb-masters.dap-base.svc.cluster.local" port: 7100 } cloud_info { placement_cloud: "cloud1" placement_region: "datacenter1" placement_zone: "rack1" } } peers { permanent_uuid: "3b99165b0e0b42a68002a7a34db19661" member_type: VOTER last_known_private_addr { host: "yb-master-1.yb-masters.dap-base.svc.cluster.local" port: 7100 } last_known_broadcast_addr { host: "yb-master-1.yb-masters.dap-base.svc.cluster.local" port: 7100 } cloud_info { placement_cloud: "cloud1" placement_region: "datacenter1" placement_zone: "rack1" } } peers { permanent_uuid: "2920707d7f38428d969f1ff0ec35a922" member_type: VOTER last_known_private_addr { host: "yb-master-2.yb-masters.dap-base.svc.cluster.local" port: 7100 } last_known_broadcast_addr { host: "yb-master-2.yb-masters.dap-base.svc.cluster.local" port: 7100 } cloud_info { placement_cloud: "cloud1" placement_region: "datacenter1" placement_zone: "rack1" } } }, has_pending_config = 0
I0428 11:40:35.043872 22 sys_catalog.cc:239] Configuring consensus for distributed operation...
I0428 11:40:35.043903 22 server_base.cc:487] Using private ip address yb-master-2.yb-masters.dap-base.svc.cluster.local
I0428 11:40:35.044342 22 tablet_peer.cc:1156] T 00000000000000000000000000000000 P 2920707d7f38428d969f1ff0ec35a922 [state=BOOTSTRAPPING]: Changed state from NOT_STARTED to BOOTSTRAPPING
I0428 11:40:35.046813 22 consensus_meta.cc:275] T 00000000000000000000000000000000 P 2920707d7f38428d969f1ff0ec35a922: Updating active role from UNKNOWN_ROLE to FOLLOWER. Consensus state: current_term: 1 leader_uuid: "" config { opid_index: -1 peers { permanent_uuid: "db2e76eb4dc7415c8e32a41389b5e8af" member_type: VOTER last_known_private_addr { host: "yb-master-0.yb-masters.dap-base.svc.cluster.local" port: 7100 } last_known_broadcast_addr { host: "yb-master-0.yb-masters.dap-base.svc.cluster.local" port: 7100 } cloud_info { placement_cloud: "cloud1" placement_region: "datacenter1" placement_zone: "rack1" } } peers { permanent_uuid: "3b99165b0e0b42a68002a7a34db19661" member_type: VOTER last_known_private_addr { host: "yb-master-1.yb-masters.dap-base.svc.cluster.local" port: 7100 } last_known_broadcast_addr { host: "yb-master-1.yb-masters.dap-base.svc.cluster.local" port: 7100 } cloud_info { placement_cloud: "cloud1" placement_region: "datacenter1" placement_zone: "rack1" } } peers { permanent_uuid: "2920707d7f38428d969f1ff0ec35a922" member_type: VOTER last_known_private_addr { host: "yb-master-2.yb-masters.dap-base.svc.cluster.local" port: 7100 } last_known_broadcast_addr { host: "yb-master-2.yb-masters.dap-base.svc.cluster.local" port: 7100 } cloud_info { placement_cloud: "cloud1" placement_region: "datacenter1" placement_zone: "rack1" } } }, has_pending_config = 0
I0428 11:40:35.046852 22 tablet_bootstrap_if.cc:75] T 00000000000000000000000000000000 P 2920707d7f38428d969f1ff0ec35a922: Bootstrap starting.
I0428 11:40:35.062943 22 tablet.cc:376] T 00000000000000000000000000000000 P 2920707d7f38428d969f1ff0ec35a922: Schema version for sys.catalog is 0
I0428 11:40:35.063233 22 transaction_participant.cc:119] T 00000000000000000000000000000000 P 2920707d7f38428d969f1ff0ec35a922: Create
I0428 11:40:35.063627 22 docdb_rocksdb_util.cc:272] Auto setting FLAGS_rocksdb_max_background_flushes to 1
I0428 11:40:35.063661 22 docdb_rocksdb_util.cc:295] Auto setting FLAGS_rocksdb_max_background_compactions to 1
I0428 11:40:35.063666 22 docdb_rocksdb_util.cc:306] Auto setting FLAGS_rocksdb_base_background_compactions to 1
I0428 11:40:35.063670 22 docdb_rocksdb_util.cc:329] Auto setting FLAGS_priority_thread_pool_size to 1
I0428 11:40:35.063899 22 tablet.cc:479] T 00000000000000000000000000000000 P 2920707d7f38428d969f1ff0ec35a922: Creating RocksDB database in dir /mnt/data0/yb-data/master/data/rocksdb/table-sys.catalog.uuid/tablet-00000000000000000000000000000000
I0428 11:40:35.076493 22 tablet.cc:605] Opening RocksDB at: /mnt/data0/yb-data/master/data/rocksdb/table-sys.catalog.uuid/tablet-00000000000000000000000000000000
I0428 11:40:35.114245 22 version_set.cc:2818] T 00000000000000000000000000000000 P 2920707d7f38428d969f1ff0ec35a922 [R]: Recovered from manifest file:/mnt/data0/yb-data/master/data/rocksdb/table-sys.catalog.uuid/tablet-00000000000000000000000000000000/MANIFEST-000016 succeeded,manifest_file_number is 16, next_file_number is 18, last_sequence is 1125899908347196, log_number is 0,prev_log_number is 0,max_column_family is 0, flushed_values is 0x00000000030e3080 -> { op_id: 1.4 hybrid_time: { physical: 1588073669996455 } history_cutoff: <invalid> hybrid_time_filter: <invalid> }
I0428 11:40:35.114280 22 version_set.cc:2826] T 00000000000000000000000000000000 P 2920707d7f38428d969f1ff0ec35a922 [R]: Column family [default] (ID 0), log number is 3
I0428 11:40:35.119555 22 event_logger.cc:67] T 00000000000000000000000000000000 P 2920707d7f38428d969f1ff0ec35a922 [R]: EVENT_LOG_v1 {"time_micros": 1588074035119521, "job": 1, "event": "recovery_started", "log_files": [13]}
I0428 11:40:35.122673 22 db_impl.cc:1501] T 00000000000000000000000000000000 P 2920707d7f38428d969f1ff0ec35a922 [R]: Recovering log #13 mode 0 skip-recovery 0
I0428 11:40:35.122741 22 version_set.cc:2259] T 00000000000000000000000000000000 P 2920707d7f38428d969f1ff0ec35a922 [R]: Creating manifest 18
I0428 11:40:35.130498 22 version_set.cc:3325] T 00000000000000000000000000000000 P 2920707d7f38428d969f1ff0ec35a922 [R]: Writing version edit: comparator: "leveldb.BytewiseComparator"
I0428 11:40:35.130973 22 version_set.cc:3325] T 00000000000000000000000000000000 P 2920707d7f38428d969f1ff0ec35a922 [R]: Writing version edit: log_number: 3
new_files {
level: 0
number: 10
total_file_size: 23914435
base_file_size: 110376
smallest {
key: "H\200\000\000\001S000000010000300080000000000000af\000\000!J\200#\200\001\263\303\311x\026\312\200J\001\250X\002\000\000\000\004"
seqno: 1125899906842625
user_values {
tag: 1
data: "\200\001\263\303\311\324;C\200J"
}
user_values {
tag: 10
data: "H\200\000\000\001"
}
user_values {
tag: 11
data: "$"
}
user_values {
tag: 12
data: "$"
}
user_values {
tag: 13
data: "$"
}
user_values {
tag: 14
data: "$"
}
}
largest {
key: "y\016\340@\000\000\000\000\200\0000\000\000\001\000\000\000O\000\0000\252!K\215#\200\001\263\303\306\326\200\307\200>+\0018\365\026\000\000\000\004"
seqno: 1125899908347196
user_values {
tag: 1
data: "\200\001\263\303\306\326a\315\200?\253"
}
user_values {
tag: 10
data: "S\377\350\240Yf\327N\336\264\244\352,\016Jj\332\000\000"
}
user_values {
tag: 11
data: "Sysql-catalog-configuration\000\000"
}
user_values {
tag: 12
data: "Syes_or_no_check\000\000"
}
user_values {
tag: 13
data: "SS\377\350\240Yf\327N\336\264\244\352,\016Jj\332\000\001\000\001!\000\000"
}
user_values {
tag: 14
data: "SS\323\006E\217\024\177A\306\256\212\244\201\002\261\377\325\000\001\000\001!\000\000"
}
}
}
flushed_frontier {
[type.googleapis.com/yb.docdb.ConsensusFrontierPB] {
op_id {
term: 1
index: 4
}
hybrid_time: 6504749752305479680
history_cutoff: 18446744073709551614
}
}
I0428 11:40:35.227035 22 event_logger.cc:67] T 00000000000000000000000000000000 P 2920707d7f38428d969f1ff0ec35a922 [R]: EVENT_LOG_v1 {"time_micros": 1588074035227022, "job": 1, "event": "recovery_finished"}
I0428 11:40:35.240010 22 db_impl.cc:1154] T 00000000000000000000000000000000 P 2920707d7f38428d969f1ff0ec35a922 [R]: [JOB 2] Delete /mnt/data0/yb-data/master/data/rocksdb/table-sys.catalog.uuid/tablet-00000000000000000000000000000000//MANIFEST-000016 type=4 #16 -- OK
I0428 11:40:35.256011 22 db_impl.cc:1154] T 00000000000000000000000000000000 P 2920707d7f38428d969f1ff0ec35a922 [R]: [JOB 2] Delete /mnt/data0/yb-data/master/data/rocksdb/table-sys.catalog.uuid/tablet-00000000000000000000000000000000//MANIFEST-000011 type=4 #11 -- OK
I0428 11:40:35.270684 22 db_impl.cc:1154] T 00000000000000000000000000000000 P 2920707d7f38428d969f1ff0ec35a922 [R]: [JOB 2] Delete /mnt/data0/yb-data/master/data/rocksdb/table-sys.catalog.uuid/tablet-00000000000000000000000000000000//000013.log type=0 #13 -- OK
I0428 11:40:35.348821 22 tablet.cc:620] T 00000000000000000000000000000000 P 2920707d7f38428d969f1ff0ec35a922: Opening intents DB at: /mnt/data0/yb-data/master/data/rocksdb/table-sys.catalog.uuid/tablet-00000000000000000000000000000000.intents
I0428 11:40:35.376628 22 version_set.cc:2818] T 00000000000000000000000000000000 P 2920707d7f38428d969f1ff0ec35a922 [I]: Recovered from manifest file:/mnt/data0/yb-data/master/data/rocksdb/table-sys.catalog.uuid/tablet-00000000000000000000000000000000.intents/MANIFEST-000006 succeeded,manifest_file_number is 6, next_file_number is 8, last_sequence is 1125899906842624, log_number is 0,prev_log_number is 0,max_column_family is 0, flushed_values is 0x00000000030c8b10 -> { op_id: 1.4 hybrid_time: { physical: 1588073669996455 } history_cutoff: <invalid> hybrid_time_filter: <invalid> }
I0428 11:40:35.376659 22 version_set.cc:2826] T 00000000000000000000000000000000 P 2920707d7f38428d969f1ff0ec35a922 [I]: Column family [default] (ID 0), log number is 0
I0428 11:40:35.379743 22 event_logger.cc:67] T 00000000000000000000000000000000 P 2920707d7f38428d969f1ff0ec35a922 [I]: EVENT_LOG_v1 {"time_micros": 1588074035379733, "job": 1, "event": "recovery_started", "log_files": [3]}
I0428 11:40:35.382517 22 db_impl.cc:1501] T 00000000000000000000000000000000 P 2920707d7f38428d969f1ff0ec35a922 [I]: Recovering log #3 mode 0 skip-recovery 0
I0428 11:40:35.382645 22 version_set.cc:2259] T 00000000000000000000000000000000 P 2920707d7f38428d969f1ff0ec35a922 [I]: Creating manifest 8
I0428 11:40:35.389412 22 version_set.cc:3325] T 00000000000000000000000000000000 P 2920707d7f38428d969f1ff0ec35a922 [I]: Writing version edit: comparator: "leveldb.BytewiseComparator"
I0428 11:40:35.389641 22 version_set.cc:3325] T 00000000000000000000000000000000 P 2920707d7f38428d969f1ff0ec35a922 [I]: Writing version edit: log_number: 0
flushed_frontier {
[type.googleapis.com/yb.docdb.ConsensusFrontierPB] {
op_id {
term: 1
index: 4
}
hybrid_time: 6504749752305479680
history_cutoff: 18446744073709551614
}
}
I0428 11:40:35.413612 22 event_logger.cc:67] T 00000000000000000000000000000000 P 2920707d7f38428d969f1ff0ec35a922 [I]: EVENT_LOG_v1 {"time_micros": 1588074035413603, "job": 1, "event": "recovery_finished"}
I0428 11:40:35.427080 22 db_impl.cc:1154] T 00000000000000000000000000000000 P 2920707d7f38428d969f1ff0ec35a922 [I]: [JOB 2] Delete /mnt/data0/yb-data/master/data/rocksdb/table-sys.catalog.uuid/tablet-00000000000000000000000000000000.intents//MANIFEST-000006 type=4 #6 -- OK
I0428 11:40:35.451843 22 db_impl.cc:1154] T 00000000000000000000000000000000 P 2920707d7f38428d969f1ff0ec35a922 [I]: [JOB 2] Delete /mnt/data0/yb-data/master/data/rocksdb/table-sys.catalog.uuid/tablet-00000000000000000000000000000000.intents//MANIFEST-000001 type=4 #1 -- OK
I0428 11:40:35.468727 22 db_impl.cc:1154] T 00000000000000000000000000000000 P 2920707d7f38428d969f1ff0ec35a922 [I]: [JOB 2] Delete /mnt/data0/yb-data/master/data/rocksdb/table-sys.catalog.uuid/tablet-00000000000000000000000000000000.intents//000003.log type=0 #3 -- OK
I0428 11:40:35.580117 22 tablet.cc:664] T 00000000000000000000000000000000 P 2920707d7f38428d969f1ff0ec35a922: Successfully opened a RocksDB database at /mnt/data0/yb-data/master/data/rocksdb/table-sys.catalog.uuid/tablet-00000000000000000000000000000000, obj: 0x2987e00
I0428 11:40:35.580503 22 tablet_bootstrap.cc:431] T 00000000000000000000000000000000 P 2920707d7f38428d969f1ff0ec35a922: Time spent opening tablet: real 0.517s user 0.006s sys 0.006s
I0428 11:40:35.580380 25 transaction_participant.cc:945] T 00000000000000000000000000000000 P 2920707d7f38428d969f1ff0ec35a922: LoadTransactions start
I0428 11:40:35.580926 25 transaction_participant.cc:984] T 00000000000000000000000000000000 P 2920707d7f38428d969f1ff0ec35a922: LoadTransactions done: loaded 0 transactions
I0428 11:40:35.595329 22 tablet_bootstrap.cc:497] T 00000000000000000000000000000000 P 2920707d7f38428d969f1ff0ec35a922: Will attempt to recover log segment /mnt/data0/yb-data/master/wals/table-sys.catalog.uuid/tablet-00000000000000000000000000000000/wal-000000001
I0428 11:40:35.595429 22 tablet_bootstrap.cc:833] T 00000000000000000000000000000000 P 2920707d7f38428d969f1ff0ec35a922: Max persistent index in RocksDB's SSTables before bootstrap: term: 1 index: 4/term: 1 index: 4
I0428 11:40:35.598526 22 log.cc:808] T 00000000000000000000000000000000 P 2920707d7f38428d969f1ff0ec35a922: Setting log wal retention time to 0 seconds
W0428 11:40:35.608687 22 log_util.cc:211] Could not read footer for segment: /mnt/data0/yb-data/master/wals/table-sys.catalog.uuid/tablet-00000000000000000000000000000000/wal-000000001: Not found (yb/consensus/log_util.cc:446): Footer not found. Footer magic doesn't match
W0428 11:40:35.608716 22 log_reader.cc:182] T 00000000000000000000000000000000 P 2920707d7f38428d969f1ff0ec35a922: Log segment /mnt/data0/yb-data/master/wals/table-sys.catalog.uuid/tablet-00000000000000000000000000000000/wal-000000001 was likely left in-progress after a previous crash. Will try to rebuild footer by scanning data.
I0428 11:40:35.617741 22 log_util.cc:657] Scanning /mnt/data0/yb-data/master/wals/table-sys.catalog.uuid/tablet-00000000000000000000000000000000/wal-000000001 for valid entry headers following offset 260300...
I0428 11:40:35.617780 22 log_util.cc:703] Found no log entry headers
I0428 11:40:35.617797 22 log_util.cc:526] Ignoring log segment corruption in /mnt/data0/yb-data/master/wals/table-sys.catalog.uuid/tablet-00000000000000000000000000000000/wal-000000001 because there are no log entries following the corrupted one. The server probably crashed in the middle of writing an entry to the write-ahead log or downloaded an active log via remote bootstrap. Error detail: Corruption (yb/consensus/log_util.cc:776): Invalid checksum in log entry head header: found=0, computed=2351477386: Log file corruption detected. Failed trying to read batch #898 at offset 260300 for log segment /mnt/data0/yb-data/master/wals/table-sys.catalog.uuid/tablet-00000000000000000000000000000000/wal-000000001: Prior batch offsets: 259359 259387 260272 260300; Last log entries read: [REPLICATE (1.571)] [REPLICATE (1.572)] [REPLICATE (1.573)] [REPLICATE (1.574)]
I0428 11:40:35.617812 22 log_util.cc:263] Successfully rebuilt footer for segment: /mnt/data0/yb-data/master/wals/table-sys.catalog.uuid/tablet-00000000000000000000000000000000/wal-000000001 (valid entries through byte offset 260300)
I0428 11:40:35.618870 22 log.cc:412] T 00000000000000000000000000000000 P 2920707d7f38428d969f1ff0ec35a922: Opened existing logs. Last segment is /mnt/data0/yb-data/master/wals/table-sys.catalog.uuid/tablet-00000000000000000000000000000000/wal-000000001
I0428 11:40:35.618898 22 log.cc:416] durable_wal_write is turned on.
W0428 11:40:35.627832 26 env_posix.cc:317] The filesystem does not support fallocate().
I0428 11:40:35.661057 22 tablet_bootstrap.cc:857] Bootstrap optimizer: op_id_replay_lowest={ term: 1 index: 4 }
I0428 11:40:35.661116 22 log_util.cc:657] Scanning /mnt/data0/yb-data/master/wals/table-sys.catalog.uuid/tablet-00000000000000000000000000000000/wal-000000002 for valid entry headers following offset 162...
*** Aborted at 1588074036 (unix time) try "date -d @1588074036" if you are using GNU date ***
PC: @ 0x7f0af67c9c0b yb::log::ReadableLogSegment::DecodeEntryHeader()
*** SIGSEGV (@0x36b2000) received by PID 1 (TID 0x7f0ae04ae700) from PID 57352192; stack trace: ***
@ 0x7f0aea847ba0 (unknown)
@ 0x7f0af67c9c0a yb::log::ReadableLogSegment::DecodeEntryHeader()
@ 0x7f0af67ca097 yb::log::ReadableLogSegment::ScanForValidEntryHeaders()
@ 0x7f0af67d198c yb::log::ReadableLogSegment::ReadFirstEntryMetadata()
@ 0x7f0af810a54e yb::tablet::TabletBootstrap::PlaySegments()
@ 0x7f0af810c25f yb::tablet::TabletBootstrap::Bootstrap()
@ 0x7f0af8111e33 yb::tablet::BootstrapTablet()
@ 0x7f0af8e6f12e yb::master::SysCatalogTable::OpenTablet()
@ 0x7f0af8e7056f yb::master::SysCatalogTable::SetupTablet()
@ 0x7f0af8e70d79 yb::master::SysCatalogTable::Load()
@ 0x7f0af8d9992d yb::master::CatalogManager::InitSysCatalogAsync()
@ 0x7f0af8da6238 yb::master::CatalogManager::Init()
@ 0x7f0af8e3e8a9 yb::master::Master::InitCatalogManager()
@ 0x7f0af8e3e9b5 yb::master::Master::InitCatalogManagerTask()
@ 0x7f0aef0fb8a3 yb::ThreadPool::DispatchThread()
@ 0x7f0aef0f7fae yb::Thread::SuperviseThread()
@ 0x7f0aea83f694 start_thread
@ 0x7f0ae9f7c41d __clone
@ 0x0 (unknown)
If I delete the file this error repeats itself. Actually it looks very similar to https://github.com/yugabyte/yugabyte-db/issues/4157
@bmatican is there a way to turn on more detailed logging on something? To get a fuller picture?
W0428 11:40:35.627832 26 env_posix.cc:317] The filesystem does not support fallocate().
That is present in your logs @Arttii, which is basically #3871 that I liked to above.
We actually use fallocate
and expect it to have worked (as per this crash here), so we should really convert that warning into an upfront check and crash asap when you're running on filesystems that do not give us the pre-requisites we need.
Would it be possible for you to test Yugabyte on an xfs filesystem?
You are right, I had somehow missed the log. We are running over NFS on EMC Isilion, it seems the allocate API is not supported on it. I guess there is no way to get around it.
Attaching the master log:
This happened during a yb-ctl create/stop/start test on my dev environment, with files on NFS.
cc @mbautin @hectorgcr