yugabyte / yugabyte-db

YugabyteDB - the cloud native distributed SQL database for mission-critical applications.
https://www.yugabyte.com
Other
9.06k stars 1.09k forks source link

Master using O_DIRECT on NFS causes various types of checks/fatals #1211

Closed bmatican closed 5 years ago

bmatican commented 5 years ago

Attaching the master log:

Log file created at: 2019/04/15 20:11:40
Running on machine: bogdan-dev-server
Application fingerprint: version 1.2.5.0 build PRE_RELEASE revision fc171be80981b2d13b74b97f887b1380f2123a74 build_type RELEASE built at 09 Apr 2019 17:00:21 UTC
Running duration (h:mm:ss): 0:00:00
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
I0415 20:11:40.895920 19798 master_main.cc:93] NumCPUs determined to be: 4
I0415 20:11:40.904657 19798 mem_tracker.cc:241] MemTracker: hard memory limit is 1.468634 GB
I0415 20:11:40.904678 19798 mem_tracker.cc:243] MemTracker: soft memory limit is 1.248339 GB
I0415 20:11:40.906541 19798 master_main.cc:108] Initializing master server...
I0415 20:11:40.908293 19798 fs_manager.cc:232] Opened local filesystem: /n/users/bogdan/yugabyte-data/node-1/disk-1
uuid: "6358c5c4ad334470b1e241633bf36db3"
format_stamp: "Formatted at 2019-04-15 20:10:18 on bogdan-dev-server"
I0415 20:11:40.908673 19798 server_base.cc:198] Auto setting FLAGS_num_reactor_threads to 4
I0415 20:11:40.911162 19798 master_main.cc:111] Starting Master server...
I0415 20:11:40.915254 19798 webserver.cc:147] Starting webserver on 127.0.0.1:7000
I0415 20:11:40.915282 19798 webserver.cc:152] Document root: /n/users/bogdan/code/yugabyte/www
I0415 20:11:40.915719 19798 webserver.cc:239] Webserver started. Bound to: http://127.0.0.1:7000/
I0415 20:11:40.916085 19798 rpc_server.cc:167] RPC server started. Bound to: 127.0.0.1:7100
I0415 20:11:40.916134 19798 server_base.cc:450] Using private ip address 127.0.0.1
I0415 20:11:40.916353 19819 sys_catalog.cc:195] Trying to load previous SysCatalogTable data from disk
I0415 20:11:40.925871 19819 consensus_meta.cc:248] T 00000000000000000000000000000000 P 6358c5c4ad334470b1e241633bf36db3: Updating active role from UNKNOWN_ROLE to FOLLOWER. Consensus state: current_term: 1 leader_uuid: "" config { opid_index: -1 peers { permanent_uuid: "6358c5c4ad334470b1e241633bf36db3" member_type: VOTER last_known_private_addr { host: "127.0.0.1" port: 7100 } cloud_info { placement_cloud: "cloud1" placement_region: "datacenter1" placement_zone: "rack1" } } }, has_pending_config = 0
I0415 20:11:40.925966 19819 sys_catalog.cc:243] Configuring consensus for local operation...
I0415 20:11:40.925985 19819 server_base.cc:450] Using private ip address 127.0.0.1
I0415 20:11:40.926267 19819 tablet_peer.cc:963] T 00000000000000000000000000000000 P 6358c5c4ad334470b1e241633bf36db3 [state=BOOTSTRAPPING]: Changed state from NOT_STARTED to BOOTSTRAPPING
I0415 20:11:40.926725 19819 consensus_meta.cc:248] T 00000000000000000000000000000000 P 6358c5c4ad334470b1e241633bf36db3: Updating active role from UNKNOWN_ROLE to FOLLOWER. Consensus state: current_term: 1 leader_uuid: "" config { opid_index: -1 peers { permanent_uuid: "6358c5c4ad334470b1e241633bf36db3" member_type: VOTER last_known_private_addr { host: "127.0.0.1" port: 7100 } cloud_info { placement_cloud: "cloud1" placement_region: "datacenter1" placement_zone: "rack1" } } }, has_pending_config = 0
I0415 20:11:40.926757 19819 tablet_bootstrap_if.cc:71] T 00000000000000000000000000000000 P 6358c5c4ad334470b1e241633bf36db3: Bootstrap starting.
I0415 20:11:40.928683 19819 docdb_rocksdb_util.cc:390] Auto setting FLAGS_rocksdb_max_background_flushes to 1
I0415 20:11:40.928714 19819 docdb_rocksdb_util.cc:409] Auto setting FLAGS_rocksdb_max_background_compactions to 1
I0415 20:11:40.928719 19819 docdb_rocksdb_util.cc:416] Auto setting FLAGS_rocksdb_base_background_compactions to 1
I0415 20:11:40.928814 19819 tablet.cc:421] T 00000000000000000000000000000000 P 6358c5c4ad334470b1e241633bf36db3: Creating RocksDB database in dir /n/users/bogdan/yugabyte-data/node-1/disk-1/yb-data/master/data/rocksdb/table-sys.catalog.uuid/tablet-00000000000000000000000000000000
I0415 20:11:40.932165 19819 tablet.cc:507] Opening RocksDB at: /n/users/bogdan/yugabyte-data/node-1/disk-1/yb-data/master/data/rocksdb/table-sys.catalog.uuid/tablet-00000000000000000000000000000000
I0415 20:11:40.935513 19819 version_set.cc:2802] T 00000000000000000000000000000000 P 6358c5c4ad334470b1e241633bf36db3: Recovered from manifest file:/n/users/bogdan/yugabyte-data/node-1/disk-1/yb-data/master/data/rocksdb/table-sys.catalog.uuid/tablet-00000000000000000000000000000000/MANIFEST-000001 succeeded,manifest_file_number is 1, next_file_number is 3, last_sequence is 1125899906842624, log_number is 0,prev_log_number is 0,max_column_family is 0, flushed_values is <NULL>
I0415 20:11:40.936498 19819 version_set.cc:2810] T 00000000000000000000000000000000 P 6358c5c4ad334470b1e241633bf36db3: Column family [default] (ID 0), log number is 0
I0415 20:11:40.937214 19819 event_logger.cc:67] T 00000000000000000000000000000000 P 6358c5c4ad334470b1e241633bf36db3: EVENT_LOG_v1 {"time_micros": 1555359100937184, "job": 1, "event": "recovery_started", "log_files": [3]}
I0415 20:11:40.937857 19819 db_impl.cc:1217] T 00000000000000000000000000000000 P 6358c5c4ad334470b1e241633bf36db3: Recovering log #3 mode 0 skip-recovery 0
I0415 20:11:40.938181 19819 version_set.cc:2250] T 00000000000000000000000000000000 P 6358c5c4ad334470b1e241633bf36db3: Creating manifest 5
I0415 20:11:40.940414 19819 version_set.cc:3309] T 00000000000000000000000000000000 P 6358c5c4ad334470b1e241633bf36db3: Writing version edit: comparator: "leveldb.BytewiseComparator"

I0415 20:11:40.940480 19819 version_set.cc:3309] T 00000000000000000000000000000000 P 6358c5c4ad334470b1e241633bf36db3: Writing version edit: log_number: 0

I0415 20:11:40.946034 19819 event_logger.cc:67] T 00000000000000000000000000000000 P 6358c5c4ad334470b1e241633bf36db3: EVENT_LOG_v1 {"time_micros": 1555359100946027, "job": 1, "event": "recovery_finished"}
I0415 20:11:40.950700 19819 db_impl.cc:870] T 00000000000000000000000000000000 P 6358c5c4ad334470b1e241633bf36db3: [JOB 2] Delete /n/users/bogdan/yugabyte-data/node-1/disk-1/yb-data/master/data/rocksdb/table-sys.catalog.uuid/tablet-00000000000000000000000000000000//MANIFEST-000001 type=4 #1 -- OK
I0415 20:11:40.952256 19819 db_impl.cc:870] T 00000000000000000000000000000000 P 6358c5c4ad334470b1e241633bf36db3: [JOB 2] Delete /n/users/bogdan/yugabyte-data/node-1/disk-1/yb-data/master/data/rocksdb/table-sys.catalog.uuid/tablet-00000000000000000000000000000000//000003.log type=0 #3 -- OK
I0415 20:11:40.962945 19819 db_impl.cc:5820] T 00000000000000000000000000000000 P 6358c5c4ad334470b1e241633bf36db3: DB pointer 0x130c900
I0415 20:11:40.963711 19819 tablet.cc:554] T 00000000000000000000000000000000 P 6358c5c4ad334470b1e241633bf36db3: Successfully opened a RocksDB database at /n/users/bogdan/yugabyte-data/node-1/disk-1/yb-data/master/data/rocksdb/table-sys.catalog.uuid/tablet-00000000000000000000000000000000, obj: 0x130c900
I0415 20:11:40.963810 19819 tablet_bootstrap.cc:400] T 00000000000000000000000000000000 P 6358c5c4ad334470b1e241633bf36db3: Time spent opening tablet: real 0.035s  user 0.003s sys 0.004s
I0415 20:11:40.965260 19819 tablet_bootstrap.cc:466] T 00000000000000000000000000000000 P 6358c5c4ad334470b1e241633bf36db3: Will attempt to recover log segment /n/users/bogdan/yugabyte-data/node-1/disk-1/yb-data/master/wals/table-sys.catalog.uuid/tablet-00000000000000000000000000000000/wal-000000001 to /n/users/bogdan/yugabyte-data/node-1/disk-1/yb-data/master/wals/table-sys.catalog.uuid/tablet-00000000000000000000000000000000.recovery/wal-000000001
I0415 20:11:40.965451 19819 tablet_bootstrap.cc:466] T 00000000000000000000000000000000 P 6358c5c4ad334470b1e241633bf36db3: Will attempt to recover log segment /n/users/bogdan/yugabyte-data/node-1/disk-1/yb-data/master/wals/table-sys.catalog.uuid/tablet-00000000000000000000000000000000/wal-000000002 to /n/users/bogdan/yugabyte-data/node-1/disk-1/yb-data/master/wals/table-sys.catalog.uuid/tablet-00000000000000000000000000000000.recovery/wal-000000002
I0415 20:11:40.965556 19819 tablet_bootstrap.cc:466] T 00000000000000000000000000000000 P 6358c5c4ad334470b1e241633bf36db3: Will attempt to recover log segment /n/users/bogdan/yugabyte-data/node-1/disk-1/yb-data/master/wals/table-sys.catalog.uuid/tablet-00000000000000000000000000000000/wal-000000003 to /n/users/bogdan/yugabyte-data/node-1/disk-1/yb-data/master/wals/table-sys.catalog.uuid/tablet-00000000000000000000000000000000.recovery/wal-000000003
I0415 20:11:40.965713 19819 tablet_bootstrap.cc:466] T 00000000000000000000000000000000 P 6358c5c4ad334470b1e241633bf36db3: Will attempt to recover log segment /n/users/bogdan/yugabyte-data/node-1/disk-1/yb-data/master/wals/table-sys.catalog.uuid/tablet-00000000000000000000000000000000/wal-000000004 to /n/users/bogdan/yugabyte-data/node-1/disk-1/yb-data/master/wals/table-sys.catalog.uuid/tablet-00000000000000000000000000000000.recovery/wal-000000004
I0415 20:11:40.969038 19819 tablet_bootstrap.cc:466] T 00000000000000000000000000000000 P 6358c5c4ad334470b1e241633bf36db3: Will attempt to recover log segment /n/users/bogdan/yugabyte-data/node-1/disk-1/yb-data/master/wals/table-sys.catalog.uuid/tablet-00000000000000000000000000000000/wal-000000005 to /n/users/bogdan/yugabyte-data/node-1/disk-1/yb-data/master/wals/table-sys.catalog.uuid/tablet-00000000000000000000000000000000.recovery/wal-000000005
I0415 20:11:40.969631 19819 tablet_bootstrap.cc:466] T 00000000000000000000000000000000 P 6358c5c4ad334470b1e241633bf36db3: Will attempt to recover log segment /n/users/bogdan/yugabyte-data/node-1/disk-1/yb-data/master/wals/table-sys.catalog.uuid/tablet-00000000000000000000000000000000/wal-000000006 to /n/users/bogdan/yugabyte-data/node-1/disk-1/yb-data/master/wals/table-sys.catalog.uuid/tablet-00000000000000000000000000000000.recovery/wal-000000006
I0415 20:11:40.969673 19819 tablet_bootstrap.cc:474] T 00000000000000000000000000000000 P 6358c5c4ad334470b1e241633bf36db3: Moving log directory /n/users/bogdan/yugabyte-data/node-1/disk-1/yb-data/master/wals/table-sys.catalog.uuid/tablet-00000000000000000000000000000000 to recovery directory /n/users/bogdan/yugabyte-data/node-1/disk-1/yb-data/master/wals/table-sys.catalog.uuid/tablet-00000000000000000000000000000000.recovery in preparation for log replay
W0415 20:11:40.972954 19840 master_service.cc:102] Could not set master raft config : Illegal state (yb/master/catalog_manager.cc:6545): Node 6358c5c4ad334470b1e241633bf36db3 peer not initialized.
W0415 20:11:40.981983 19819 log_util.cc:208] Could not read footer for segment: /n/users/bogdan/yugabyte-data/node-1/disk-1/yb-data/master/wals/table-sys.catalog.uuid/tablet-00000000000000000000000000000000.recovery/wal-000000006: Not found (yb/consensus/log_util.cc:443): Footer not found. Footer magic doesn't match
W0415 20:11:40.988890 19819 log_reader.cc:168] T 00000000000000000000000000000000 P 6358c5c4ad334470b1e241633bf36db3: Log segment /n/users/bogdan/yugabyte-data/node-1/disk-1/yb-data/master/wals/table-sys.catalog.uuid/tablet-00000000000000000000000000000000.recovery/wal-000000006 was likely left in-progress after a previous crash. Will try to rebuild footer by scanning data.
W0415 20:11:40.995381 19867 master_service.cc:102] Could not set master raft config : Illegal state (yb/master/catalog_manager.cc:6545): Node 6358c5c4ad334470b1e241633bf36db3 peer not initialized.
I0415 20:11:41.018163 19819 log_util.cc:563] Scanning /n/users/bogdan/yugabyte-data/node-1/disk-1/yb-data/master/wals/table-sys.catalog.uuid/tablet-00000000000000000000000000000000.recovery/wal-000000006 for valid entry headers following offset 3519029...
I0415 20:11:41.022873 19819 log_util.cc:600] Found no log entry headers
I0415 20:11:41.026856 19819 log_util.cc:523] Ignoring log segment corruption in /n/users/bogdan/yugabyte-data/node-1/disk-1/yb-data/master/wals/table-sys.catalog.uuid/tablet-00000000000000000000000000000000.recovery/wal-000000006 because there are no log entries following the corrupted one. The server probably crashed in the middle of writing an entry to the write-ahead log or downloaded an active log via remote bootstrap. Error detail: Corruption (yb/consensus/log_util.cc:673): Invalid checksum in log entry head header: found=0, computed=2351477386: Log file corruption detected. Failed trying to read batch #518 at offset 3519029 for log segment /n/users/bogdan/yugabyte-data/node-1/disk-1/yb-data/master/wals/table-sys.catalog.uuid/tablet-00000000000000000000000000000000.recovery/wal-000000006: Prior batch offsets: 3518704 3518732 3519001 3519029; Last log entries read: [REPLICATE (1.10703)] [REPLICATE (1.10704)] [REPLICATE (1.10705)] [REPLICATE (1.10706)]
I0415 20:11:41.032953 19819 log_util.cc:260] Successfully rebuilt footer for segment: /n/users/bogdan/yugabyte-data/node-1/disk-1/yb-data/master/wals/table-sys.catalog.uuid/tablet-00000000000000000000000000000000.recovery/wal-000000006 (valid entries through byte offset 3519029)
W0415 20:11:41.033442 19840 master_service.cc:102] Could not set master raft config : Illegal state (yb/master/catalog_manager.cc:6545): Node 6358c5c4ad334470b1e241633bf36db3 peer not initialized.
W0415 20:11:41.036098 19840 master_service.cc:102] Could not set master raft config : Illegal state (yb/master/catalog_manager.cc:6545): Node 6358c5c4ad334470b1e241633bf36db3 peer not initialized.
I0415 20:11:41.039383 19819 tablet_bootstrap.cc:775] T 00000000000000000000000000000000 P 6358c5c4ad334470b1e241633bf36db3: Max persistent index in RocksDB's SSTables before bootstrap: term: 0 index: 0/term: 0 index: 0
I0415 20:11:41.041385 19819 log.cc:399] durable_wal_write is turned on.
W0415 20:11:41.046124 19876 env_posix.cc:384] The filesystem does not support fallocate().
I0415 20:11:41.076074 19819 log.cc:542] T 00000000000000000000000000000000 P 6358c5c4ad334470b1e241633bf36db3: Max segment size 1048576 reached. Starting new segment allocation. 
I0415 20:11:41.077257 19819 tablet_bootstrap_if.cc:71] T 00000000000000000000000000000000 P 6358c5c4ad334470b1e241633bf36db3: Bootstrap replayed 1/6 log segments. Stats: ops{read=111 overwritten=0} inserts{seen=0 ignored=0} mutations{seen=0 ignored=0}. Pending: 55 replicates
I0415 20:11:41.115656 19819 env_posix.cc:597] Closing file /n/users/bogdan/yugabyte-data/node-1/disk-1/yb-data/master/wals/table-sys.catalog.uuid/tablet-00000000000000000000000000000000/.tmp.newsegmentZXjLoo with 357 blocks
I0415 20:11:41.116040 19819 env_posix.cc:603] /n/users/bogdan/yugabyte-data/node-1/disk-1/yb-data/master/wals/table-sys.catalog.uuid/tablet-00000000000000000000000000000000/.tmp.newsegmentZXjLoo: Truncating file from size: 1462272 to size: 1458188. Preallocated size: 1048576
I0415 20:11:41.120970 19819 log.cc:453] T 00000000000000000000000000000000 P 6358c5c4ad334470b1e241633bf36db3: Rolled over to a new segment: /n/users/bogdan/yugabyte-data/node-1/disk-1/yb-data/master/wals/table-sys.catalog.uuid/tablet-00000000000000000000000000000000/wal-000000002
I0415 20:11:41.141611 19819 log.cc:542] T 00000000000000000000000000000000 P 6358c5c4ad334470b1e241633bf36db3: Max segment size 2097152 reached. Starting new segment allocation. 
I0415 20:11:41.144624 19819 tablet_bootstrap_if.cc:71] T 00000000000000000000000000000000 P 6358c5c4ad334470b1e241633bf36db3: Bootstrap replayed 2/6 log segments. Stats: ops{read=225 overwritten=0} inserts{seen=0 ignored=0} mutations{seen=0 ignored=0}. Pending: 114 replicates
I0415 20:11:41.199153 19819 env_posix.cc:597] Closing file /n/users/bogdan/yugabyte-data/node-1/disk-1/yb-data/master/wals/table-sys.catalog.uuid/tablet-00000000000000000000000000000000/.tmp.newsegmentgqs2Yv with 621 blocks
I0415 20:11:41.199554 19819 env_posix.cc:603] /n/users/bogdan/yugabyte-data/node-1/disk-1/yb-data/master/wals/table-sys.catalog.uuid/tablet-00000000000000000000000000000000/.tmp.newsegmentgqs2Yv: Truncating file from size: 2543616 to size: 2543381. Preallocated size: 2097152
I0415 20:11:41.205168 19819 log.cc:453] T 00000000000000000000000000000000 P 6358c5c4ad334470b1e241633bf36db3: Rolled over to a new segment: /n/users/bogdan/yugabyte-data/node-1/disk-1/yb-data/master/wals/table-sys.catalog.uuid/tablet-00000000000000000000000000000000/wal-000000003
I0415 20:11:41.296639 19819 log.cc:542] T 00000000000000000000000000000000 P 6358c5c4ad334470b1e241633bf36db3: Max segment size 4194304 reached. Starting new segment allocation. 
I0415 20:11:41.327492 19819 env_posix.cc:597] Closing file /n/users/bogdan/yugabyte-data/node-1/disk-1/yb-data/master/wals/table-sys.catalog.uuid/tablet-00000000000000000000000000000000/.tmp.newsegment8DZlJD with 1024 blocks
I0415 20:11:41.327829 19819 env_posix.cc:603] /n/users/bogdan/yugabyte-data/node-1/disk-1/yb-data/master/wals/table-sys.catalog.uuid/tablet-00000000000000000000000000000000/.tmp.newsegment8DZlJD: Truncating file from size: 4198400 to size: 4197353. Preallocated size: 4194304
I0415 20:11:41.333473 19819 log.cc:453] T 00000000000000000000000000000000 P 6358c5c4ad334470b1e241633bf36db3: Rolled over to a new segment: /n/users/bogdan/yugabyte-data/node-1/disk-1/yb-data/master/wals/table-sys.catalog.uuid/tablet-00000000000000000000000000000000/wal-000000004
I0415 20:11:41.333658 19819 tablet_bootstrap_if.cc:71] T 00000000000000000000000000000000 P 6358c5c4ad334470b1e241633bf36db3: Bootstrap replayed 3/6 log segments. Stats: ops{read=2225 overwritten=0} inserts{seen=0 ignored=0} mutations{seen=0 ignored=0}. Pending: 49 replicates
I0415 20:11:41.568661 19819 log.cc:542] T 00000000000000000000000000000000 P 6358c5c4ad334470b1e241633bf36db3: Max segment size 8388608 reached. Starting new segment allocation. 
I0415 20:11:41.593560 19819 tablet_bootstrap_if.cc:71] T 00000000000000000000000000000000 P 6358c5c4ad334470b1e241633bf36db3: Bootstrap replayed 4/6 log segments. Stats: ops{read=10122 overwritten=0} inserts{seen=0 ignored=0} mutations{seen=0 ignored=0}. Pending: 1 replicates
I0415 20:11:41.694114 19819 env_posix.cc:597] Closing file /n/users/bogdan/yugabyte-data/node-1/disk-1/yb-data/master/wals/table-sys.catalog.uuid/tablet-00000000000000000000000000000000/.tmp.newsegmentNrIqQL with 1024 blocks
I0415 20:11:41.694577 19819 env_posix.cc:603] /n/users/bogdan/yugabyte-data/node-1/disk-1/yb-data/master/wals/table-sys.catalog.uuid/tablet-00000000000000000000000000000000/.tmp.newsegmentNrIqQL: Truncating file from size: 10555392 to size: 10555047. Preallocated size: 8388608
I0415 20:11:41.700196 19819 log.cc:453] T 00000000000000000000000000000000 P 6358c5c4ad334470b1e241633bf36db3: Rolled over to a new segment: /n/users/bogdan/yugabyte-data/node-1/disk-1/yb-data/master/wals/table-sys.catalog.uuid/tablet-00000000000000000000000000000000/wal-000000005
I0415 20:11:41.980392 19819 log.cc:542] T 00000000000000000000000000000000 P 6358c5c4ad334470b1e241633bf36db3: Max segment size 16777216 reached. Starting new segment allocation. 
I0415 20:11:42.001070 19819 tablet_bootstrap_if.cc:71] T 00000000000000000000000000000000 P 6358c5c4ad334470b1e241633bf36db3: Bootstrap replayed 5/6 log segments. Stats: ops{read=10448 overwritten=0} inserts{seen=0 ignored=0} mutations{seen=0 ignored=0}. Pending: 1 replicates
I0415 20:11:42.015635 19819 env_posix.cc:597] Closing file /n/users/bogdan/yugabyte-data/node-1/disk-1/yb-data/master/wals/table-sys.catalog.uuid/tablet-00000000000000000000000000000000/.tmp.newsegmentTIt0BU with 1024 blocks
I0415 20:11:42.015909 19819 env_posix.cc:603] /n/users/bogdan/yugabyte-data/node-1/disk-1/yb-data/master/wals/table-sys.catalog.uuid/tablet-00000000000000000000000000000000/.tmp.newsegmentTIt0BU: Truncating file from size: 16830464 to size: 16826507. Preallocated size: 16777216
I0415 20:11:42.021293 19819 log.cc:453] T 00000000000000000000000000000000 P 6358c5c4ad334470b1e241633bf36db3: Rolled over to a new segment: /n/users/bogdan/yugabyte-data/node-1/disk-1/yb-data/master/wals/table-sys.catalog.uuid/tablet-00000000000000000000000000000000/wal-000000006
W0415 20:11:42.038336 19867 master_service.cc:102] Could not set master raft config : Illegal state (yb/master/catalog_manager.cc:6545): Node 6358c5c4ad334470b1e241633bf36db3 peer not initialized.
I0415 20:11:42.106612 19819 tablet_bootstrap_if.cc:71] T 00000000000000000000000000000000 P 6358c5c4ad334470b1e241633bf36db3: Bootstrap replayed 6/6 log segments. Stats: ops{read=10706 overwritten=0} inserts{seen=0 ignored=0} mutations{seen=0 ignored=0}. Pending: 1 replicates
I0415 20:11:42.106806 19819 tablet_bootstrap.cc:860] T 00000000000000000000000000000000 P 6358c5c4ad334470b1e241633bf36db3: Dumping replay state to log at the end of PlaySegments
I0415 20:11:42.106863 19819 tablet_bootstrap.cc:736] T 00000000000000000000000000000000 P 6358c5c4ad334470b1e241633bf36db3: ReplayState: Previous OpId: 1.10706, Committed OpId: 1.10706, Pending Replicates: 0, Flushed Regular: 0.0, Flushed Intents: 0.0
I0415 20:11:42.106892 19819 tablet_bootstrap.cc:736] T 00000000000000000000000000000000 P 6358c5c4ad334470b1e241633bf36db3: Log entries applied to RocksDB: 10706
I0415 20:11:42.106900 19819 tablet_bootstrap.cc:871] T 00000000000000000000000000000000 P 6358c5c4ad334470b1e241633bf36db3: Number of orphaned replicates: 0, last id: 1.10706, commited id: 1.10706
I0415 20:11:42.116808 19819 tablet_bootstrap.cc:504] T 00000000000000000000000000000000 P 6358c5c4ad334470b1e241633bf36db3: Preparing to delete log recovery files and directory /n/users/bogdan/yugabyte-data/node-1/disk-1/yb-data/master/wals/table-sys.catalog.uuid/tablet-00000000000000000000000000000000.recovery
I0415 20:11:42.116933 19819 tablet_bootstrap.cc:507] T 00000000000000000000000000000000 P 6358c5c4ad334470b1e241633bf36db3: Renaming log recovery dir from /n/users/bogdan/yugabyte-data/node-1/disk-1/yb-data/master/wals/table-sys.catalog.uuid/tablet-00000000000000000000000000000000.recovery to /n/users/bogdan/yugabyte-data/node-1/disk-1/yb-data/master/wals/table-sys.catalog.uuid/tablet-00000000000000000000000000000000.recovery-1555359102116931
I0415 20:11:42.118371 19819 tablet_bootstrap.cc:517] T 00000000000000000000000000000000 P 6358c5c4ad334470b1e241633bf36db3: Deleting all files from renamed log recovery directory /n/users/bogdan/yugabyte-data/node-1/disk-1/yb-data/master/wals/table-sys.catalog.uuid/tablet-00000000000000000000000000000000.recovery-1555359102116931
W0415 20:11:42.143286 19819 env_posix.cc:1496] Could not delete directory: IO error (yb/util/env_posix.cc:1132): /n/users/bogdan/yugabyte-data/node-1/disk-1/yb-data/master/wals/table-sys.catalog.uuid/tablet-00000000000000000000000000000000.recovery-1555359102116931: Directory not empty (error 39)
I0415 20:11:42.170977 19819 env_posix.cc:597] Closing file /n/users/bogdan/yugabyte-data/node-1/disk-1/yb-data/master/wals/table-sys.catalog.uuid/tablet-00000000000000000000000000000000/.tmp.newsegmentpp6Sm4 with 857 blocks
F0415 20:11:42.171454 19819 env_posix.cc:600] Check failed: fsize == std::max(filesize_, pre_allocated_size_) (3510272 vs. 33554432) 

This happened during a yb-ctl create/stop/start test on my dev environment, with files on NFS.

cc @mbautin @hectorgcr

bmatican commented 5 years ago

I have not seen this in development since. Closing.

Arttii commented 4 years ago

I am getting this error on 2.1.2.0-b10. After a master restart a very similiar error picture pops up, when running on-top of EMC Isilon NFS.

I0427 16:17:06.513187    22 event_logger.cc:67] T 00000000000000000000000000000000 P 21421e45d3a84723bbb298bf65c7b8f4 [R]: EVENT_LOG_v1 {"time_micros": 1588004226513170, "job": 1, "event": "recovery_finished"}
I0427 16:17:06.535516    22 db_impl.cc:1154] T 00000000000000000000000000000000 P 21421e45d3a84723bbb298bf65c7b8f4 [R]: [JOB 2] Delete /mnt/data0/yb-data/master/data/rocksdb/table-sys.catalog.uuid/tablet-00000000000000000000000000000000//MANIFEST-000018 type=4 #18 -- OK
I0427 16:17:06.557935    22 db_impl.cc:1154] T 00000000000000000000000000000000 P 21421e45d3a84723bbb298bf65c7b8f4 [R]: [JOB 2] Delete /mnt/data0/yb-data/master/data/rocksdb/table-sys.catalog.uuid/tablet-00000000000000000000000000000000//000019.log type=0 #19 -- OK
I0427 16:17:06.618950    22 tablet.cc:620] T 00000000000000000000000000000000 P 21421e45d3a84723bbb298bf65c7b8f4: Opening intents DB at: /mnt/data0/yb-data/master/data/rocksdb/table-sys.catalog.uuid/tablet-00000000000000000000000000000000.intents
I0427 16:17:06.643824    22 version_set.cc:2818] T 00000000000000000000000000000000 P 21421e45d3a84723bbb298bf65c7b8f4 [I]: Recovered from manifest file:/mnt/data0/yb-data/master/data/rocksdb/table-sys.catalog.uuid/tablet-00000000000000000000000000000000.intents/MANIFEST-000008 succeeded,manifest_file_number is 8, next_file_number is 10, last_sequence is 1125899906842624, log_number is 0,prev_log_number is 0,max_column_family is 0, flushed_values is 0x0000000001f96cc0 -> { op_id: 1.4 hybrid_time: { physical: 1588003561276358 logical: 5 } history_cutoff: <invalid> hybrid_time_filter:  }
I0427 16:17:06.643860    22 version_set.cc:2826] T 00000000000000000000000000000000 P 21421e45d3a84723bbb298bf65c7b8f4 [I]: Column family [default] (ID 0), log number is 4
I0427 16:17:06.647893    22 event_logger.cc:67] T 00000000000000000000000000000000 P 21421e45d3a84723bbb298bf65c7b8f4 [I]: EVENT_LOG_v1 {"time_micros": 1588004226647881, "job": 1, "event": "recovery_started", "log_files": [9]}
I0427 16:17:06.651748    22 db_impl.cc:1501] T 00000000000000000000000000000000 P 21421e45d3a84723bbb298bf65c7b8f4 [I]: Recovering log #9 mode 0 skip-recovery 0
I0427 16:17:06.651837    22 version_set.cc:2259] T 00000000000000000000000000000000 P 21421e45d3a84723bbb298bf65c7b8f4 [I]: Creating manifest 11
I0427 16:17:06.659281    22 version_set.cc:3325] T 00000000000000000000000000000000 P 21421e45d3a84723bbb298bf65c7b8f4 [I]: Writing version edit: comparator: "leveldb.BytewiseComparator"
I0427 16:17:06.659440    22 version_set.cc:3325] T 00000000000000000000000000000000 P 21421e45d3a84723bbb298bf65c7b8f4 [I]: Writing version edit: log_number: 4
flushed_frontier {
  [type.googleapis.com/yb.docdb.ConsensusFrontierPB] {
    op_id {
      term: 1
      index: 4
    }
    hybrid_time: 6504462586987962373
    history_cutoff: 18446744073709551614
  }
}
I0427 16:17:06.743016    22 event_logger.cc:67] T 00000000000000000000000000000000 P 21421e45d3a84723bbb298bf65c7b8f4 [I]: EVENT_LOG_v1 {"time_micros": 1588004226742998, "job": 1, "event": "recovery_finished"}
I0427 16:17:06.766126    22 db_impl.cc:1154] T 00000000000000000000000000000000 P 21421e45d3a84723bbb298bf65c7b8f4 [I]: [JOB 2] Delete /mnt/data0/yb-data/master/data/rocksdb/table-sys.catalog.uuid/tablet-00000000000000000000000000000000.intents//MANIFEST-000008 type=4 #8 -- OK
I0427 16:17:06.788118    22 db_impl.cc:1154] T 00000000000000000000000000000000 P 21421e45d3a84723bbb298bf65c7b8f4 [I]: [JOB 2] Delete /mnt/data0/yb-data/master/data/rocksdb/table-sys.catalog.uuid/tablet-00000000000000000000000000000000.intents//000009.log type=0 #9 -- OK
I0427 16:17:06.897486    22 tablet.cc:664] T 00000000000000000000000000000000 P 21421e45d3a84723bbb298bf65c7b8f4: Successfully opened a RocksDB database at /mnt/data0/yb-data/master/data/rocksdb/table-sys.catalog.uuid/tablet-00000000000000000000000000000000, obj: 0x2060000
I0427 16:17:06.897553    22 tablet_bootstrap.cc:431] T 00000000000000000000000000000000 P 21421e45d3a84723bbb298bf65c7b8f4: Time spent opening tablet: real 0.505s  user 0.002s sys 0.012s
I0427 16:17:06.897809    26 transaction_participant.cc:945] T 00000000000000000000000000000000 P 21421e45d3a84723bbb298bf65c7b8f4: LoadTransactions start
I0427 16:17:06.897876    26 transaction_participant.cc:984] T 00000000000000000000000000000000 P 21421e45d3a84723bbb298bf65c7b8f4: LoadTransactions done: loaded 0 transactions
I0427 16:17:06.905642    22 tablet_bootstrap.cc:497] T 00000000000000000000000000000000 P 21421e45d3a84723bbb298bf65c7b8f4: Will attempt to recover log segment /mnt/data0/yb-data/master/wals/table-sys.catalog.uuid/tablet-00000000000000000000000000000000/wal-000000001
I0427 16:17:06.905675    22 tablet_bootstrap.cc:497] T 00000000000000000000000000000000 P 21421e45d3a84723bbb298bf65c7b8f4: Will attempt to recover log segment /mnt/data0/yb-data/master/wals/table-sys.catalog.uuid/tablet-00000000000000000000000000000000/wal-000000002
I0427 16:17:06.905824    22 tablet_bootstrap.cc:833] T 00000000000000000000000000000000 P 21421e45d3a84723bbb298bf65c7b8f4: Max persistent index in RocksDB's SSTables before bootstrap: term: 1 index: 4/term: 1 index: 4
I0427 16:17:06.909987    22 log.cc:808] T 00000000000000000000000000000000 P 21421e45d3a84723bbb298bf65c7b8f4: Setting log wal retention time to 0 seconds
W0427 16:17:06.915786    22 log_util.cc:211] Could not read footer for segment: /mnt/data0/yb-data/master/wals/table-sys.catalog.uuid/tablet-00000000000000000000000000000000/wal-000000001: Not found (yb/consensus/log_util.cc:446): Footer not found. Footer magic doesn't match
W0427 16:17:06.915815    22 log_reader.cc:182] T 00000000000000000000000000000000 P 21421e45d3a84723bbb298bf65c7b8f4: Log segment /mnt/data0/yb-data/master/wals/table-sys.catalog.uuid/tablet-00000000000000000000000000000000/wal-000000001 was likely left in-progress after a previous crash. Will try to rebuild footer by scanning data.
I0427 16:17:06.921162    22 log_util.cc:657] Scanning /mnt/data0/yb-data/master/wals/table-sys.catalog.uuid/tablet-00000000000000000000000000000000/wal-000000001 for valid entry headers following offset 268191...
I0427 16:17:06.921231    22 log_util.cc:703] Found no log entry headers
I0427 16:17:06.921250    22 log_util.cc:526] Ignoring log segment corruption in /mnt/data0/yb-data/master/wals/table-sys.catalog.uuid/tablet-00000000000000000000000000000000/wal-000000001 because there are no log entries following the corrupted one. The server probably crashed in the middle of writing an entry to the write-ahead log or downloaded an active log via remote bootstrap. Error detail: Corruption (yb/consensus/log_util.cc:776): Invalid checksum in log entry head header: found=0, computed=2351477386: Log file corruption detected. Failed trying to read batch #1151 at offset 268191 for log segment /mnt/data0/yb-data/master/wals/table-sys.catalog.uuid/tablet-00000000000000000000000000000000/wal-000000001: Prior batch offsets: 267248 267276 268163 268191; Last log entries read: [REPLICATE (1.572)] [REPLICATE (1.573)] [REPLICATE (1.574)] [REPLICATE (1.575)]
I0427 16:17:06.921277    22 log_util.cc:263] Successfully rebuilt footer for segment: /mnt/data0/yb-data/master/wals/table-sys.catalog.uuid/tablet-00000000000000000000000000000000/wal-000000001 (valid entries through byte offset 268191)
W0427 16:17:06.925917    22 log.cc:1179] Error closing log: Illegal state (yb/consensus/log.cc:1012): Bad state for Close() 0
I0427 16:17:06.926023    22 transaction_participant.cc:141] T 00000000000000000000000000000000 P 21421e45d3a84723bbb298bf65c7b8f4: Shutdown
I0427 16:17:06.926138    22 db_impl.cc:641] T 00000000000000000000000000000000 P 21421e45d3a84723bbb298bf65c7b8f4 [I]: Shutting down RocksDB at: /mnt/data0/yb-data/master/data/rocksdb/table-sys.catalog.uuid/tablet-00000000000000000000000000000000.intents
I0427 16:17:06.946408    22 db_impl.cc:751] T 00000000000000000000000000000000 P 21421e45d3a84723bbb298bf65c7b8f4 [I]: Shutdown done
I0427 16:17:06.946509    22 db_impl.cc:641] T 00000000000000000000000000000000 P 21421e45d3a84723bbb298bf65c7b8f4 [R]: Shutting down RocksDB at: /mnt/data0/yb-data/master/data/rocksdb/table-sys.catalog.uuid/tablet-00000000000000000000000000000000
I0427 16:17:06.967371    22 db_impl.cc:751] T 00000000000000000000000000000000 P 21421e45d3a84723bbb298bf65c7b8f4 [R]: Shutdown done
E0427 16:17:06.967690    22 master.cc:241] Master@10.233.94.165:7100: Unable to init master catalog manager: IO error (yb/util/env_util.cc:156): Unable to initialize catalog manager: Failed to initialize sys tables async: Failed log replay. Reason: Failed to open new log: Unable to open readable log segment: /mnt/data0/yb-data/master/wals/table-sys.catalog.uuid/tablet-00000000000000000000000000000000/wal-000000002: Unable to initialize segment: EOF trying to read 12 bytes at offset 0
E0427 16:17:06.967831     1 main_util.cc:22] IO error (yb/util/env_util.cc:156): Unable to initialize catalog manager: Failed to initialize sys tables async: Failed log replay. Reason: Failed to open new log: Unable to open readable log segment: /mnt/data0/yb-data/master/wals/table-sys.catalog.uuid/tablet-00000000000000000000000000000000/wal-000000002: Unable to initialize segment: EOF trying to read 12 bytes at offset 0
I0427 16:17:06.967880     1 master.cc:284] Master@10.233.94.165:7100 shutting down...
E0427 16:17:06.970228    15 async_initializer.cc:83] Failed to initialize client: Aborted (yb/rpc/reactor.cc:85): Could not locate the leader master: Shutdown connection (system error 108)
I0427 16:17:06.980762     1 tablet_peer.cc:380] T 00000000000000000000000000000000 P 21421e45d3a84723bbb298bf65c7b8f4 [state=BOOTSTRAPPING]: Initiating TabletPeer shutdown
I0427 16:17:06.980800     1 tablet_peer.cc:397] T 00000000000000000000000000000000 P 21421e45d3a84723bbb298bf65c7b8f4 [state=QUIESCING]: Started shutdown from state: BOOTSTRAPPING
I0427 16:17:06.981094     1 master.cc:292] Master@10.233.94.165:7100 shutdown complete.
bmatican commented 4 years ago

@Arttii Thanks for your interest in Yugabyte.

It seems we still have some kinks with running on NFS. See https://github.com/yugabyte/yugabyte-db/issues/3871

Can you confirm you're seeing something like this in your logs:

W0415 20:11:41.046124 19876 env_posix.cc:384] The filesystem does not support fallocate().

Arttii commented 4 years ago

HI, somehow no. This is the entire log I have https://gist.github.com/Arttii/6c5d47aa7c569a6354b57a617d7f4071. I also tried to set durable_wal_write to true, but it did not really help. The file its referencing wal-000000002 is there, just empty.

EDIT: Actually as we are running his on k8s, the quick container restart was hiding another error picture:

I0428 11:40:34.975433     1 master_main.cc:93] NumCPUs determined to be: 2
I0428 11:40:34.977380     1 mem_tracker.cc:249] MemTracker: hard memory limit is 3.273251 GB
I0428 11:40:34.977401     1 mem_tracker.cc:251] MemTracker: soft memory limit is 2.782263 GB
I0428 11:40:34.979595     1 rpc_server.cc:86] yb::server::RpcServer created at 0x28ce4d0
I0428 11:40:34.979629     1 master.cc:143] yb::master::Master created at 0x7ffc126e69e0
I0428 11:40:34.979651     1 master.cc:144] yb::master::TSManager created at 0x28e1340
I0428 11:40:34.979655     1 master.cc:145] yb::master::CatalogManager created at 0x2c2c000
I0428 11:40:34.983089     1 master_main.cc:114] Initializing master server...
I0428 11:40:34.992725     1 fs_manager.cc:249] Opened local filesystem: /mnt/data0
uuid: "2920707d7f38428d969f1ff0ec35a922"
format_stamp: "Formatted at 2020-04-28 11:34:10 on yb-master-2"
I0428 11:40:34.993038     1 server_base.cc:231] Auto setting FLAGS_num_reactor_threads to 2
I0428 11:40:34.998124     1 master_main.cc:117] Starting Master server...
I0428 11:40:34.998668     1 service_pool.cc:148] yb.master.MasterBackupService: yb::rpc::ServicePoolImpl created at 0x28d2480
I0428 11:40:34.998800    16 async_initializer.cc:73] Starting to init ybclient
I0428 11:40:35.000600     1 service_pool.cc:148] yb.master.MasterService: yb::rpc::ServicePoolImpl created at 0x28d2b40
I0428 11:40:35.000885     1 service_pool.cc:148] yb.tserver.TabletServerService: yb::rpc::ServicePoolImpl created at 0x28d3680
I0428 11:40:35.001070     1 service_pool.cc:148] yb.consensus.ConsensusService: yb::rpc::ServicePoolImpl created at 0x28d3b00
I0428 11:40:35.004850     1 service_pool.cc:148] yb.tserver.RemoteBootstrapService: yb::rpc::ServicePoolImpl created at 0x28d3d40
I0428 11:40:35.004969     1 webserver.cc:148] Starting webserver on 0.0.0.0:7000
I0428 11:40:35.004978     1 webserver.cc:153] Document root: /home/yugabyte/www
I0428 11:40:35.005414     1 webserver.cc:240] Webserver started. Bound to: http://0.0.0.0:7000/
I0428 11:40:35.005573     1 service_pool.cc:148] yb.server.GenericService: yb::rpc::ServicePoolImpl created at 0x2ef0000
I0428 11:40:35.006211     1 rpc_server.cc:169] RPC server started. Bound to: 10.233.99.226:7100
I0428 11:40:35.006247     1 server_base.cc:487] Using private ip address yb-master-2.yb-masters.dap-base.svc.cluster.local
I0428 11:40:35.006856    22 sys_catalog.cc:194] Trying to load previous SysCatalogTable data from disk
I0428 11:40:35.043766    22 consensus_meta.cc:275] T 00000000000000000000000000000000 P 2920707d7f38428d969f1ff0ec35a922: Updating active role from UNKNOWN_ROLE to FOLLOWER. Consensus state: current_term: 1 leader_uuid: "" config { opid_index: -1 peers { permanent_uuid: "db2e76eb4dc7415c8e32a41389b5e8af" member_type: VOTER last_known_private_addr { host: "yb-master-0.yb-masters.dap-base.svc.cluster.local" port: 7100 } last_known_broadcast_addr { host: "yb-master-0.yb-masters.dap-base.svc.cluster.local" port: 7100 } cloud_info { placement_cloud: "cloud1" placement_region: "datacenter1" placement_zone: "rack1" } } peers { permanent_uuid: "3b99165b0e0b42a68002a7a34db19661" member_type: VOTER last_known_private_addr { host: "yb-master-1.yb-masters.dap-base.svc.cluster.local" port: 7100 } last_known_broadcast_addr { host: "yb-master-1.yb-masters.dap-base.svc.cluster.local" port: 7100 } cloud_info { placement_cloud: "cloud1" placement_region: "datacenter1" placement_zone: "rack1" } } peers { permanent_uuid: "2920707d7f38428d969f1ff0ec35a922" member_type: VOTER last_known_private_addr { host: "yb-master-2.yb-masters.dap-base.svc.cluster.local" port: 7100 } last_known_broadcast_addr { host: "yb-master-2.yb-masters.dap-base.svc.cluster.local" port: 7100 } cloud_info { placement_cloud: "cloud1" placement_region: "datacenter1" placement_zone: "rack1" } } }, has_pending_config = 0
I0428 11:40:35.043872    22 sys_catalog.cc:239] Configuring consensus for distributed operation...
I0428 11:40:35.043903    22 server_base.cc:487] Using private ip address yb-master-2.yb-masters.dap-base.svc.cluster.local
I0428 11:40:35.044342    22 tablet_peer.cc:1156] T 00000000000000000000000000000000 P 2920707d7f38428d969f1ff0ec35a922 [state=BOOTSTRAPPING]: Changed state from NOT_STARTED to BOOTSTRAPPING
I0428 11:40:35.046813    22 consensus_meta.cc:275] T 00000000000000000000000000000000 P 2920707d7f38428d969f1ff0ec35a922: Updating active role from UNKNOWN_ROLE to FOLLOWER. Consensus state: current_term: 1 leader_uuid: "" config { opid_index: -1 peers { permanent_uuid: "db2e76eb4dc7415c8e32a41389b5e8af" member_type: VOTER last_known_private_addr { host: "yb-master-0.yb-masters.dap-base.svc.cluster.local" port: 7100 } last_known_broadcast_addr { host: "yb-master-0.yb-masters.dap-base.svc.cluster.local" port: 7100 } cloud_info { placement_cloud: "cloud1" placement_region: "datacenter1" placement_zone: "rack1" } } peers { permanent_uuid: "3b99165b0e0b42a68002a7a34db19661" member_type: VOTER last_known_private_addr { host: "yb-master-1.yb-masters.dap-base.svc.cluster.local" port: 7100 } last_known_broadcast_addr { host: "yb-master-1.yb-masters.dap-base.svc.cluster.local" port: 7100 } cloud_info { placement_cloud: "cloud1" placement_region: "datacenter1" placement_zone: "rack1" } } peers { permanent_uuid: "2920707d7f38428d969f1ff0ec35a922" member_type: VOTER last_known_private_addr { host: "yb-master-2.yb-masters.dap-base.svc.cluster.local" port: 7100 } last_known_broadcast_addr { host: "yb-master-2.yb-masters.dap-base.svc.cluster.local" port: 7100 } cloud_info { placement_cloud: "cloud1" placement_region: "datacenter1" placement_zone: "rack1" } } }, has_pending_config = 0
I0428 11:40:35.046852    22 tablet_bootstrap_if.cc:75] T 00000000000000000000000000000000 P 2920707d7f38428d969f1ff0ec35a922: Bootstrap starting.
I0428 11:40:35.062943    22 tablet.cc:376] T 00000000000000000000000000000000 P 2920707d7f38428d969f1ff0ec35a922:  Schema version for  sys.catalog is 0
I0428 11:40:35.063233    22 transaction_participant.cc:119] T 00000000000000000000000000000000 P 2920707d7f38428d969f1ff0ec35a922: Create
I0428 11:40:35.063627    22 docdb_rocksdb_util.cc:272] Auto setting FLAGS_rocksdb_max_background_flushes to 1
I0428 11:40:35.063661    22 docdb_rocksdb_util.cc:295] Auto setting FLAGS_rocksdb_max_background_compactions to 1
I0428 11:40:35.063666    22 docdb_rocksdb_util.cc:306] Auto setting FLAGS_rocksdb_base_background_compactions to 1
I0428 11:40:35.063670    22 docdb_rocksdb_util.cc:329] Auto setting FLAGS_priority_thread_pool_size to 1
I0428 11:40:35.063899    22 tablet.cc:479] T 00000000000000000000000000000000 P 2920707d7f38428d969f1ff0ec35a922: Creating RocksDB database in dir /mnt/data0/yb-data/master/data/rocksdb/table-sys.catalog.uuid/tablet-00000000000000000000000000000000
I0428 11:40:35.076493    22 tablet.cc:605] Opening RocksDB at: /mnt/data0/yb-data/master/data/rocksdb/table-sys.catalog.uuid/tablet-00000000000000000000000000000000
I0428 11:40:35.114245    22 version_set.cc:2818] T 00000000000000000000000000000000 P 2920707d7f38428d969f1ff0ec35a922 [R]: Recovered from manifest file:/mnt/data0/yb-data/master/data/rocksdb/table-sys.catalog.uuid/tablet-00000000000000000000000000000000/MANIFEST-000016 succeeded,manifest_file_number is 16, next_file_number is 18, last_sequence is 1125899908347196, log_number is 0,prev_log_number is 0,max_column_family is 0, flushed_values is 0x00000000030e3080 -> { op_id: 1.4 hybrid_time: { physical: 1588073669996455 } history_cutoff: <invalid> hybrid_time_filter: <invalid> }
I0428 11:40:35.114280    22 version_set.cc:2826] T 00000000000000000000000000000000 P 2920707d7f38428d969f1ff0ec35a922 [R]: Column family [default] (ID 0), log number is 3
I0428 11:40:35.119555    22 event_logger.cc:67] T 00000000000000000000000000000000 P 2920707d7f38428d969f1ff0ec35a922 [R]: EVENT_LOG_v1 {"time_micros": 1588074035119521, "job": 1, "event": "recovery_started", "log_files": [13]}
I0428 11:40:35.122673    22 db_impl.cc:1501] T 00000000000000000000000000000000 P 2920707d7f38428d969f1ff0ec35a922 [R]: Recovering log #13 mode 0 skip-recovery 0
I0428 11:40:35.122741    22 version_set.cc:2259] T 00000000000000000000000000000000 P 2920707d7f38428d969f1ff0ec35a922 [R]: Creating manifest 18
I0428 11:40:35.130498    22 version_set.cc:3325] T 00000000000000000000000000000000 P 2920707d7f38428d969f1ff0ec35a922 [R]: Writing version edit: comparator: "leveldb.BytewiseComparator"

I0428 11:40:35.130973    22 version_set.cc:3325] T 00000000000000000000000000000000 P 2920707d7f38428d969f1ff0ec35a922 [R]: Writing version edit: log_number: 3
new_files {
  level: 0
  number: 10
  total_file_size: 23914435
  base_file_size: 110376
  smallest {
    key: "H\200\000\000\001S000000010000300080000000000000af\000\000!J\200#\200\001\263\303\311x\026\312\200J\001\250X\002\000\000\000\004"
    seqno: 1125899906842625
    user_values {
      tag: 1
      data: "\200\001\263\303\311\324;C\200J"
    }
    user_values {
      tag: 10
      data: "H\200\000\000\001"
    }
    user_values {
      tag: 11
      data: "$"
    }
    user_values {
      tag: 12
      data: "$"
    }
    user_values {
      tag: 13
      data: "$"
    }
    user_values {
      tag: 14
      data: "$"
    }
  }
  largest {
    key: "y\016\340@\000\000\000\000\200\0000\000\000\001\000\000\000O\000\0000\252!K\215#\200\001\263\303\306\326\200\307\200>+\0018\365\026\000\000\000\004"
    seqno: 1125899908347196
    user_values {
      tag: 1
      data: "\200\001\263\303\306\326a\315\200?\253"
    }
    user_values {
      tag: 10
      data: "S\377\350\240Yf\327N\336\264\244\352,\016Jj\332\000\000"
    }
    user_values {
      tag: 11
      data: "Sysql-catalog-configuration\000\000"
    }
    user_values {
      tag: 12
      data: "Syes_or_no_check\000\000"
    }
    user_values {
      tag: 13
      data: "SS\377\350\240Yf\327N\336\264\244\352,\016Jj\332\000\001\000\001!\000\000"
    }
    user_values {
      tag: 14
      data: "SS\323\006E\217\024\177A\306\256\212\244\201\002\261\377\325\000\001\000\001!\000\000"
    }
  }
}
flushed_frontier {
  [type.googleapis.com/yb.docdb.ConsensusFrontierPB] {
    op_id {
      term: 1
      index: 4
    }
    hybrid_time: 6504749752305479680
    history_cutoff: 18446744073709551614
  }
}

I0428 11:40:35.227035    22 event_logger.cc:67] T 00000000000000000000000000000000 P 2920707d7f38428d969f1ff0ec35a922 [R]: EVENT_LOG_v1 {"time_micros": 1588074035227022, "job": 1, "event": "recovery_finished"}
I0428 11:40:35.240010    22 db_impl.cc:1154] T 00000000000000000000000000000000 P 2920707d7f38428d969f1ff0ec35a922 [R]: [JOB 2] Delete /mnt/data0/yb-data/master/data/rocksdb/table-sys.catalog.uuid/tablet-00000000000000000000000000000000//MANIFEST-000016 type=4 #16 -- OK
I0428 11:40:35.256011    22 db_impl.cc:1154] T 00000000000000000000000000000000 P 2920707d7f38428d969f1ff0ec35a922 [R]: [JOB 2] Delete /mnt/data0/yb-data/master/data/rocksdb/table-sys.catalog.uuid/tablet-00000000000000000000000000000000//MANIFEST-000011 type=4 #11 -- OK
I0428 11:40:35.270684    22 db_impl.cc:1154] T 00000000000000000000000000000000 P 2920707d7f38428d969f1ff0ec35a922 [R]: [JOB 2] Delete /mnt/data0/yb-data/master/data/rocksdb/table-sys.catalog.uuid/tablet-00000000000000000000000000000000//000013.log type=0 #13 -- OK
I0428 11:40:35.348821    22 tablet.cc:620] T 00000000000000000000000000000000 P 2920707d7f38428d969f1ff0ec35a922: Opening intents DB at: /mnt/data0/yb-data/master/data/rocksdb/table-sys.catalog.uuid/tablet-00000000000000000000000000000000.intents
I0428 11:40:35.376628    22 version_set.cc:2818] T 00000000000000000000000000000000 P 2920707d7f38428d969f1ff0ec35a922 [I]: Recovered from manifest file:/mnt/data0/yb-data/master/data/rocksdb/table-sys.catalog.uuid/tablet-00000000000000000000000000000000.intents/MANIFEST-000006 succeeded,manifest_file_number is 6, next_file_number is 8, last_sequence is 1125899906842624, log_number is 0,prev_log_number is 0,max_column_family is 0, flushed_values is 0x00000000030c8b10 -> { op_id: 1.4 hybrid_time: { physical: 1588073669996455 } history_cutoff: <invalid> hybrid_time_filter: <invalid> }
I0428 11:40:35.376659    22 version_set.cc:2826] T 00000000000000000000000000000000 P 2920707d7f38428d969f1ff0ec35a922 [I]: Column family [default] (ID 0), log number is 0
I0428 11:40:35.379743    22 event_logger.cc:67] T 00000000000000000000000000000000 P 2920707d7f38428d969f1ff0ec35a922 [I]: EVENT_LOG_v1 {"time_micros": 1588074035379733, "job": 1, "event": "recovery_started", "log_files": [3]}
I0428 11:40:35.382517    22 db_impl.cc:1501] T 00000000000000000000000000000000 P 2920707d7f38428d969f1ff0ec35a922 [I]: Recovering log #3 mode 0 skip-recovery 0
I0428 11:40:35.382645    22 version_set.cc:2259] T 00000000000000000000000000000000 P 2920707d7f38428d969f1ff0ec35a922 [I]: Creating manifest 8
I0428 11:40:35.389412    22 version_set.cc:3325] T 00000000000000000000000000000000 P 2920707d7f38428d969f1ff0ec35a922 [I]: Writing version edit: comparator: "leveldb.BytewiseComparator"

I0428 11:40:35.389641    22 version_set.cc:3325] T 00000000000000000000000000000000 P 2920707d7f38428d969f1ff0ec35a922 [I]: Writing version edit: log_number: 0
flushed_frontier {
  [type.googleapis.com/yb.docdb.ConsensusFrontierPB] {
    op_id {
      term: 1
      index: 4
    }
    hybrid_time: 6504749752305479680
    history_cutoff: 18446744073709551614
  }
}

I0428 11:40:35.413612    22 event_logger.cc:67] T 00000000000000000000000000000000 P 2920707d7f38428d969f1ff0ec35a922 [I]: EVENT_LOG_v1 {"time_micros": 1588074035413603, "job": 1, "event": "recovery_finished"}
I0428 11:40:35.427080    22 db_impl.cc:1154] T 00000000000000000000000000000000 P 2920707d7f38428d969f1ff0ec35a922 [I]: [JOB 2] Delete /mnt/data0/yb-data/master/data/rocksdb/table-sys.catalog.uuid/tablet-00000000000000000000000000000000.intents//MANIFEST-000006 type=4 #6 -- OK
I0428 11:40:35.451843    22 db_impl.cc:1154] T 00000000000000000000000000000000 P 2920707d7f38428d969f1ff0ec35a922 [I]: [JOB 2] Delete /mnt/data0/yb-data/master/data/rocksdb/table-sys.catalog.uuid/tablet-00000000000000000000000000000000.intents//MANIFEST-000001 type=4 #1 -- OK
I0428 11:40:35.468727    22 db_impl.cc:1154] T 00000000000000000000000000000000 P 2920707d7f38428d969f1ff0ec35a922 [I]: [JOB 2] Delete /mnt/data0/yb-data/master/data/rocksdb/table-sys.catalog.uuid/tablet-00000000000000000000000000000000.intents//000003.log type=0 #3 -- OK
I0428 11:40:35.580117    22 tablet.cc:664] T 00000000000000000000000000000000 P 2920707d7f38428d969f1ff0ec35a922: Successfully opened a RocksDB database at /mnt/data0/yb-data/master/data/rocksdb/table-sys.catalog.uuid/tablet-00000000000000000000000000000000, obj: 0x2987e00
I0428 11:40:35.580503    22 tablet_bootstrap.cc:431] T 00000000000000000000000000000000 P 2920707d7f38428d969f1ff0ec35a922: Time spent opening tablet: real 0.517s  user 0.006s sys 0.006s
I0428 11:40:35.580380    25 transaction_participant.cc:945] T 00000000000000000000000000000000 P 2920707d7f38428d969f1ff0ec35a922: LoadTransactions start
I0428 11:40:35.580926    25 transaction_participant.cc:984] T 00000000000000000000000000000000 P 2920707d7f38428d969f1ff0ec35a922: LoadTransactions done: loaded 0 transactions
I0428 11:40:35.595329    22 tablet_bootstrap.cc:497] T 00000000000000000000000000000000 P 2920707d7f38428d969f1ff0ec35a922: Will attempt to recover log segment /mnt/data0/yb-data/master/wals/table-sys.catalog.uuid/tablet-00000000000000000000000000000000/wal-000000001
I0428 11:40:35.595429    22 tablet_bootstrap.cc:833] T 00000000000000000000000000000000 P 2920707d7f38428d969f1ff0ec35a922: Max persistent index in RocksDB's SSTables before bootstrap: term: 1 index: 4/term: 1 index: 4
I0428 11:40:35.598526    22 log.cc:808] T 00000000000000000000000000000000 P 2920707d7f38428d969f1ff0ec35a922: Setting log wal retention time to 0 seconds
W0428 11:40:35.608687    22 log_util.cc:211] Could not read footer for segment: /mnt/data0/yb-data/master/wals/table-sys.catalog.uuid/tablet-00000000000000000000000000000000/wal-000000001: Not found (yb/consensus/log_util.cc:446): Footer not found. Footer magic doesn't match
W0428 11:40:35.608716    22 log_reader.cc:182] T 00000000000000000000000000000000 P 2920707d7f38428d969f1ff0ec35a922: Log segment /mnt/data0/yb-data/master/wals/table-sys.catalog.uuid/tablet-00000000000000000000000000000000/wal-000000001 was likely left in-progress after a previous crash. Will try to rebuild footer by scanning data.
I0428 11:40:35.617741    22 log_util.cc:657] Scanning /mnt/data0/yb-data/master/wals/table-sys.catalog.uuid/tablet-00000000000000000000000000000000/wal-000000001 for valid entry headers following offset 260300...
I0428 11:40:35.617780    22 log_util.cc:703] Found no log entry headers
I0428 11:40:35.617797    22 log_util.cc:526] Ignoring log segment corruption in /mnt/data0/yb-data/master/wals/table-sys.catalog.uuid/tablet-00000000000000000000000000000000/wal-000000001 because there are no log entries following the corrupted one. The server probably crashed in the middle of writing an entry to the write-ahead log or downloaded an active log via remote bootstrap. Error detail: Corruption (yb/consensus/log_util.cc:776): Invalid checksum in log entry head header: found=0, computed=2351477386: Log file corruption detected. Failed trying to read batch #898 at offset 260300 for log segment /mnt/data0/yb-data/master/wals/table-sys.catalog.uuid/tablet-00000000000000000000000000000000/wal-000000001: Prior batch offsets: 259359 259387 260272 260300; Last log entries read: [REPLICATE (1.571)] [REPLICATE (1.572)] [REPLICATE (1.573)] [REPLICATE (1.574)]
I0428 11:40:35.617812    22 log_util.cc:263] Successfully rebuilt footer for segment: /mnt/data0/yb-data/master/wals/table-sys.catalog.uuid/tablet-00000000000000000000000000000000/wal-000000001 (valid entries through byte offset 260300)
I0428 11:40:35.618870    22 log.cc:412] T 00000000000000000000000000000000 P 2920707d7f38428d969f1ff0ec35a922: Opened existing logs. Last segment is /mnt/data0/yb-data/master/wals/table-sys.catalog.uuid/tablet-00000000000000000000000000000000/wal-000000001
I0428 11:40:35.618898    22 log.cc:416] durable_wal_write is turned on.
W0428 11:40:35.627832    26 env_posix.cc:317] The filesystem does not support fallocate().
I0428 11:40:35.661057    22 tablet_bootstrap.cc:857] Bootstrap optimizer: op_id_replay_lowest={ term: 1 index: 4 }
I0428 11:40:35.661116    22 log_util.cc:657] Scanning /mnt/data0/yb-data/master/wals/table-sys.catalog.uuid/tablet-00000000000000000000000000000000/wal-000000002 for valid entry headers following offset 162...
*** Aborted at 1588074036 (unix time) try "date -d @1588074036" if you are using GNU date ***
PC: @     0x7f0af67c9c0b yb::log::ReadableLogSegment::DecodeEntryHeader()
*** SIGSEGV (@0x36b2000) received by PID 1 (TID 0x7f0ae04ae700) from PID 57352192; stack trace: ***
    @     0x7f0aea847ba0 (unknown)
    @     0x7f0af67c9c0a  yb::log::ReadableLogSegment::DecodeEntryHeader()
    @     0x7f0af67ca097  yb::log::ReadableLogSegment::ScanForValidEntryHeaders()
    @     0x7f0af67d198c  yb::log::ReadableLogSegment::ReadFirstEntryMetadata()
    @     0x7f0af810a54e  yb::tablet::TabletBootstrap::PlaySegments()
    @     0x7f0af810c25f  yb::tablet::TabletBootstrap::Bootstrap()
    @     0x7f0af8111e33  yb::tablet::BootstrapTablet()
    @     0x7f0af8e6f12e  yb::master::SysCatalogTable::OpenTablet()
    @     0x7f0af8e7056f  yb::master::SysCatalogTable::SetupTablet()
    @     0x7f0af8e70d79  yb::master::SysCatalogTable::Load()
    @     0x7f0af8d9992d  yb::master::CatalogManager::InitSysCatalogAsync()
    @     0x7f0af8da6238  yb::master::CatalogManager::Init()
    @     0x7f0af8e3e8a9  yb::master::Master::InitCatalogManager()
    @     0x7f0af8e3e9b5  yb::master::Master::InitCatalogManagerTask()
    @     0x7f0aef0fb8a3  yb::ThreadPool::DispatchThread()
    @     0x7f0aef0f7fae  yb::Thread::SuperviseThread()
    @     0x7f0aea83f694 start_thread
    @     0x7f0ae9f7c41d __clone
    @                0x0 (unknown)

If I delete the file this error repeats itself. Actually it looks very similar to https://github.com/yugabyte/yugabyte-db/issues/4157

Arttii commented 4 years ago

@bmatican is there a way to turn on more detailed logging on something? To get a fuller picture?

bmatican commented 4 years ago

W0428 11:40:35.627832 26 env_posix.cc:317] The filesystem does not support fallocate().

That is present in your logs @Arttii, which is basically #3871 that I liked to above.

We actually use fallocate and expect it to have worked (as per this crash here), so we should really convert that warning into an upfront check and crash asap when you're running on filesystems that do not give us the pre-requisites we need.

Would it be possible for you to test Yugabyte on an xfs filesystem?

Arttii commented 4 years ago

You are right, I had somehow missed the log. We are running over NFS on EMC Isilion, it seems the allocate API is not supported on it. I guess there is no way to get around it.