Open jlevison opened 5 years ago
@jlevison if this happens on our audit-mode node in production, have you seen any indication of the reason for the failure in the system monitors? Does it saturate IOPS capacity? What is the resource utilization of the node at that time?
How often does this happen? Does it happen during random syncs, or during first-time massive syncs?
@gadcl do you think this issue was addressed?
Describe the bug
We've seen an error such as
failed to flush blocks to disk: sync /usr/local/var/orbs/blocks: input/output error
during sync, coming from services/blockstorage/internodesync/state_processing_blocks.go:78
(the 'message' will be "failed to commit block received via sync").
While it is valid spec-wise and flow-wise to break the sync process on this error, doing so is probably too fragile. This is part of the deadlock issue we had (see Slack).
Steps To Reproduce
Steps to reproduce the behavior: fail a write during sync.
Expected behavior
Perhaps a one-off retry, possibly depending on the specific error. First, try to understand what kinds of errors the flush can return, to decide whether any effort should be invested here to begin with.