orcfax / Incidents

A repository to triage and report issues in Orcfax network operations

INCIDENT 021 | Validator node causes dropped websocket connection #24

Open Christian-MK opened 7 months ago

Trigger

Date

2024-03-06

Summary

The heartbeat datum due at 18:00 UTC on 6 March 2024 was not published on-chain. A datum was published manually at 18:14 UTC after network monitoring flagged the anomaly.

Status

Under Review

Assessment

The validator node caused the websocket connection to the COOP publishing server to be dropped. The reliability of the websocket connection and potential changes to the publishing mechanism remain under investigation.

Additional Notes

Although this incident is not directly related to INCIDENT 019, we know that the reliability of the websocket connection between the validator and the publishing node can affect publication. How a heartbeat is requested, and how its successful publication is subsequently monitored, continue to be reviewed as the Orcfax network is incrementally upgraded.

We are investigating:

  1. More advanced connection management by the Orcfax federated validator.
  2. More resilient publishing workflows able to request a new datum if publication isn't successful.
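
As an illustration of the second item only, a publishing workflow that requests a fresh datum when publication does not succeed might look roughly like the sketch below. This is not the Orcfax implementation: `publish_with_retry`, `request_datum`, `publish`, and `is_on_chain` are hypothetical names, and the sketch assumes a dropped connection surfaces as Python's built-in `ConnectionError`.

```python
import time
from typing import Callable, Optional


def publish_with_retry(
    request_datum: Callable[[], dict],         # hypothetical: asks the validator for a new datum
    publish: Callable[[dict], Optional[str]],  # hypothetical: returns a transaction id, or None
    is_on_chain: Callable[[str], bool],        # hypothetical: confirms the datum landed on-chain
    max_attempts: int = 3,
    base_delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Return the confirmed transaction id, or None if every attempt failed."""
    for attempt in range(max_attempts):
        # Request a fresh datum on every attempt so a retry never republishes stale data.
        datum = request_datum()
        try:
            tx_id = publish(datum)
        except ConnectionError:
            # Assumption: a dropped websocket is reported as ConnectionError.
            tx_id = None
        if tx_id is not None and is_on_chain(tx_id):
            return tx_id
        # Back off exponentially before the next attempt (1s, 2s, 4s, ...).
        if attempt < max_attempts - 1:
            sleep(base_delay_s * 2 ** attempt)
    return None
```

Returning `None` rather than raising leaves it to the caller to decide whether to alert an operator, as happened manually in this incident.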

Documentation improvements

  1. N/A.