joshJarr opened 2 years ago
One suggestion: we could use healthchecks.io to make sure that publishing via the alarm keeps working.
It checks that the process runs on the schedule we set and notifies us if a run is late.
It's not the most proactive form of resiliency, but it's probably worth monitoring.
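Here's a minimal TypeScript sketch of how the alarm could report to healthchecks.io. It relies on the healthchecks.io ping-URL convention: a plain request to `https://hc-ping.com/<uuid>` signals success, and the `/start` and `/fail` suffixes signal the other states. The check UUID, `runWithHeartbeat` and the injected `fetchFn` are hypothetical names for illustration, not existing w3name code.

```typescript
// Minimal sketch: report a scheduled job's outcome to a healthchecks.io check.
// ASSUMPTIONS: the UUID, base URL and helper names are illustrative; fetch is
// injected so the sketch can run (and be tested) outside a Worker.
type FetchLike = (url: string, init?: { method?: string; body?: string }) => Promise<unknown>;
type Outcome = "start" | "success" | "fail";

// healthchecks.io treats "<base>/<uuid>" as success; "/start" and "/fail" are suffixes.
function pingUrl(base: string, uuid: string, outcome: Outcome): string {
  const root = `${base.replace(/\/+$/, "")}/${uuid}`;
  return outcome === "success" ? root : `${root}/${outcome}`;
}

// A failed ping must never break the publish job itself, so errors are swallowed.
async function safePing(fetchFn: FetchLike, url: string, body?: string): Promise<void> {
  try {
    await fetchFn(url, { method: "POST", body });
  } catch {
    // Best-effort: a missed success ping eventually shows up as a late check.
  }
}

// Wraps one run of the scheduled job with start/success/fail signals.
async function runWithHeartbeat(
  job: () => Promise<void>,
  uuid: string,
  fetchFn: FetchLike,
  base = "https://hc-ping.com",
): Promise<"success" | "fail"> {
  await safePing(fetchFn, pingUrl(base, uuid, "start"));
  try {
    await job();
    await safePing(fetchFn, pingUrl(base, uuid, "success"));
    return "success";
  } catch (err) {
    // Sending the error text as the request body makes it visible in the check's log.
    await safePing(fetchFn, pingUrl(base, uuid, "fail"), String(err));
    return "fail";
  }
}
```

Inside a Durable Object `alarm()` handler, this might be called as `await runWithHeartbeat(() => this.publishPending(), env.HEALTHCHECK_UUID, fetch)`, where `publishPending` and `HEALTHCHECK_UUID` are also hypothetical. Swallowing ping errors is deliberate. If a ping can't be delivered, the check is eventually flagged as late, and the publish itself still goes ahead.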
We want to track the W3Name service, make sure it's running as expected, and monitor its performance. Smoke tests should give us confidence in our services and in the end-to-end (E2E) user experience of W3Name.
We could use our E2E tests to validate, on merge, that our service is running and the core business logic is working. There are a few ways to do this:
Validating new code on staging
This should help us catch issues before release. It is not a replacement for manual testing. We should run E2E smoke tests after merging any branch into staging. These tests should run against staging and validate that a record can be published, fetched and incremented. On staging, records won't persist to the DHT (right?), so we should validate this step by checking that the DOs (Durable Objects) are updated.
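The staging flow (publish, fetch, increment) could be written as a small TypeScript smoke test. This is a sketch with some assumptions. `NameService` is a hypothetical adapter, not the w3name client's actual API. In practice it would wrap the client or raw HTTP calls pointed at staging. The `/ipfs/...` values are placeholders.

```typescript
// Minimal sketch of the staging smoke test: publish -> fetch -> increment -> fetch.
// ASSUMPTION: NameService is a hypothetical adapter over the staging API.
interface Revision {
  value: string;
  sequence: number;
}

interface NameService {
  create(): Promise<string>; // returns the new name's identifier
  publish(name: string, revision: Revision): Promise<void>;
  resolve(name: string): Promise<Revision>;
}

function check(cond: boolean, message: string): void {
  if (!cond) throw new Error(`smoke test failed: ${message}`);
}

async function stagingSmokeTest(svc: NameService): Promise<void> {
  const name = await svc.create();

  // 1. Publish an initial revision and read it back.
  const first: Revision = { value: "/ipfs/placeholder-cid-1", sequence: 0 };
  await svc.publish(name, first);
  const fetched = await svc.resolve(name);
  check(fetched.value === first.value, "resolved value does not match published value");

  // 2. Increment: the next revision carries a higher sequence number.
  const next: Revision = { value: "/ipfs/placeholder-cid-2", sequence: fetched.sequence + 1 };
  await svc.publish(name, next);
  const updated = await svc.resolve(name);
  check(updated.value === next.value, "resolved value was not updated after increment");
  check(updated.sequence === next.sequence, "sequence number was not incremented");
}
```

Keeping the test behind an adapter means the same assertions can run against staging in CI or against an in-memory fake locally.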
Validating production code
We could use E2E tests to run nightly tests against production (or after every production release) so that we can be confident our services are running as expected. These tests could be service-agnostic and cover the client library, the w3name API and the IPNS Publisher/DHT. The goal is to make sure the production stack works as expected end to end and to help catch integration issues, such as caching.
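For the nightly production run, here's a rough TypeScript sketch of two pieces. The first is a runner that executes every check and reports results per layer. The second is a polling helper for spotting stale (for example, cached) reads. The layer names, check bodies, attempt counts and delays are all hypothetical and would need tuning against the real stack.

```typescript
// Minimal sketch of a nightly production check runner.
// ASSUMPTIONS: layer names and check bodies are hypothetical; each check would
// exercise one part of the stack (client library, w3name API, IPNS publisher/DHT).
type Layer = "client" | "api" | "publisher-dht";

interface Check {
  layer: Layer;
  description: string;
  run: () => Promise<void>;
}

interface CheckResult {
  layer: Layer;
  description: string;
  ok: boolean;
  error?: string;
}

// Runs every check even if an earlier one fails, so the report shows which layers broke.
async function runNightly(checks: Check[]): Promise<CheckResult[]> {
  const results: CheckResult[] = [];
  for (const c of checks) {
    try {
      await c.run();
      results.push({ layer: c.layer, description: c.description, ok: true });
    } catch (err) {
      results.push({ layer: c.layer, description: c.description, ok: false, error: String(err) });
    }
  }
  return results;
}

// Caching issues show up as stale reads: poll until the expected value appears or give up.
async function waitForFresh<T>(
  read: () => Promise<T>,
  isFresh: (value: T) => boolean,
  attempts: number,
  delayMs: number,
): Promise<boolean> {
  for (let i = 1; i <= attempts; i++) {
    if (isFresh(await read())) return true;
    if (i < attempts) await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  return false;
}
```

For example, after publishing an incremented record, a check could call `waitForFresh(() => resolveViaApi(name), (r) => r.sequence === expected, 10, 2000)`, where `resolveViaApi` is hypothetical. The check would fail if the API keeps serving the old revision.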