Assuming and giving a once over of a distributed system is good -- actually verifying it is a little better.
We don't need to do this immediately, but to be a robust piece of software, we should put in some Jepsen Tests for fault tolerance. We'll need to assume just a few likely cases (we aren't trying to make this a true fault tolerant distributed system),
(this) API Fails
Graders Start Failing in
A few at a time
Waves
Various network delays cause graders to come back online (little less likely).
Assuming and giving a once over of a distributed system is good -- actually verifying it is a little better.
We don't need to do this immediately, but to be a robust piece of software, we should put in some Jepsen Tests for fault tolerance. We'll need to assume just a few likely cases (we aren't trying to make this a true fault tolerant distributed system),