Open ppca opened 3 days ago
It seems this doesn't happen when requests are sent at a low rate, and starts happening at a higher rate (~12 signs per minute). It looks like when a lot of signs come in and triple generators start kicking in, too many triple generators end up in the system at once, and as a result none of them can complete.
This only self-heals after a really long time (10+ hrs), when it gets lucky and some generators finish.
Will investigate more to see if I can find the exact trigger for this behavior and the exact turning point.
We will need to automate load testing. Meanwhile, we can do this https://github.com/near/mpc/issues/657
Cause: leftover triple messages for taken triples keep triggering new triple generators.
Fix: remove the triple messages in the messageQueue’s triple bin that belong to triples that have timed out, been taken, or failed.
For more details, see this doc: https://docs.google.com/document/d/1cWQAmcu8VWWdFwH6VHvCxcaiB_X-CcwChGEcviNyE6k/edit
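To make the proposed fix concrete, here is a rough sketch of the purge step. This is not the actual code in the repo: `TripleId`, `TripleMessage`, `TripleBin`, and `purge_stale` are hypothetical names used only for illustration, and it assumes the triple bin groups queued messages by triple id.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical triple identifier; the real type may differ.
type TripleId = u64;

/// Hypothetical queued protocol message for one triple generation.
#[derive(Debug, Clone, PartialEq)]
struct TripleMessage {
    id: TripleId,
    payload: Vec<u8>,
}

/// Hypothetical stand-in for the message queue's triple bin,
/// assumed here to group queued messages by triple id.
#[derive(Default)]
struct TripleBin {
    by_triple: HashMap<TripleId, Vec<TripleMessage>>,
}

impl TripleBin {
    /// Queue a message under its triple id.
    fn push(&mut self, msg: TripleMessage) {
        self.by_triple.entry(msg.id).or_default().push(msg);
    }

    /// Drop all queued messages whose triple id is in `stale`
    /// (timed out, taken, or failed), so they can no longer
    /// trigger a new generator. Returns the number of messages removed.
    fn purge_stale(&mut self, stale: &HashSet<TripleId>) -> usize {
        let mut removed = 0;
        self.by_triple.retain(|id, msgs| {
            if stale.contains(id) {
                removed += msgs.len();
                false
            } else {
                true
            }
        });
        removed
    }

    /// Triple ids that still have queued messages, in ascending order.
    fn pending_ids(&self) -> Vec<TripleId> {
        let mut ids: Vec<TripleId> = self.by_triple.keys().copied().collect();
        ids.sort_unstable();
        ids
    }
}
```

The idea would be to call something like `purge_stale` with the set of timed out, taken, and failed triple ids before the queue is drained, so that only messages for live triples can start or feed a generator.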
Will try this fix on dev and send a lot of signs to see whether the issue improves. If not, there might be something else going on.
Dev triples are depleted, but triple generators have not succeeded at all