open-horizon / exchange-api

Horizon Exchange REST API Server
Apache License 2.0

Agent not getting notified of nodemsg waiting via /changes route #402

Open sf2ne opened 4 years ago

sf2ne commented 4 years ago

An agbot added a message to cancel an agreement. The agent never gets the message and so doesn’t cancel the agreement. The problem is intermittent and seems to be timing related, but no errors are surfaced. In other cases (performance runs), the missing message is returned at a later time, with the msg id that it should have had (so it wasn't that it never got created, or was deleted before some agbots could read it).

We also know that the node msg was not in the response the agent got from the /changes route.

One area investigated is the periodic (configurable) pruning of the messages tables that was added to the exchange. This pruning was made transactionally serializable in issue e80, so technically it should be holding onto the messages tables while deleting messages. This might be preventing reads, but it should only cause a timeout (if pruning takes too long). That’s only a potential cause, though, looked at because we are only seeing the problem now. Doug isn’t sure if we had this problem before and just didn’t run into it in the tests yet. We should redo how the message pruning bubbles up errors so that they carry more detail, in the hope of getting more information. But currently we have no evidence that any of the message reads or the pruning of the messages ever errored.
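
To make the "bubble up errors with more detail" idea concrete, here is a minimal sketch, not the exchange's actual code (this example uses Python for brevity). `PruneReport`, `prune_with_report`, and the `nodemsgs` table name are hypothetical names for illustration. It also assumes that the database driver's exception exposes a `sqlstate` attribute, which may not hold for the driver the exchange uses. The idea is to wrap each prune run so that the elapsed time, rows deleted, exception type, and SQLSTATE (for example `40001`, the SQL-standard code for a serialization failure) are all recorded instead of being swallowed.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class PruneReport:
    """Hypothetical record of one prune run, detailed enough to log or return."""
    table: str
    ok: bool
    rows_deleted: int
    elapsed_s: float
    error_type: Optional[str] = None   # exception class name, if the run failed
    sqlstate: Optional[str] = None     # driver-reported SQLSTATE, if any (assumption)
    detail: Optional[str] = None       # exception message, if the run failed


def prune_with_report(table: str, prune: Callable[[], int]) -> PruneReport:
    """Run `prune` (which returns the number of rows deleted) and capture the outcome.

    Failures are turned into a report rather than re-raised, so the caller
    decides whether to log, retry, or surface the error.
    """
    start = time.monotonic()
    try:
        deleted = prune()
    except Exception as exc:
        return PruneReport(
            table=table,
            ok=False,
            rows_deleted=0,
            elapsed_s=time.monotonic() - start,
            error_type=type(exc).__name__,
            # Assumption: the driver attaches a `sqlstate` attribute to its errors.
            sqlstate=getattr(exc, "sqlstate", None),
            detail=str(exc),
        )
    return PruneReport(
        table=table,
        ok=True,
        rows_deleted=deleted,
        elapsed_s=time.monotonic() - start,
    )


if __name__ == "__main__":
    # Stand-in for a real DELETE; "nodemsgs" is an illustrative table name.
    print(prune_with_report("nodemsgs", lambda: 3))
```

Logging one such report per prune run, including the elapsed time, would let us correlate prune windows with the times an agent's /changes call came back without the expected node msg.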

Potentially related to: https://github.com/open-horizon/exchange-api/issues/437

bmpotter commented 3 years ago

@dlarson04 said this wasn't urgent to address