livepeer / studio

Livepeer Studio is your home for building, broadcasting, and publishing video on the open internet with the Livepeer Network. Effortlessly manage livestreams, video uploads, API keys, network usage, billing, and more.
https://livepeer.studio
MIT License

Handle failures sending messages to RabbitMQ #1031

Open victorges opened 2 years ago

victorges commented 2 years ago

We have had a couple of incidents where the API simply stopped sending messages to RabbitMQ and we did not even know about it, since the library we use abstracts away the underlying connection and its failures. I believe it just accumulates messages in memory and keeps trying to reconnect forever, hoping it will eventually succeed.

There is a way to pass a callback to the publish function, though, so that we only return a success to users when the message has actually been published: https://www.npmjs.com/package/amqp-connection-manager#channelwrapperpublish-and-channelwrappersendtoqueue

Update: This is even weirder after reading the docs, since they claim that if no callback is passed, the returned promise will only be fulfilled when the publish actually happens. So this needs further investigation. My immediate suspicion is that the lib is not actually checking broker confirmations (like a reverse ACK) and that we need to enable that somehow.
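
Whatever the root cause turns out to be, one way to stop failing silently is to bound how long we wait for the publish promise before reporting an error to the caller. Below is a minimal sketch of that idea, not the actual API code: `PublishFn`, `PublishTimeoutError`, and `publishWithTimeout` are hypothetical names, and `PublishFn` stands in for any wrapper around the library's publish call that returns a promise. It does not depend on any library option.

```typescript
// Hypothetical shape of a publisher: any function whose promise settles once
// the message has been confirmed (or rejected). This is an assumption, not
// the signature of amqp-connection-manager.
type PublishFn = (routingKey: string, payload: unknown) => Promise<void>;

// Hypothetical error type so callers can tell a timeout from a broker error.
class PublishTimeoutError extends Error {
  constructor(routingKey: string, timeoutMs: number) {
    super(`publish to "${routingKey}" not confirmed within ${timeoutMs}ms`);
    this.name = "PublishTimeoutError";
  }
}

// Rejects if the underlying publish has not settled within timeoutMs, so a
// publish that is stuck waiting on a reconnect surfaces as an error instead
// of hanging forever.
function publishWithTimeout(
  publish: PublishFn,
  routingKey: string,
  payload: unknown,
  timeoutMs: number
): Promise<void> {
  return new Promise<void>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new PublishTimeoutError(routingKey, timeoutMs)),
      timeoutMs
    );
    publish(routingKey, payload).then(
      () => {
        clearTimeout(timer);
        resolve();
      },
      (err) => {
        clearTimeout(timer);
        reject(err);
      }
    );
  });
}
```

A caller could then catch an error whose `name` is `"PublishTimeoutError"`, return a failure to the user, and increment a metric or alert. Note that the timeout only bounds the wait: the underlying publish may still be queued in memory and delivered later, so consumers would need to tolerate late or duplicate messages.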

Shih-Yu commented 1 year ago

Lenstube encountered a similar issue: https://eu-metrics-monitoring.livepeer.live/grafana/explore?orgId=1&left=%5B%221658725200000%22,%221658811599000%22,%22Loki%22,%7B%22exemplar%22:true,%22expr%22:%22%7Bapp%3D%5C%22prod-livepeer-api%5C%22%7D%20%7C%3D%20%5C%22publishing%20message%5C%22%20%7C%3D%20%5C%226f7cc1fd-bdbe-41a4-8b55-f9f10a31e530%5C%22%22,%22refId%22:%22A%22%7D%5D