scrapinghub / spidermon

Scrapy Extension for monitoring spiders execution.
https://spidermon.readthedocs.io
BSD 3-Clause "New" or "Revised" License

feature: Fallback for actions #355

Closed: curita closed this 1 year ago

curita commented 2 years ago

This was an idea proposed by Chandral from Zyte to address Sentry downtimes, as important Spidermon alerts might get lost while this happens.

Multiple alert actions can be configured in Spidermon as a workaround to prevent messages from getting lost if one of the services is down, but receiving duplicated alerts would generate a lot of noise. Ideally, we'd like to use a backup alert system as a fallback only if the main alert system didn't work.

/cc @rennerocha @VMRuiz

rennerocha commented 1 year ago

Hello @curita!

From your description, I understood that to make this possible, we will need to create some kind of hierarchy between the actions. We will also need a way to identify if an action failed or not.

In my opinion, this will demand a big change in how we define the actions in our monitor suites, and the use case seems specific to one particular scenario (Sentry being down) that shouldn't happen frequently. But if you have a code proposal that allows this and keeps backward compatibility, we can review it and discuss further.

Sentry is a very stable system (almost no downtime in the last month according to https://status.sentry.io/), so it seems odd that downtime would affect our action execution often enough to matter.

Perhaps the implementation of the SendSentryMessage action is not robust enough and is failing even when Sentry is not down?

VMRuiz commented 1 year ago

Hello @rennerocha,

Maybe we could implement this as an Action itself? The Composed Action receives a list of other Actions and tries to execute the first one. If any exception is raised, the next one in the list is executed. If all actions in the list fail, then the Composed Action itself is marked as failed.
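The fallback logic described above could be sketched roughly as follows. This is only an illustration, not Spidermon code: `FallbackAction`, the notifier classes, and the `run_action()` method on them are hypothetical stand-ins, and a real implementation would subclass Spidermon's own Action base class.

```python
# Minimal sketch of a "composed" fallback action.
# All names here are hypothetical and do not come from Spidermon.

class FallbackAction:
    """Try each action in order and stop at the first one that succeeds."""

    def __init__(self, *actions):
        self.actions = actions

    def run_action(self):
        errors = []
        for action in self.actions:
            try:
                action.run_action()
            except Exception as exc:
                # This action failed: remember why and try the next one.
                errors.append(exc)
                continue
            # Success: the remaining actions are not executed.
            return
        # Every action raised, so the composed action itself fails.
        raise RuntimeError(f"all {len(self.actions)} actions failed: {errors!r}")


log = []  # records which notifications actually went out


class FailingSentry:
    """Stand-in for an alert service that is currently down."""

    def run_action(self):
        raise ConnectionError("Sentry unreachable")


class SlackNotifier:
    def run_action(self):
        log.append("slack")


class EmailNotifier:
    def run_action(self):
        log.append("email")


# Sentry fails, Slack succeeds, email is never tried.
FallbackAction(FailingSentry(), SlackNotifier(), EmailNotifier()).run_action()
```

Because the composed object exposes the same `run_action()` method as the actions it wraps, instances can be nested inside each other.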

By combining these Composed Actions we can even create complex workflows. For example, the list of actions could be:

  1. Composed Action:
    • Composed Action for Notification:
      • Open a Sentry ticket
      • or Notify in Slack
    • Composed Action for Backup:
      • Dump logs in S3
      • or Dump logs in GCF

This may be too extreme a case, but I'm sure there are simpler use cases where this logic could be applied.

curita commented 1 year ago

I think this idea originated out of that Sentry downtime, but it could be nice to have for other services that can be temporarily down too (Slack, emails?).

I like the idea of Composable Actions! Though that reminds me a bit of the Compose processor for Scrapy Itemloaders, and it might be expected that all actions run, instead of using the remaining ones as a fallback. Maybe we can borrow the concept of "errbacks" instead? And have a wrapper or a base Action class where we can define an errback for when something goes wrong. That would explicitly create that hierarchy of what to call next when something breaks in an Action (at least when it's defined).

We'd have to identify when something goes wrong in an Action. I think the most basic check could catch any exception raised by the Action, and call the errback when that happens. There could be room to customize that behavior, but that could be a default.
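A rough sketch of the errback idea, under the default described above (any exception raised by the action triggers the errback). The `Action` base class below is a hypothetical stand-in written for this example, not Spidermon's actual class, and the `errback` attribute and notifier names are assumptions.

```python
# Minimal sketch of errback support on a base action class.
# All names here are hypothetical and do not come from Spidermon.

class Action:
    # Optional action instance to run if this action raises.
    errback = None

    def run(self):
        try:
            self.run_action()
        except Exception:
            if self.errback is None:
                # No fallback defined: keep the current behaviour and fail.
                raise
            # Hand over to the fallback, which may define its own errback.
            self.errback.run()

    def run_action(self):
        raise NotImplementedError


calls = []  # records the order in which actions were attempted


class NotifySlack(Action):
    def run_action(self):
        calls.append("slack")


class NotifySentry(Action):
    errback = NotifySlack()

    def run_action(self):
        calls.append("sentry")
        raise ConnectionError("Sentry unreachable")


# Sentry is attempted first; its failure triggers the Slack errback.
NotifySentry().run()
```

Actions that do not define an errback behave as before, which keeps existing monitor suites unchanged. The hierarchy is explicit because each action names what to call next.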

VMRuiz commented 1 year ago

Hello @curita,

> I like the idea of Composable Actions! Though that reminds me a bit of the Compose processor for Scrapy Itemloaders, and it might be expected that all actions run, instead of using the remaining ones as a fallback. Maybe we can borrow the concept of "errbacks" instead?

You are right, naming is not my best skill. Something like errback support seems much more appropriate.

rennerocha commented 1 year ago

@VMRuiz

Composed Action (or Composable Actions, as @curita is better with naming than us) looks like an interesting concept, and as you suggested, if we create it as a custom action, it seems that it won't require changes (at least not big ones) in how our current actions are defined and executed!

I added this issue as a possible candidate for the next release, so hopefully soon a PR will appear for review :-)

further-reading commented 1 year ago

I'm gonna have a crack at adding this feature.