scrapinghub / spidermon

Scrapy Extension for monitoring spiders execution.
https://spidermon.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Enhance slack notifications #95

Open ejulio opened 5 years ago

ejulio commented 5 years ago

Currently the notifications we receive in Slack are quite simple, without much information. For example:

*somesite spider finished with errors!* / view job in Scrapy Cloud _(errors=7)_
•  _Job validation/validation errors_

We could add more information there, something like:

*somesite spider finished with errors!* / view job in Scrapy Cloud _(errors=7)_
•  _Job validation/validation errors_
* invalid_string: 3
* missing_required_field/field_name: 4

victor-torres commented 5 years ago

I believe we need to decide whether we are keeping features like JSON Schema Validation and Slack Notification in the project's core (#88, for example). Depending on what's decided, we may not invest time improving this code, at least not here in this repository.

rennerocha commented 5 years ago

@victor-torres I don't see a problem with creating integrations with different systems (like Slack, Amazon SES E-mail, or Amazon Storage), but these shouldn't be considered part of the core of spidermon.

But if a user decides to use one of these optional integrations, it doesn't hurt if the defaults are more complete, so I believe it would be a good improvement if our default Slack message contained more complete information about the monitors' execution. :-)

vipulgupta2048 commented 5 years ago

I mentioned the same earlier by mail. @rennerocha is right. We should focus on improving our default platforms, such as the Slack and Email actions, while work on other platforms can also be done as I suggested. Improving the defaults will help in the long run as users start to take up spidermon in their projects.

I can take up this feature request, @ejulio; it seems like a great place to start. Do let me know what you have in mind for what a person should see when they receive a Slack notification.

ejulio commented 5 years ago

One I have in mind right now is when item validation fails. The message just says "Validation Error/No field error", for example. It would be nice to mention which errors occurred in the message.

vipulgupta2048 commented 5 years ago

Thanks, @ejulio. Gotcha, let me set up the test environment. I will let you know of any follow-up questions I have regarding the issue.

vipulgupta2048 commented 5 years ago

@ejulio I have created my own spider for Slack notifications as specified in the documentation, and I am digging further into the tutorial given here: examples/tutorial. I have also read the extensive Slack documentation on making bots to understand the workflow and its limitations.

Questions

Thanks :hatching_chick:

[Image: Screenshot_2019-03-27_07-31-58]

vipulgupta2048 commented 5 years ago

@rennerocha @ejulio Any comments? I have also mailed you about the approach I am using. I have figured out the workflow for testing the changes and went through the code that runs to make this work. Could you help me find where I can get the stats for the errors shown in the terminal, so they can be presented the right way in Slack?

rennerocha commented 5 years ago

@vipulgupta2048 The only documentation to configure Slack notifications is available here: https://spidermon.readthedocs.io/en/latest/actions.html#slack-action Improvements on the documentation are always welcome!

You are right about where everything is located (spidermon/contrib/actions/slack). For this issue, we probably need to update the jinja templates and maybe change the context (get_template_context method) to include more information.

About testing, we still don't have tests for this action, so it would be good to add them.

ejulio commented 5 years ago

@vipulgupta2048 Sorry for the late response :disappointed: Regarding the data to be shown, you'll notice that everything is available in crawler.stats. About validation errors, they are added to items https://github.com/scrapinghub/spidermon/blob/master/spidermon/contrib/scrapy/pipelines.py#L144

But, you'll find the same info in crawler.stats as something like spidermon/validation/fields/errors/missing_required_field/src
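
As a rough sketch of how those stats could feed the message proposed at the top of this issue: assuming the stats are available as a plain dict and that the validation counters share the prefix `spidermon/validation/fields/errors/` (inferred from the key above, not checked against every spidermon version), the extra bullet lines could be built like this. `format_validation_errors` and the sample numbers are made up for illustration.

```python
# Prefix assumed from the stats key quoted above.
ERRORS_PREFIX = "spidermon/validation/fields/errors/"


def format_validation_errors(stats):
    """Return one bullet line per validation-error counter found in `stats`."""
    lines = []
    for key in sorted(stats):
        if key.startswith(ERRORS_PREFIX):
            # Keep only the part after the prefix, e.g. "missing_required_field/src".
            name = key[len(ERRORS_PREFIX):]
            lines.append(f"* {name}: {stats[key]}")
    return lines


# Made-up stats, for illustration only.
sample_stats = {
    "item_scraped_count": 120,
    "spidermon/validation/fields/errors/invalid_string/title": 3,
    "spidermon/validation/fields/errors/missing_required_field/src": 4,
}

print("\n".join(format_validation_errors(sample_stats)))
```

The resulting lines could then be passed to the Slack template through its context; how exactly that is wired in depends on the template changes discussed above.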

kevinflo commented 3 years ago

I would especially love to see SPIDERMON_ADD_FIELD_COVERAGE able to be represented in the notifications. I just want to see at a glance if some field isn't getting captured properly or is only grabbing empty strings etc.

I currently can't find field coverage logged anywhere in scrapinghub either. The only time I can see field coverage information is when I run the spider locally.

Unless I have it configured wrong? It looks like https://github.com/scrapinghub/spidermon/blob/5979f5cd963c3ff8c08fbd9925816f7ba5284568/spidermon/contrib/scrapy/extensions.py#L127 shows it should be added but no field coverage info is actually showing up in my scrapinghub job stats. Maybe since in my spider custom settings I'm doing

'SPIDERMON_SPIDER_CLOSE_MONITORS': (
    'spidermon.contrib.scrapy.monitors.SpiderCloseMonitorSuite',
    'my_crawler1.monitors.SpiderCloseMonitorSuite',
),

could that be overriding something that's supposed to trigger the actual spider_closed?

ejulio commented 3 years ago

Interesting. Can you try?

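import json

from spidermon.contrib.scrapy.extensions import Spidermon


class CustomSpidermon(Spidermon):
    def spider_closed(self, spider):
        super().spider_closed(spider)
        # crawler.stats is a stats collector object; get_stats() returns the dict.
        # default=str covers values json can't serialize (e.g. datetimes).
        stats = self.crawler.stats.get_stats()
        spider.logger.info(f"JSON Stats {json.dumps(stats, default=str)}")  # or any other logger

EXTENSIONS = {
    "CustomSpidermon": 0  # instead of spidermon; use the class's full import path
}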

This should log the stats so you can check whether they've been updated. Maybe something is not running, or scrapinghub is not updating them properly.
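
To narrow it down further, the logged stats could be filtered for coverage entries. A minimal sketch, assuming the stats are a plain dict and that field coverage keys start with `spidermon_field_coverage/`; that prefix is an assumption here, so compare it with the keys you see in a local run. `find_coverage_stats` is a hypothetical helper and the sample values are made up.

```python
def find_coverage_stats(stats, prefix="spidermon_field_coverage/"):
    """Return the subset of `stats` whose keys start with `prefix`."""
    return {key: value for key, value in stats.items() if key.startswith(prefix)}


# Made-up stats, for illustration only.
sample_stats = {
    "item_scraped_count": 10,
    "spidermon_field_coverage/dict/title": 1.0,
    "spidermon_field_coverage/dict/price": 0.4,
}

coverage = find_coverage_stats(sample_stats)
if coverage:
    for key, value in sorted(coverage.items()):
        print(f"{key}: {value:.0%}")
else:
    print("no field coverage stats found")
```

If nothing is found in the stats logged on scrapinghub while the same check finds entries locally, that would suggest the coverage stats are never computed there, rather than computed and then dropped.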