ejulio opened this issue 5 years ago
I believe we need to decide whether we are keeping features like JSON Schema validation and Slack notifications in the project's core (#88, for example). Depending on what's decided, we may not invest time improving this code, at least not in this repository.
@victor-torres I don't see a problem with creating integrations with different systems (like Slack, Amazon SES e-mail, or Amazon storage), but they shouldn't be considered the core of spidermon.
But if a user decides to use one of these optional integrations, it doesn't hurt if the defaults are more complete, so I believe it would be a good improvement if our default Slack message contained more complete information about the monitors' execution. :-)
I mentioned the same thing earlier in the e-mail thread. @rennerocha is right: we should focus on improving our default integrations, such as the Slack and e-mail actions, while work on other platforms can still be done as I suggested. Improving the defaults will help in the long run as users start adopting spidermon in their projects.
I can take up this feature request, @ejulio; it seems like a great place to start. Do let me know what you have in mind for what a person should see when they receive a Slack notification.
One case I have in mind right now is when item validation fails. The message just says "Validation Error/No field error", for example. It would be nice for the message to mention which errors actually occurred.
Thanks, @ejulio. Gotcha, let me set up the test environment. I will let you know of any follow-up questions I have regarding the issue.
@ejulio I have created my own spider with Slack notifications as specified in the documentation, and I am digging further into the tutorial given in examples/tutorial. I have also read the extensive Slack documentation on building bots to understand the workflow and its limitations.
Questions
1. The only Slack configuration I could find is in settings.py. Should I improve on the same?
2. The code seems to live in spidermon/contrib/actions/slack. Am I on the right path? If I do make changes to the notifiers.py script, how should I go about testing the changes?
Thanks :hatching_chick:
@rennerocha @ejulio Any comments? I have also e-mailed you about the approach I am using. I have figured out the workflow for testing the changes and have gone through the code that makes this work. Any help on where I could get the stats for the errors that show up in the terminal, so they can be presented properly in Slack?
@vipulgupta2048 The only documentation on configuring Slack notifications is available here: https://spidermon.readthedocs.io/en/latest/actions.html#slack-action Improvements to the documentation are always welcome!
You are right about where everything is located (spidermon/contrib/actions/slack). For this issue, we probably need to update the Jinja templates and maybe change the context (the get_template_context method) to include more information.
About testing, we still don't have tests for this action, so it would be good to add them.
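For illustration only, one way to do that could be a subclass of the default notifier that adds extra data to the template context. The import path and the get_template_context method are the ones mentioned in this thread; the self.data.stats access and the stats key prefix are assumptions:

from spidermon.contrib.actions.slack.notifiers import SendSlackMessageSpiderFinished

class VerboseSlackNotifier(SendSlackMessageSpiderFinished):
    def get_template_context(self):
        context = super().get_template_context()
        # assumption: self.data.stats exposes the crawler stats discussed below
        stats = dict(self.data.stats)
        # pass only spidermon's validation counters to the Jinja templates
        context["validation_errors"] = {
            key: value
            for key, value in stats.items()
            if key.startswith("spidermon/validation/")
        }
        return context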
@vipulgupta2048
Sorry for the late response :disappointed:
Regarding the data to be shown, you'll notice that everything is available in crawler.stats.
About validation errors, they are added to the items here: https://github.com/scrapinghub/spidermon/blob/master/spidermon/contrib/scrapy/pipelines.py#L144
But you'll find the same info in crawler.stats under keys like spidermon/validation/fields/errors/missing_required_field/src
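As a sketch of how a monitor could read those counters (the monitor name and assertion message are made up, and the key prefix is assumed to follow the pattern above):

from spidermon import Monitor

class ValidationErrorsMonitor(Monitor):
    def test_no_validation_errors(self):
        # self.data.stats gives monitors access to crawler.stats
        errors = {
            key: count
            for key, count in self.data.stats.items()
            if key.startswith("spidermon/validation/fields/errors/")
        }
        self.assertFalse(errors, msg=f"Validation errors found: {errors}")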
I would especially love to see SPIDERMON_ADD_FIELD_COVERAGE represented in the notifications. I just want to see at a glance whether some field isn't getting captured properly or is only grabbing empty strings, etc.
I currently can't actually find field coverage logged in Scrapinghub anywhere either; the only time I can see field coverage information is when I run the spider locally. Unless I have it configured wrong? It looks like https://github.com/scrapinghub/spidermon/blob/5979f5cd963c3ff8c08fbd9925816f7ba5284568/spidermon/contrib/scrapy/extensions.py#L127 shows it should be added, but no field coverage info is actually showing up in my Scrapinghub job stats. Maybe since in my spider's custom settings I'm doing
'SPIDERMON_SPIDER_CLOSE_MONITORS': (
    'spidermon.contrib.scrapy.monitors.SpiderCloseMonitorSuite',
    'my_crawler1.monitors.SpiderCloseMonitorSuite',
),
could that be overwriting something that's supposed to trigger the actual spider_closed?
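For reference, a minimal settings sketch of what I believe field coverage needs, alongside the custom suite above (SPIDERMON_ADD_FIELD_COVERAGE is the documented flag; the project paths are just my placeholders):

SPIDERMON_ENABLED = True
SPIDERMON_ADD_FIELD_COVERAGE = True
SPIDERMON_SPIDER_CLOSE_MONITORS = (
    'spidermon.contrib.scrapy.monitors.SpiderCloseMonitorSuite',
    'my_crawler1.monitors.SpiderCloseMonitorSuite',
)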
Interesting. Can you try?
import json

from spidermon.contrib.scrapy.extensions import Spidermon

class CustomSpidermon(Spidermon):
    def spider_closed(self, spider):
        super().spider_closed(spider)
        # get_stats() returns a plain dict; default=str handles non-serializable values such as datetimes
        spider.logger.info(f"JSON Stats {json.dumps(self.crawler.stats.get_stats(), default=str)}")  # or any other logger

EXTENSIONS = {
    "myproject.extensions.CustomSpidermon": 0  # instead of spidermon; adjust the path to where the class lives
}
This should log the stats and confirm they've been updated. Maybe something is not running, or Scrapinghub is not updating them properly.
Currently the notifications we receive in Slack are quite simple, without much information. For example:
We could add more information there, something like: