scrapinghub / spidermon

Scrapy Extension for monitoring spiders execution.
https://spidermon.readthedocs.io
BSD 3-Clause "New" or "Revised" License
524 stars 94 forks source link

Error with SendSmtpEmail: 'NoneType' object has no attribute 'bio_read' #412

Open drmaniak opened 11 months ago

drmaniak commented 11 months ago

Description

When using the SendSmtpEmail class from Spidermon, even though the email is sent successfully, an error is thrown by the Twisted framework. I followed the syntax from the latest Spidermon docs.

Code

class SpiderCloseMonitorSuite(MonitorSuite):

    monitors = [
        SpiderHealthMonitor
    ]

    monitors_failed_actions = [
        SendSmtpEmail
    ]

Spidermon Email configuration (in settings.py)

# Email Settings
SPIDERMON_EMAIL_SENDER = <sender_email>
SPIDERMON_EMAIL_SUBJECT = "Scraper Failure"
SPIDERMON_EMAIL_TO = [<receiver_emails>]

# SMTP Settings
SPIDERMON_SMTP_HOST = <my_mail_server>
SPIDERMON_SMTP_PORT = 587 
SPIDERMON_SMTP_USER = <sender_username/email>
SPIDERMON_SMTP_PASSWORD = <pass>
SPIDERMON_SMTP_ENFORCE_SSL = False
SPIDERMON_SMTP_ENFORCE_TLS = True

Error Logs

2023-08-16 15:20:06 [scrapy.mail] INFO: Mail sent OK: To=[<my_email>] Cc=[] Subject="Scraper Failure" Attachs=0

Traceback (most recent call last):
  File "/home/x/anaconda3/envs/dmd/lib/python3.9/site-packages/twisted/python/log.py", line 96, in callWithLogger
    return callWithContext({"system": lp}, func, *args, **kw)
  File "/home/x/anaconda3/envs/dmd/lib/python3.9/site-packages/twisted/python/log.py", line 80, in callWithContext
    return context.call({ILogContext: newCtx}, func, *args, **kw)
  File "/home/x/anaconda3/envs/dmd/lib/python3.9/site-packages/twisted/python/context.py", line 117, in callWithContext
    return self.currentContext().callWithContext(ctx, func, *args, **kw)
  File "/home/x/anaconda3/envs/dmd/lib/python3.9/site-packages/twisted/python/context.py", line 82, in callWithContext
    return func(*args, **kw)
--- <exception caught here> ---
  File "/home/x/anaconda3/envs/dmd/lib/python3.9/site-packages/twisted/internet/asyncioreactor.py", line 138, in _readOrWrite
    why = method()
  File "/home/x/anaconda3/envs/dmd/lib/python3.9/site-packages/twisted/internet/tcp.py", line 248, in doRead
    return self._dataReceived(data)
  File "/home/x/anaconda3/envs/dmd/lib/python3.9/site-packages/twisted/internet/tcp.py", line 253, in _dataReceived
    rval = self.protocol.dataReceived(data)
  File "/home/x/anaconda3/envs/dmd/lib/python3.9/site-packages/twisted/protocols/tls.py", line 329, in dataReceived
    self._flushReceiveBIO()
  File "/home/x/anaconda3/envs/dmd/lib/python3.9/site-packages/twisted/protocols/tls.py", line 300, in _flushReceiveBIO
    self._flushSendBIO()
  File "/home/x/anaconda3/envs/dmd/lib/python3.9/site-packages/twisted/protocols/tls.py", line 253, in _flushSendBIO
    bytes = self._tlsConnection.bio_read(2 ** 15)
builtins.AttributeError: 'NoneType' object has no attribute 'bio_read'

2023-08-16 15:20:06 [twisted] CRITICAL: 
Traceback (most recent call last):
  File "/home/x/anaconda3/envs/dmd/lib/python3.9/site-packages/twisted/internet/asyncioreactor.py", line 138, in _readOrWrite
    why = method()
  File "/home/x/anaconda3/envs/dmd/lib/python3.9/site-packages/twisted/internet/tcp.py", line 248, in doRead
    return self._dataReceived(data)
  File "/home/x/anaconda3/envs/dmd/lib/python3.9/site-packages/twisted/internet/tcp.py", line 253, in _dataReceived
    rval = self.protocol.dataReceived(data)
  File "/home/x/anaconda3/envs/dmd/lib/python3.9/site-packages/twisted/protocols/tls.py", line 329, in dataReceived
    self._flushReceiveBIO()
  File "/home/x/anaconda3/envs/dmd/lib/python3.9/site-packages/twisted/protocols/tls.py", line 300, in _flushReceiveBIO
    self._flushSendBIO()
  File "/home/x/anaconda3/envs/dmd/lib/python3.9/site-packages/twisted/protocols/tls.py", line 253, in _flushSendBIO
    bytes = self._tlsConnection.bio_read(2 ** 15)
AttributeError: 'NoneType' object has no attribute 'bio_read'

Environment

Scrapy Version: 2.10.0 Spidermon Version: 1.19.0 Twisted Version: 22.10.0 Python Version: 3.9 OS: WSL2 (Ubuntu 22.04)

VMRuiz commented 10 months ago

Seems related to https://github.com/scrapy/scrapy/issues/3478

VMRuiz commented 10 months ago

Spidermon uses scrapy.mail.EmailSender to send emails using SMTP protocol. Such class returns a Deferred object that is never awaited and caused the error logs that you are raising. To fix this we either need to:

cc. @Gallaecio @wRAR @rennerocha

wRAR commented 10 months ago

The analysis makes sense. Blocking until the deferred finishes should be fine if the code allows doing it easily.

VMRuiz commented 10 months ago

It doesn't look trivial to block on a deferred or rebuild spidermon to support async code, so we should change the current action to use some other smtp library like https://docs.python.org/3/library/smtplib.html instead.

rennerocha commented 9 months ago

Rebuild spidermon to support async code would be a Spidermon 2.0 change. I don't think we are considering this in short term :smile:

Using blocking Python stlib smtplib for that looks fine to me, and not complicated to do.