passiomatic / coldsweat

Web RSS aggregator and reader compatible with the Fever API
MIT License
145 stars 21 forks source link

Exception handling for feed fetching #106

Closed jeroenh closed 7 years ago

jeroenh commented 7 years ago

The current feed fetching implementation does not catch exceptions. Over the past few days I've had a couple of socket.timeout and OpenSSL.SSL.SysCallError: (54, 'ECONNRESET')exceptions happening. Crontab happily emails me about this fact, but there is absolutely no context other than a stack trace. This means that it is impossible to pin down which feed is causing this. It would be great if there were a way to add this.

Looking at the code, the best place to catch the exception is probably in the update_feed() call at coldsweat/controllers.py:306. But that doesn't seem part of the stack trace. Line coldsweat/controllers.py:176 is an element, but at that point it is does not seem easy to track which feed was causing the exception.

Traceback (most recent call last):
 File "/usr/local/www/coldsweat/sweat.py", line 12, in <module>
   run()
 File "/usr/local/www/coldsweat/coldsweat/commands.py", line 252, in run
   cc.run_command(command_name, command_options, command_args)
 File "/usr/local/www/coldsweat/coldsweat/commands.py", line 57, in run_command
   handler(options, args)        
 File "/usr/local/www/coldsweat/coldsweat/commands.py", line 120, in command_refresh
   self.fetch_all_feeds()
 File "/usr/local/www/coldsweat/coldsweat/controllers.py", line 301, in fetch_all_feeds
   self.fetch_feeds(feeds)
 File "/usr/local/www/coldsweat/coldsweat/controllers.py", line 276, in fetch_feeds
   p.map(feed_worker, feeds)
 File "/usr/local/lib/python2.7/multiprocessing/pool.py", line 251, in map
   return self.map_async(func, iterable, chunksize).get()
 File "/usr/local/lib/python2.7/multiprocessing/pool.py", line 567, in get
   raise self._value
OpenSSL.SSL.SysCallError: (54, 'ECONNRESET')
passiomatic commented 7 years ago

Thank you for our report. HTTPS feed fetching is a recent addition for me since I didn't have an updated Python version on the production machine. I managed to update Python to version 2.7.12 and now I have a couple of feed served via HTTPS with no issues.

Back to your issue: fetcher code uses Requests library to handle connections which is very robust — it catches a lot of connection issues — and I wrap the fetch request around a RequestException and log it as network errors. You typically see these lines on the log:

[2016-08-19 00:00:14,915] 23346 WARNING a network error occurred while fetching bost.ocks.org, skipped

Looking at your stack trace: the feed worker calls fetcher.update_feed() so we are on the right track (I have now idea why it doesn't show up on the stack trace, though).

So the error originates from the SSL layer and should be wrapped by Requests. Of course googling for that exception gives us a myriad of bug reports, e.g.: https://github.com/DocNow/twarc/issues/72 I'll go thru the linked discussion to figure out what's going on.

The quick fix you could try is to catch everything on the https://github.com/passiomatic/coldsweat/blob/master/coldsweat/fetcher.py#L148 line. Doing this you should get an error log about the feed causing the connection error. When you have figured the offending feed please share it here, so I can do some testing it on my machine.

jeroenh commented 7 years ago

Okay, I added the exception handling in the right place, but since then the error has not occurred anymore.

I have a feeling the error may be caused by osx.iusethis.com for which I had a feed in ancient history and it just migrated along with my other feeds to coldsweat.

The feed entry is stuck in limbo now: it migrated, but has never worked since then. Now it doesn't show up in the feed overview, so I can't remove it.

passiomatic commented 7 years ago

In fact osx.iusethis.com seems dead (timeout) and that could cause the connection error you have seen.

In the meantime I've read the discussion previously linked to that OpenSSL exception and that boils down to the fact that Requests (or better urllib3) may use that - if installed - to handle SSL connections. Requests doesn't catch OpenSSL.SSL.SysCallError since it expects urllib3 to do that. I will be happy to update the version of Requests dependency once this thing will be fixed.

In Coldsweat makes little sense to catch that, because OpenSSL isn't even listed as project dependency.

The feed entry is stuck in limbo now: it migrated, but has never worked since then. Now it doesn't show up in the feed overview, so I can't remove it.

This is weird because even if the feed is disabled or has zero entries it should be assigned to your user and hence be displayed in the feed list panel. If it has no users assigned it is effectively in limbo waiting to some user to add it to her feeds. This seemed a smart thing to down at the time but it turned out to be counterproductive. There a plan to fix this, see #63.

passiomatic commented 7 years ago

I'm gonna close this since it was not reproducible by reporter anymore nor on my machine.