pictuga / morss

Get full text RSS feeds
https://morss.it/
GNU Affero General Public License v3.0

Various feeds not working #62

Closed: rebelga closed this issue 3 years ago.

rebelga commented 3 years ago

Link: https://www.firstthings.com/rss/web-exclusives

Options used: output the full-text feed as RSS, using the standard link of the first items, and keep links.

UnicodeEncodeError: 'charmap' codec can't encode character '\ufffd' in position 374: character maps to <undefined>
      args = ('charmap', "%PDF-1.6\r%âãÏÓ\r\n440 0 obj\r<</Linearized 1/L 1072...\x00\x03\x00+–'�\r\nendstream\rendobj\rstartxref\r\n116\r\n%%EOF\r\n", 374, 375, 'character maps to <undefined>')
      encoding = 'charmap'
      end = 375
      object = "%PDF-1.6\r%âãÏÓ\r\n440 0 obj\r<</Linearized 1/L 1072...\x00\x03\x00+–'�\r\nendstream\rendobj\rstartxref\r\n116\r\n%%EOF\r\n"
      reason = 'character maps to <undefined>'
      start = 374
      with_traceback = <built-in method with_traceback of UnicodeEncodeError object>
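For context, a `'charmap'` codec error is what Python raises when a string is encoded with a single-byte legacy encoding that has no slot for one of its characters. Here the offending character is U+FFFD, the Unicode replacement character, and the payload is actually a PDF (`%PDF-1.6`) rather than HTML. The dump does not say which encoding was used, so the minimal sketch below assumes cp1252 purely for illustration. The helper names `describe_encode` and `safe_encode` are hypothetical and are not part of morss.

```python
# Minimal sketch: reproduce a 'charmap' UnicodeEncodeError.
# Assumption: cp1252 stands in for whatever single-byte encoding was
# actually in use; the traceback above does not say which one it was.


def describe_encode(text: str, encoding: str = "cp1252") -> str:
    """Try to encode text and report the outcome as a short string."""
    try:
        text.encode(encoding)
        return "ok"
    except UnicodeEncodeError as exc:
        # For cp1252, exc.encoding is 'charmap', matching the report
        return f"{exc.encoding}: {exc.reason} at {exc.start}-{exc.end}"


def safe_encode(text: str, encoding: str = "cp1252") -> bytes:
    # errors="replace" substitutes b"?" for characters the codec cannot map
    return text.encode(encoding, errors="replace")


# U+FFFD usually appears after bytes were decoded with errors="replace"
sample = "%PDF-1.6\r\n\ufffd"
print(describe_encode(sample))
print(safe_encode(sample))
```

Using `errors="replace"` only hides the symptom. The more relevant point is probably that a binary PDF reached a text code path at all.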
pictuga commented 3 years ago

Thanks for reporting. I can't reproduce the bug. Are you using Python 2 or 3? Could you share more complete debug output (if available)?

rebelga commented 3 years ago

Python 3.8.10. I just built the image (no files changed).

Oddly, I just retried the URL and it worked. However, here are three that fail:

https://blog.ssdnodes.com/blog/feed/
https://wdtprs.com/feed/
https://www.reginamag.com/feed/

They also fail at morss.it with a "Couldn't load feed" message.

The full response I get follows:

Sat Sep 11 18:49:29 2021

A problem occurred in a Python script. Here is the sequence of function calls leading up to the error, in the order they occurred.

 /usr/lib/python3.8/site-packages/morss/wsgi.py in cgi_error_handler(environ={'HTTP_ACCEPT': 'text/html,application/xhtml+xml,application/xml;...,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9', 'HTTP_ACCEPT_ENCODING': 'gzip', 'HTTP_ACCEPT_LANGUAGE': 'en-US,en;q=0.9', 'HTTP_CDN_LOOP': 'cloudflare', 'HTTP_CF_CONNECTING_IP': '73.122.254.65', 'HTTP_CF_IPCOUNTRY': 'US', 'HTTP_CF_RAY': '68d313846b3a63d2-ATL', 'HTTP_CF_VISITOR': '{"scheme":"https"}', 'HTTP_CF_WARP_TAG_ID': '5ef82b78-cfdd-4962-96e4-f69523e6c920', 'HTTP_CONNECTION': 'keep-alive', ...}, start_response=<bound method Response.start_response of <gunicorn.http.wsgi.Response object>>, app=<function middleware.<locals>.app_builder.<locals>.app_wrap>)

    246 def cgi_error_handler(environ, start_response, app):
    247     try:
=>  248         return app(environ, start_response)
    249 
    250     except (KeyboardInterrupt, SystemExit):

app = <function middleware.<locals>.app_builder.<locals>.app_wrap>, environ = {'HTTP_ACCEPT': 'text/html,application/xhtml+xml,application/xml;...,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9', 'HTTP_ACCEPT_ENCODING': 'gzip', 'HTTP_ACCEPT_LANGUAGE': 'en-US,en;q=0.9', 'HTTP_CDN_LOOP': 'cloudflare', 'HTTP_CF_CONNECTING_IP': '73.122.254.65', 'HTTP_CF_IPCOUNTRY': 'US', 'HTTP_CF_RAY': '68d313846b3a63d2-ATL', 'HTTP_CF_VISITOR': '{"scheme":"https"}', 'HTTP_CF_WARP_TAG_ID': '5ef82b78-cfdd-4962-96e4-f69523e6c920', 'HTTP_CONNECTION': 'keep-alive', ...}, start_response = <bound method Response.start_response of <gunicorn.http.wsgi.Response object>>

 /usr/lib/python3.8/site-packages/morss/wsgi.py in app_wrap(environ={'HTTP_ACCEPT': 'text/html,application/xhtml+xml,application/xml;...,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9', 'HTTP_ACCEPT_ENCODING': 'gzip', 'HTTP_ACCEPT_LANGUAGE': 'en-US,en;q=0.9', 'HTTP_CDN_LOOP': 'cloudflare', 'HTTP_CF_CONNECTING_IP': '73.122.254.65', 'HTTP_CF_IPCOUNTRY': 'US', 'HTTP_CF_RAY': '68d313846b3a63d2-ATL', 'HTTP_CF_VISITOR': '{"scheme":"https"}', 'HTTP_CF_WARP_TAG_ID': '5ef82b78-cfdd-4962-96e4-f69523e6c920', 'HTTP_CONNECTION': 'keep-alive', ...}, start_response=<bound method Response.start_response of <gunicorn.http.wsgi.Response object>>)

    150             # This is called when a http request is being processed
    151 
=>  152             return func(environ, start_response, app)
    153 
    154         return app_wrap

func = <function cgi_dispatcher>, environ = {'HTTP_ACCEPT': 'text/html,application/xhtml+xml,application/xml;...,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9', 'HTTP_ACCEPT_ENCODING': 'gzip', 'HTTP_ACCEPT_LANGUAGE': 'en-US,en;q=0.9', 'HTTP_CDN_LOOP': 'cloudflare', 'HTTP_CF_CONNECTING_IP': '73.122.254.65', 'HTTP_CF_IPCOUNTRY': 'US', 'HTTP_CF_RAY': '68d313846b3a63d2-ATL', 'HTTP_CF_VISITOR': '{"scheme":"https"}', 'HTTP_CF_WARP_TAG_ID': '5ef82b78-cfdd-4962-96e4-f69523e6c920', 'HTTP_CONNECTION': 'keep-alive', ...}, start_response = <bound method Response.start_response of <gunicorn.http.wsgi.Response object>>, app = <function middleware.<locals>.app_builder.<locals>.app_wrap>

 /usr/lib/python3.8/site-packages/morss/wsgi.py in cgi_dispatcher(environ={'HTTP_ACCEPT': 'text/html,application/xhtml+xml,application/xml;...,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9', 'HTTP_ACCEPT_ENCODING': 'gzip', 'HTTP_ACCEPT_LANGUAGE': 'en-US,en;q=0.9', 'HTTP_CDN_LOOP': 'cloudflare', 'HTTP_CF_CONNECTING_IP': '73.122.254.65', 'HTTP_CF_IPCOUNTRY': 'US', 'HTTP_CF_RAY': '68d313846b3a63d2-ATL', 'HTTP_CF_VISITOR': '{"scheme":"https"}', 'HTTP_CF_WARP_TAG_ID': '5ef82b78-cfdd-4962-96e4-f69523e6c920', 'HTTP_CONNECTION': 'keep-alive', ...}, start_response=<bound method Response.start_response of <gunicorn.http.wsgi.Response object>>, app=<function middleware.<locals>.app_builder.<locals>.app_wrap>)

    240             return dispatch_table[key](environ, start_response)
    241 
=>  242     return app(environ, start_response)
    243 
    244 

app = <function middleware.<locals>.app_builder.<locals>.app_wrap>, environ = {'HTTP_ACCEPT': 'text/html,application/xhtml+xml,application/xml;...,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9', 'HTTP_ACCEPT_ENCODING': 'gzip', 'HTTP_ACCEPT_LANGUAGE': 'en-US,en;q=0.9', 'HTTP_CDN_LOOP': 'cloudflare', 'HTTP_CF_CONNECTING_IP': '73.122.254.65', 'HTTP_CF_IPCOUNTRY': 'US', 'HTTP_CF_RAY': '68d313846b3a63d2-ATL', 'HTTP_CF_VISITOR': '{"scheme":"https"}', 'HTTP_CF_WARP_TAG_ID': '5ef82b78-cfdd-4962-96e4-f69523e6c920', 'HTTP_CONNECTION': 'keep-alive', ...}, start_response = <bound method Response.start_response of <gunicorn.http.wsgi.Response object>>

 /usr/lib/python3.8/site-packages/morss/wsgi.py in app_wrap(environ={'HTTP_ACCEPT': 'text/html,application/xhtml+xml,application/xml;...,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9', 'HTTP_ACCEPT_ENCODING': 'gzip', 'HTTP_ACCEPT_LANGUAGE': 'en-US,en;q=0.9', 'HTTP_CDN_LOOP': 'cloudflare', 'HTTP_CF_CONNECTING_IP': '73.122.254.65', 'HTTP_CF_IPCOUNTRY': 'US', 'HTTP_CF_RAY': '68d313846b3a63d2-ATL', 'HTTP_CF_VISITOR': '{"scheme":"https"}', 'HTTP_CF_WARP_TAG_ID': '5ef82b78-cfdd-4962-96e4-f69523e6c920', 'HTTP_CONNECTION': 'keep-alive', ...}, start_response=<bound method Response.start_response of <gunicorn.http.wsgi.Response object>>)

    150             # This is called when a http request is being processed
    151 
=>  152             return func(environ, start_response, app)
    153 
    154         return app_wrap

func = <function cgi_file_handler>, environ = {'HTTP_ACCEPT': 'text/html,application/xhtml+xml,application/xml;...,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9', 'HTTP_ACCEPT_ENCODING': 'gzip', 'HTTP_ACCEPT_LANGUAGE': 'en-US,en;q=0.9', 'HTTP_CDN_LOOP': 'cloudflare', 'HTTP_CF_CONNECTING_IP': '73.122.254.65', 'HTTP_CF_IPCOUNTRY': 'US', 'HTTP_CF_RAY': '68d313846b3a63d2-ATL', 'HTTP_CF_VISITOR': '{"scheme":"https"}', 'HTTP_CF_WARP_TAG_ID': '5ef82b78-cfdd-4962-96e4-f69523e6c920', 'HTTP_CONNECTION': 'keep-alive', ...}, start_response = <bound method Response.start_response of <gunicorn.http.wsgi.Response object>>, app = <function cgi_app>

 /usr/lib/python3.8/site-packages/morss/wsgi.py in cgi_file_handler(environ={'HTTP_ACCEPT': 'text/html,application/xhtml+xml,application/xml;...,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9', 'HTTP_ACCEPT_ENCODING': 'gzip', 'HTTP_ACCEPT_LANGUAGE': 'en-US,en;q=0.9', 'HTTP_CDN_LOOP': 'cloudflare', 'HTTP_CF_CONNECTING_IP': '73.122.254.65', 'HTTP_CF_IPCOUNTRY': 'US', 'HTTP_CF_RAY': '68d313846b3a63d2-ATL', 'HTTP_CF_VISITOR': '{"scheme":"https"}', 'HTTP_CF_WARP_TAG_ID': '5ef82b78-cfdd-4962-96e4-f69523e6c920', 'HTTP_CONNECTION': 'keep-alive', ...}, start_response=<bound method Response.start_response of <gunicorn.http.wsgi.Response object>>, app=<function cgi_app>)

    190 
    191     # regex didn't validate or no file found
=>  192     return app(environ, start_response)
    193 
    194 

app = <function cgi_app>, environ = {'HTTP_ACCEPT': 'text/html,application/xhtml+xml,application/xml;...,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9', 'HTTP_ACCEPT_ENCODING': 'gzip', 'HTTP_ACCEPT_LANGUAGE': 'en-US,en;q=0.9', 'HTTP_CDN_LOOP': 'cloudflare', 'HTTP_CF_CONNECTING_IP': '73.122.254.65', 'HTTP_CF_IPCOUNTRY': 'US', 'HTTP_CF_RAY': '68d313846b3a63d2-ATL', 'HTTP_CF_VISITOR': '{"scheme":"https"}', 'HTTP_CF_WARP_TAG_ID': '5ef82b78-cfdd-4962-96e4-f69523e6c920', 'HTTP_CONNECTION': 'keep-alive', ...}, start_response = <bound method Response.start_response of <gunicorn.http.wsgi.Response object>>

 /usr/lib/python3.8/site-packages/morss/wsgi.py in cgi_app(environ={'HTTP_ACCEPT': 'text/html,application/xhtml+xml,application/xml;...,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9', 'HTTP_ACCEPT_ENCODING': 'gzip', 'HTTP_ACCEPT_LANGUAGE': 'en-US,en;q=0.9', 'HTTP_CDN_LOOP': 'cloudflare', 'HTTP_CF_CONNECTING_IP': '73.122.254.65', 'HTTP_CF_IPCOUNTRY': 'US', 'HTTP_CF_RAY': '68d313846b3a63d2-ATL', 'HTTP_CF_VISITOR': '{"scheme":"https"}', 'HTTP_CF_WARP_TAG_ID': '5ef82b78-cfdd-4962-96e4-f69523e6c920', 'HTTP_CONNECTION': 'keep-alive', ...}, start_response=<bound method Response.start_response of <gunicorn.http.wsgi.Response object>>)

    126 
    127     # get the work done
=>  128     url, rss = FeedFetch(url, options)
    129 
    130     start_response(headers['status'], list(headers.items()))

url = 'blog.ssdnodes.com/blog/feed/', rss undefined, global FeedFetch = <function FeedFetch>, options = <morss.morss.Options object>

 /usr/lib/python3.8/site-packages/morss/morss.py in FeedFetch(url='blog.ssdnodes.com/blog/feed/', options=<morss.morss.Options object>)

    275 
    276     except (IOError, HTTPException):
=>  277         raise MorssException('Error downloading feed')
    278 
    279     if options.items:

global MorssException = <class 'morss.morss.MorssException'>

MorssException: Error downloading feed
      args = ('Error downloading feed',)
      with_traceback = <built-in method with_traceback of MorssException object>
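This error page only shows the final `MorssException('Error downloading feed')`. The underlying `IOError` or `HTTPException` that `FeedFetch` caught does not appear, so the real reason these feeds fail is hidden. The sketch below shows one way to keep that reason visible: put it in the message and chain it with `raise ... from exc`. It is a hypothetical restructuring, not morss's actual code. `MorssException` here is a local stand-in class, and `fetch_feed` and `simulated_download` are illustrative names. The simulated failure does not claim to be the actual cause for these URLs.

```python
from http.client import HTTPException


class MorssException(Exception):
    """Local stand-in for morss's exception class (not imported from morss)."""


def simulated_download(url: str) -> bytes:
    # Hypothetical downloader that always fails, to exercise the error path
    raise IOError(f"simulated network failure ({url})")


def fetch_feed(url: str, downloader=simulated_download) -> bytes:
    try:
        return downloader(url)
    except (IOError, HTTPException) as exc:
        # Include the original error in the message and keep it as __cause__
        raise MorssException(f"Error downloading feed: {exc}") from exc


try:
    fetch_feed("blog.ssdnodes.com/blog/feed/")
except MorssException as err:
    print(err)                          # message now carries the root cause
    print(type(err.__cause__).__name__)  # IOError is an alias of OSError
```

Putting the cause in the message matters for error pages like the one above, which render only the last exception. `from exc` additionally preserves the full chain for ordinary Python tracebacks and logging.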

Other than these feeds, it works really well.

pictuga commented 3 years ago

I've been doing some development these last few days, so the morss.it and git versions were sometimes buggy.

rebelga commented 3 years ago

Thank you.