pictuga / morss

Get full text RSS feeds
https://morss.it/
GNU Affero General Public License v3.0

Error with soup #95

Open Binnette opened 2 years ago

Binnette commented 2 years ago

Hello,

I am running the command PORT=9000 morss on my server, and I added the following rules in NGINX (I am using YunoHost):

location /morss/ {
  proxy_pass        http://127.0.0.1:9000/;
  proxy_redirect    off;
  proxy_set_header  Host $host;
  proxy_set_header  X-Real-IP $remote_addr;
  proxy_set_header  X-Forwarded-Proto $scheme;
  proxy_set_header  X-Forwarded-For $proxy_add_x_forwarded_for;
  proxy_set_header  X-Forwarded-Host $server_name;
  proxy_set_header  X-Forwarded-Port $server_port;

  proxy_http_version 1.1;
  proxy_set_header Upgrade $http_upgrade;
  proxy_set_header Connection "upgrade";

  # Include SSOWAT user panel.
  include conf.d/yunohost_panel.conf.inc;
  more_clear_input_headers 'Accept-Encoding';
}

The main page works at https://server.example.com/morss/

So I copy/paste the feed URL https://positivr.fr/feed/ and press Enter. I am correctly redirected to https://server.example.com/morss/positivr.fr/feed/, but then I get an error.

Here are my logs:

root@server:/# PORT=9000 morss
Serving http://localhost:9000/
127.0.0.1 - - [15/Apr/2022 11:01:16] "GET / HTTP/1.1" 200 1154
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/morss/wsgi.py", line 248, in cgi_error_handler
    return app(environ, start_response)
  File "/usr/local/lib/python3.7/dist-packages/morss/wsgi.py", line 153, in app_wrap
    return func(environ, start_response, app)
  File "/usr/local/lib/python3.7/dist-packages/morss/wsgi.py", line 242, in cgi_dispatcher
    return app(environ, start_response)
  File "/usr/local/lib/python3.7/dist-packages/morss/wsgi.py", line 153, in app_wrap
    return func(environ, start_response, app)
  File "/usr/local/lib/python3.7/dist-packages/morss/wsgi.py", line 188, in cgi_file_handler
    return app(environ, start_response)
  File "/usr/local/lib/python3.7/dist-packages/morss/wsgi.py", line 133, in cgi_app
    rss = FeedGather(rss, url, options)
  File "/usr/local/lib/python3.7/dist-packages/morss/morss.py", line 370, in FeedGather
    ItemFill(item, options, url)
  File "/usr/local/lib/python3.7/dist-packages/morss/morss.py", line 225, in ItemFill
    out = readabilite.get_article(req['data'], url=req['url'], encoding_in=req['encoding'], encoding_out='unicode', xpath=options.xpath)
  File "/usr/local/lib/python3.7/dist-packages/morss/readabilite.py", line 346, in get_article
    html = parse(data, encoding_in)
  File "/usr/local/lib/python3.7/dist-packages/morss/readabilite.py", line 33, in parse
    return lxml.html.soupparser.fromstring(data, builder=CustomTreeBuilder, **kwargs)
  File "/usr/lib/python3/dist-packages/lxml/html/soupparser.py", line 33, in fromstring
    return _parse(data, beautifulsoup, makeelement, **bsargs)
  File "/usr/lib/python3/dist-packages/lxml/html/soupparser.py", line 78, in _parse
    tree = beautifulsoup(source, **bsargs)
  File "/usr/lib/python3/dist-packages/bs4/__init__.py", line 241, in __init__
    self.builder.initialize_soup(self)
TypeError: initialize_soup() missing 1 required positional argument: 'soup'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.7/wsgiref/handlers.py", line 137, in run
    self.result = application(self.environ, self.start_response)
  File "/usr/local/lib/python3.7/dist-packages/morss/wsgi.py", line 153, in app_wrap
    return func(environ, start_response, app)
  File "/usr/local/lib/python3.7/dist-packages/morss/wsgi.py", line 262, in cgi_encode
    out = app(environ, start_response)
  File "/usr/local/lib/python3.7/dist-packages/morss/wsgi.py", line 153, in app_wrap
    return func(environ, start_response, app)
  File "/usr/local/lib/python3.7/dist-packages/morss/wsgi.py", line 257, in cgi_error_handler
    return [cgitb.html(sys.exc_info())]
  File "/usr/lib/python3.7/cgitb.py", line 129, in html
    formatvalue=lambda value: '=' + pydoc.html.repr(value))
  File "/usr/lib/python3.7/inspect.py", line 1281, in formatargvalues
    specs.append(convert(args[i]))
  File "/usr/lib/python3.7/inspect.py", line 1278, in convert
    return formatarg(name) + formatvalue(locals[name])
  File "/usr/lib/python3.7/cgitb.py", line 129, in <lambda>
    formatvalue=lambda value: '=' + pydoc.html.repr(value))
  File "/usr/lib/python3.7/pydoc.py", line 448, in repr
    return Repr.repr(self, object)
  File "/usr/lib/python3.7/reprlib.py", line 52, in repr
    return self.repr1(x, self.maxlevel)
  File "/usr/lib/python3.7/pydoc.py", line 455, in repr1
    return self.escape(cram(stripid(repr(x)), self.maxother))
  File "/usr/lib/python3/dist-packages/bs4/element.py", line 1099, in __unicode__
    return self.decode()
  File "/usr/lib/python3/dist-packages/bs4/__init__.py", line 566, in decode
    indent_level, eventual_encoding, formatter)
  File "/usr/lib/python3/dist-packages/bs4/element.py", line 1144, in decode
    if self.attrs:
  File "/usr/lib/python3/dist-packages/bs4/element.py", line 1061, in __getattr__
    return self.find(tag)
  File "/usr/lib/python3/dist-packages/bs4/element.py", line 1300, in find
    l = self.find_all(name, attrs, recursive, text, 1, **kwargs)
  File "/usr/lib/python3/dist-packages/bs4/element.py", line 1321, in find_all
    return self._find_all(name, attrs, text, limit, generator, **kwargs)
  File "/usr/lib/python3/dist-packages/bs4/element.py", line 633, in _find_all
    i = next(generator)
  File "/usr/lib/python3/dist-packages/bs4/element.py", line 1333, in descendants
    if not len(self.contents):
  File "/usr/lib/python3/dist-packages/bs4/element.py", line 1063, in __getattr__
    "'%s' object has no attribute '%s'" % (self.__class__, tag))
AttributeError: '<class 'bs4.BeautifulSoup'>' object has no attribute 'contents'
127.0.0.1 - - [15/Apr/2022 11:02:27] "GET /positivr.fr/feed/ HTTP/1.1" 500 59

Note that this feed works fine when I run the morss command on my computer (instead of my server).

I tried with gunicorn and I get the same errors:

root@server:/# PORT=9000 gunicorn --preload morss
[2022-04-15 11:09:30 +0000] [24619] [INFO] Starting gunicorn 20.1.0
[2022-04-15 11:09:30 +0000] [24619] [INFO] Listening at: http://0.0.0.0:9000 (24619)
[2022-04-15 11:09:30 +0000] [24619] [INFO] Using worker: sync
[2022-04-15 11:09:30 +0000] [24623] [INFO] Booting worker with pid: 24623
[2022-04-15 11:09:41 +0000] [24623] [ERROR] Error handling request /positivr.fr/feed/
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/morss/wsgi.py", line 248, in cgi_error_handler
    return app(environ, start_response)
  File "/usr/local/lib/python3.7/dist-packages/morss/wsgi.py", line 153, in app_wrap
    return func(environ, start_response, app)
  File "/usr/local/lib/python3.7/dist-packages/morss/wsgi.py", line 242, in cgi_dispatcher
    return app(environ, start_response)
  File "/usr/local/lib/python3.7/dist-packages/morss/wsgi.py", line 153, in app_wrap
    return func(environ, start_response, app)
  File "/usr/local/lib/python3.7/dist-packages/morss/wsgi.py", line 188, in cgi_file_handler
    return app(environ, start_response)
  File "/usr/local/lib/python3.7/dist-packages/morss/wsgi.py", line 133, in cgi_app
    rss = FeedGather(rss, url, options)
  File "/usr/local/lib/python3.7/dist-packages/morss/morss.py", line 370, in FeedGather
    ItemFill(item, options, url)
  File "/usr/local/lib/python3.7/dist-packages/morss/morss.py", line 225, in ItemFill
    out = readabilite.get_article(req['data'], url=req['url'], encoding_in=req['encoding'], encoding_out='unicode', xpath=options.xpath)
  File "/usr/local/lib/python3.7/dist-packages/morss/readabilite.py", line 346, in get_article
    html = parse(data, encoding_in)
  File "/usr/local/lib/python3.7/dist-packages/morss/readabilite.py", line 33, in parse
    return lxml.html.soupparser.fromstring(data, builder=CustomTreeBuilder, **kwargs)
  File "/usr/lib/python3/dist-packages/lxml/html/soupparser.py", line 33, in fromstring
    return _parse(data, beautifulsoup, makeelement, **bsargs)
  File "/usr/lib/python3/dist-packages/lxml/html/soupparser.py", line 78, in _parse
    tree = beautifulsoup(source, **bsargs)
  File "/usr/lib/python3/dist-packages/bs4/__init__.py", line 241, in __init__
    self.builder.initialize_soup(self)
TypeError: initialize_soup() missing 1 required positional argument: 'soup'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/gunicorn/workers/sync.py", line 136, in handle
    self.handle_request(listener, req, client, addr)
  File "/usr/local/lib/python3.7/dist-packages/gunicorn/workers/sync.py", line 179, in handle_request
    respiter = self.wsgi(environ, resp.start_response)
  File "/usr/local/lib/python3.7/dist-packages/morss/wsgi.py", line 153, in app_wrap
    return func(environ, start_response, app)
  File "/usr/local/lib/python3.7/dist-packages/morss/wsgi.py", line 262, in cgi_encode
    out = app(environ, start_response)
  File "/usr/local/lib/python3.7/dist-packages/morss/wsgi.py", line 153, in app_wrap
    return func(environ, start_response, app)
  File "/usr/local/lib/python3.7/dist-packages/morss/wsgi.py", line 257, in cgi_error_handler
    return [cgitb.html(sys.exc_info())]
  File "/usr/lib/python3.7/cgitb.py", line 129, in html
    formatvalue=lambda value: '=' + pydoc.html.repr(value))
  File "/usr/lib/python3.7/inspect.py", line 1281, in formatargvalues
    specs.append(convert(args[i]))
  File "/usr/lib/python3.7/inspect.py", line 1278, in convert
    return formatarg(name) + formatvalue(locals[name])
  File "/usr/lib/python3.7/cgitb.py", line 129, in <lambda>
    formatvalue=lambda value: '=' + pydoc.html.repr(value))
  File "/usr/lib/python3.7/pydoc.py", line 448, in repr
    return Repr.repr(self, object)
  File "/usr/lib/python3.7/reprlib.py", line 52, in repr
    return self.repr1(x, self.maxlevel)
  File "/usr/lib/python3.7/pydoc.py", line 455, in repr1
    return self.escape(cram(stripid(repr(x)), self.maxother))
  File "/usr/lib/python3/dist-packages/bs4/element.py", line 1099, in __unicode__
    return self.decode()
  File "/usr/lib/python3/dist-packages/bs4/__init__.py", line 566, in decode
    indent_level, eventual_encoding, formatter)
  File "/usr/lib/python3/dist-packages/bs4/element.py", line 1144, in decode
    if self.attrs:
  File "/usr/lib/python3/dist-packages/bs4/element.py", line 1061, in __getattr__
    return self.find(tag)
  File "/usr/lib/python3/dist-packages/bs4/element.py", line 1300, in find
    l = self.find_all(name, attrs, recursive, text, 1, **kwargs)
  File "/usr/lib/python3/dist-packages/bs4/element.py", line 1321, in find_all
    return self._find_all(name, attrs, text, limit, generator, **kwargs)
  File "/usr/lib/python3/dist-packages/bs4/element.py", line 633, in _find_all
    i = next(generator)
  File "/usr/lib/python3/dist-packages/bs4/element.py", line 1333, in descendants
    if not len(self.contents):
  File "/usr/lib/python3/dist-packages/bs4/element.py", line 1063, in __getattr__
    "'%s' object has no attribute '%s'" % (self.__class__, tag))
AttributeError: '<class 'bs4.BeautifulSoup'>' object has no attribute 'contents'

Thanks for helping me. Let me know if you want more diagnostics.
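
The first exception in both tracebacks, `initialize_soup() missing 1 required positional argument: 'soup'`, is the message Python produces when a method is called on a class rather than on an instance. That suggests `self.builder` held a class at that point, though this is only a reading of the traceback and not a confirmed diagnosis. A minimal sketch of the mechanism, using a hypothetical `ToyBuilder` that is not the real bs4 class:

```python
class ToyBuilder:
    """Hypothetical stand-in for a tree builder; NOT the bs4 class."""

    def initialize_soup(self, soup):
        self.soup = soup


def call_initialize(builder, soup):
    """Mimic the failing frame: builder.initialize_soup(soup)."""
    try:
        builder.initialize_soup(soup)
    except TypeError as exc:
        return str(exc)
    return "ok"


soup = object()
print(call_initialize(ToyBuilder(), soup))  # instance: self is bound automatically
print(call_initialize(ToyBuilder, soup))    # class: soup lands in the self slot
```

With the class, the single argument is consumed as `self`, so Python reports `soup` as missing, which matches the wording in the log.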

pictuga commented 2 years ago

Hello! Do you know which version of the "bs4" (beautifulsoup) python library is installed on yunohost?

Binnette commented 2 years ago

Hi @pictuga, thanks for your answer. I installed morss manually by following the morss README.

I installed morss via pip install git+https://git.pictuga.com/pictuga/morss.git#egg=morss[full]

So my version of soup is:

$ pip show beautifulsoup4
Name: beautifulsoup4
Version: 4.7.1
Summary: Screen-scraping library
Home-page: http://www.crummy.com/software/BeautifulSoup/bs4/
Author: Leonard Richardson
Author-email: leonardr@segfault.org
License: MIT
Location: /usr/lib/python3/dist-packages
Requires: 
Required-by: bs4

And bs4 version:

$ pip show bs4
Name: bs4
Version: 0.0.1
Summary: Screen-scraping library
Home-page: https://pypi.python.org/pypi/beautifulsoup4
Author: Leonard Richardson
Author-email: leonardr@segfault.org
License: MIT
Location: /usr/local/lib/python3.7/dist-packages
Requires: beautifulsoup4
Required-by: morss

But I can install another version via pip. Just let me know 👍
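
The pip output above lists two install locations (`/usr/lib/python3/dist-packages` and `/usr/local/lib/python3.7/dist-packages`), so it may help to confirm which copy the interpreter actually imports. A small sketch, assuming it is run with the same Python that runs morss; the helper name `describe_module` is made up for illustration:

```python
import importlib


def describe_module(name):
    """Return (version, file path) for an importable module, or None if it is missing."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    version = getattr(module, "__version__", "unknown")
    path = getattr(module, "__file__", "unknown")
    return version, path


# Print what this interpreter really loads for the two libraries in the traceback.
for name in ("bs4", "lxml"):
    print(name, describe_module(name))
```

If the reported path or version differs from what `pip show` prints, the pip being used probably belongs to a different Python environment.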

iNtEgraIR2021 commented 1 year ago

Hi @Binnette

I just successfully tested https://morss.it/https://positivr.fr/feed/. Could you please verify if that bug still occurs?

Regards, Petra

pictuga commented 1 year ago

Sorry for the late reply. Have you tried installing html5lib? (got the idea from here)
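
To check whether html5lib is already visible to the interpreter before installing anything, something like the following could be used. It relies only on the standard library; `module_available` is a hypothetical helper name, and the check says nothing about whether installing html5lib actually resolves the error:

```python
import importlib.util


def module_available(module_name):
    """Return True if the import system can locate the named top-level module."""
    return importlib.util.find_spec(module_name) is not None


for name in ("html5lib", "bs4", "lxml"):
    print(name, "available" if module_available(name) else "missing")
```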