simonw / strip-tags

CLI tool for stripping tags from HTML
Apache License 2.0

`TypeError: 'bool' object is not subscriptable` #15

Open umag opened 1 year ago

umag commented 1 year ago

I got this traceback after upgrading from 0.3 to 0.4.1. It worked fine on 0.3.

curl -s https://www.nytimes.com/ \
  | strip-tags .story-wrapper \
  | ttok -t 4000 \
  | llm --system 'summary bullet points'

Traceback (most recent call last):
  File "/usr/local/bin/strip-tags", line 8, in <module>
    sys.exit(cli())
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/strip_tags/cli.py", line 26, in cli
    final = strip_tags(
  File "/usr/local/lib/python3.9/site-packages/strip_tags/lib.py", line 115, in strip_tags
    soup = BeautifulSoup(input, "html5lib", multi_valued_attributes=False)
  File "/usr/local/lib/python3.9/site-packages/bs4/__init__.py", line 348, in __init__
    self._feed()
  File "/usr/local/lib/python3.9/site-packages/bs4/__init__.py", line 434, in _feed
    self.builder.feed(self.markup)
  File "/usr/local/lib/python3.9/site-packages/bs4/builder/_html5lib.py", line 87, in feed
    doc = parser.parse(markup, **extra_kwargs)
  File "/usr/local/lib/python3.9/site-packages/html5lib/html5parser.py", line 284, in parse
    self._parse(stream, False, None, *args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/html5lib/html5parser.py", line 133, in _parse
    self.mainLoop()
  File "/usr/local/lib/python3.9/site-packages/html5lib/html5parser.py", line 240, in mainLoop
    new_token = phase.processStartTag(new_token)
  File "/usr/local/lib/python3.9/site-packages/html5lib/html5parser.py", line 469, in processStartTag
    return func(token)
  File "/usr/local/lib/python3.9/site-packages/html5lib/html5parser.py", line 680, in startTagHtml
    return self.parser.phases["inBody"].processStartTag(token)
  File "/usr/local/lib/python3.9/site-packages/html5lib/html5parser.py", line 469, in processStartTag
    return func(token)
  File "/usr/local/lib/python3.9/site-packages/html5lib/html5parser.py", line 478, in startTagHtml
    self.tree.openElements[0].attributes[attr] = value
  File "/usr/local/lib/python3.9/site-packages/bs4/builder/_html5lib.py", line 246, in __setitem__
    if (name in list_attr['*']
TypeError: 'bool' object is not subscriptable

simonw commented 1 year ago

I can't replicate this error.

Looks like you're using Python 3.9 - I'll try that.

simonw commented 1 year ago

The stacktrace suggests that this is a problem in BeautifulSoup - it seems to have been triggered by this line:

    soup = BeautifulSoup(input, "html5lib", multi_valued_attributes=False)
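
For anyone trying to reproduce this, here is a minimal sketch. It rests on one reading of the traceback: the failing frame is html5lib's `startTagHtml`, which copies attributes from a later `<html>` start tag onto the root element, and that assignment is what reaches bs4's `__setitem__`. Under that assumption, a document containing a second `<html>` tag with an attribute might exercise the same path. `SAMPLE_HTML` and `try_parse` are hypothetical names made up for this sketch; only the `BeautifulSoup(...)` call is taken from the traceback.

```python
# Hypothetical input: a second <html> tag with an attribute the root lacks.
# This is a guess at the trigger, not the HTML from the original report.
SAMPLE_HTML = '<html><body><p>first</p><html lang="en"><p>second</p></body></html>'


def try_parse(html: str) -> str:
    """Parse html the way the traceback shows and report what happened."""
    try:
        from bs4 import BeautifulSoup, FeatureNotFound
    except ImportError:
        return "bs4 not installed"
    try:
        # Same arguments as the line quoted from strip_tags/lib.py.
        BeautifulSoup(html, "html5lib", multi_valued_attributes=False)
    except FeatureNotFound:
        return "html5lib not installed"
    except TypeError as exc:
        # The errors reported in this thread were both TypeErrors.
        return f"TypeError: {exc}"
    return "parsed ok"


if __name__ == "__main__":
    print(try_parse(SAMPLE_HTML))
```

If the guess is right, an affected bs4 release would report a `TypeError` here and a newer one would parse cleanly. If it parses cleanly everywhere, the trigger is something else in the page.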

simonw commented 1 year ago

I'll leave this issue open, but it would be great if someone could grab HTML that triggers this error and share it in a Gist or similar. Without steps to reproduce I can't investigate this further.

umag commented 1 year ago

Here is a gist: https://gist.github.com/umag/1b0c988109a0220b75c174a0993db0b5

mgalardini commented 1 year ago

I can confirm that I got a similar error with Python 3.8, but not with Python 3.11:

  File "/home/marco/.local/lib/python3.8/site-packages/bs4/builder/_html5lib.py", line 252, in __setitem__
    if (name in list_attr.get('*')
TypeError: argument of type 'NoneType' is not iterable

n4cr commented 1 year ago

I had the same issue with bs4 version 4.8. Upgrading BeautifulSoup to the latest version (4.12.2) fixed the problem.

aborruso commented 1 year ago

> I had the same issue (with bs4 version 4.8) and upgraded BeautifulSoup to the latest version (4.12.2) and the problem was fixed

The same for me. Thank you.
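
Since the reports above point at the installed BeautifulSoup version rather than at strip-tags itself, a quick check of that version may help before digging further. The sketch below uses only the standard library. It assumes 4.12.2 as the threshold only because that is the version reported to work in this thread; the first release that actually contains the fix is not stated here. The helper names are made up for illustration.

```python
from importlib import metadata
from typing import Optional

# Assumption: the version reported to work in this thread,
# not necessarily the first release that contains the fix.
KNOWN_GOOD = "4.12.2"


def version_tuple(version: str) -> tuple:
    """Turn '4.12.2' into (4, 12, 2), dropping suffixes such as 'b2'."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def older_than_known_good(installed: str, known_good: str = KNOWN_GOOD) -> bool:
    """True if the installed version sorts before the known-good one."""
    return version_tuple(installed) < version_tuple(known_good)


def installed_bs4_version() -> Optional[str]:
    """Return the installed beautifulsoup4 version, or None if it is absent."""
    try:
        return metadata.version("beautifulsoup4")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_bs4_version()
    if found is None:
        print("beautifulsoup4 is not installed")
    elif older_than_known_good(found):
        print(f"beautifulsoup4 {found} is older than {KNOWN_GOOD}; try upgrading")
    else:
        print(f"beautifulsoup4 {found} is at or above {KNOWN_GOOD}")
```

If the check reports an older version, upgrading the `beautifulsoup4` package in the same environment that strip-tags runs in is the step that worked for the commenters above.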