Traceback (most recent call last):
  File "/usr/bin/zimit", line 541, in <module>
    zimit()
  File "/usr/bin/zimit", line 443, in zimit
    return warc2zim(warc2zim_args)
  File "/app/zimit/lib/python3.10/site-packages/warc2zim/main.py", line 811, in warc2zim
    return warc2zim.run()
  File "/app/zimit/lib/python3.10/site-packages/warc2zim/main.py", line 433, in run
    self.add_items_for_warc_record(record)
  File "/app/zimit/lib/python3.10/site-packages/warc2zim/main.py", line 646, in add_items_for_warc_record
    payload_item = WARCPayloadItem(record, self.head_insert, self.css_insert)
  File "/app/zimit/lib/python3.10/site-packages/warc2zim/main.py", line 179, in __init__
    self.title = parse_title(self.content)
  File "/app/zimit/lib/python3.10/site-packages/warc2zim/main.py", line 714, in parse_title
    soup = BeautifulSoup(content, "html.parser")
  File "/app/zimit/lib/python3.10/site-packages/bs4/__init__.py", line 348, in __init__
    self._feed()
  File "/app/zimit/lib/python3.10/site-packages/bs4/__init__.py", line 434, in _feed
    self.builder.feed(self.markup)
  File "/app/zimit/lib/python3.10/site-packages/bs4/builder/_htmlparser.py", line 377, in feed
    parser.feed(markup)
  File "/usr/lib/python3.10/html/parser.py", line 110, in feed
    self.goahead(0)
  File "/usr/lib/python3.10/html/parser.py", line 178, in goahead
    k = self.parse_html_declaration(i)
  File "/usr/lib/python3.10/html/parser.py", line 263, in parse_html_declaration
    return self.parse_marked_section(i)
  File "/usr/lib/python3.10/_markupbase.py", line 144, in parse_marked_section
    sectName, j = self._scan_name( i+3, i )
  File "/usr/lib/python3.10/_markupbase.py", line 390, in _scan_name
    raise AssertionError(
AssertionError: expected name token at '<![\x05�\x069�y�\x00"���@��\x11H'
FATAL: exception not rethrown
Before that, the following warning appears many times in the log:
[WARNING] Some characters could not be decoded, and were replaced with REPLACEMENT CHARACTER.
A youzim.it run of https://archives.nyphil.org/ failed after reporting many characters that could not be decoded.
Task is here.
Command used:
Final error: