scrapinghub / extruct

Extract embedded metadata from HTML markup
BSD 3-Clause "New" or "Revised" License
846 stars, 113 forks

" in application/ld+json gives exception #208

Open: bodanius opened this issue 1 year ago

bodanius commented 1 year ago

```
  File "/usr/local/lib/python3.10/dist-packages/extruct/_extruct.py", line 131, in extract
    output[syntax] = list(extract(document, base_url=base_url))
  File "/usr/local/lib/python3.10/dist-packages/extruct/jsonld.py", line 28, in extract_items
    return [
  File "/usr/local/lib/python3.10/dist-packages/extruct/jsonld.py", line 28, in <listcomp>
    return [
  File "/usr/local/lib/python3.10/dist-packages/extruct/jsonld.py", line 43, in _extract_items
    data = jstyleson.loads(HTML_OR_JS_COMMENTLINE.sub("", script), strict=False)
  File "/usr/local/lib/python3.10/dist-packages/jstyleson.py", line 123, in loads
    return json.loads(dispose(text), **kwargs)
  File "/usr/lib/python3.10/json/__init__.py", line 359, in loads
    return cls(**kw).decode(s)
  File "/usr/lib/python3.10/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.10/json/decoder.py", line 353, in raw_decode
    obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 1 column 2 (char 1)
```
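The traceback shows extruct handing the script text to `jstyleson.loads(..., strict=False)`, which ends in the standard library's `json.loads`. An error at char 1 means the character right after the opening `{` was not a `"`. One plausible cause, and it is only an assumption since the page markup is not shown here, is that the quotes inside the `<script type="application/ld+json">` element were written as the HTML entity `&quot;`. The sketch below uses only the standard library (`json`, `html`) to reproduce that error and to retry after `html.unescape`; `parse_jsonld` and `script_body` are hypothetical names, not part of extruct.

```python
import html
import json

# Hypothetical JSON-LD script body whose double quotes were HTML-escaped
# as &quot; (an assumption; the issue does not show the real markup).
script_body = (
    '{&quot;@context&quot;: &quot;https://schema.org&quot;, '
    '&quot;@type&quot;: &quot;Product&quot;}'
)


def parse_jsonld(text: str) -> dict:
    """Parse a JSON-LD body; on failure, retry once after unescaping HTML entities."""
    try:
        return json.loads(text, strict=False)
    except json.JSONDecodeError:
        # html.unescape turns &quot; into ", &amp; into &, and so on.
        return json.loads(html.unescape(text), strict=False)


# Parsing the escaped text directly fails the same way as in the traceback.
try:
    json.loads(script_body, strict=False)
except json.JSONDecodeError as exc:
    print(exc)

# After unescaping, the object parses normally.
print(parse_jsonld(script_body)["@type"])
```

Unescaping only as a fallback keeps valid JSON-LD untouched, since blindly unescaping could alter string values that legitimately contain `&amp;` or similar sequences.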

The JSON data on the website
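Because the JSON-LD itself does not appear in this thread, a quick way to see what the parser choked on is to read the position attributes that `json.JSONDecodeError` carries (`msg`, `pos`, `lineno`, `colno`). The helper below is a hypothetical debugging sketch, not part of extruct; it prints the offending character and a little surrounding context. The single-quoted input is just an illustrative example that triggers the same message.

```python
import json


def describe_json_error(text: str, context: int = 20) -> str:
    """Report where json.loads stopped and which character it rejected (hypothetical helper)."""
    try:
        json.loads(text, strict=False)
    except json.JSONDecodeError as exc:
        start = max(exc.pos - context, 0)
        snippet = text[start:exc.pos + context]
        offending = text[exc.pos:exc.pos + 1]
        return (
            f"{exc.msg} at line {exc.lineno}, column {exc.colno}: "
            f"offending char {offending!r} in {snippet!r}"
        )
    return "valid JSON"


# Single-quoted keys produce the same "Expecting property name" error.
print(describe_json_error("{'@type': 'Product'}"))
print(describe_json_error('{"@type": "Product"}'))
```

Running it on the raw text of the failing `<script>` element would show whether the culprit is an entity such as `&quot;`, a single quote, or something else.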
