scrapinghub / extruct

Extract embedded metadata from HTML markup
BSD 3-Clause "New" or "Revised" License

JSON-LD: Use UltraJSON if available #49

Open redapple opened 7 years ago

redapple commented 7 years ago

Motivation: https://github.com/scrapinghub/extruct/issues/37#issuecomment-284255022
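For illustration, the "use UltraJSON if available" idea from the title could look roughly like the sketch below. It assumes the optional third-party `ujson` package exposes a `loads` function compatible with the standard library's for this purpose; `load_jsonld` is a hypothetical helper name, not part of extruct.

```python
# Prefer UltraJSON when it is installed, otherwise fall back to the stdlib.
try:
    import ujson as json  # optional third-party dependency
except ImportError:
    import json  # standard library fallback


def load_jsonld(text):
    """Parse the text content of a JSON-LD script block (hypothetical helper)."""
    return json.loads(text)


data = load_jsonld('{"@type": "Person", "name": "Ann"}')
```

The rest of the code only ever refers to the name `json`, so nothing else changes whether or not `ujson` is present.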

petri commented 2 years ago

If you were to use something other than the built-in json for parsing, you'd probably also want to change https://github.com/scrapinghub/extruct/blob/master/extruct/jsonld.py#L38, which uses the built-in json behind the scenes.

You could use the comment-removal dispose function from jstyleson directly for that.

Or use https://pypi.org/project/nojsoncomments/, which uses exactly the same comment-removal logic but is over twice as fast. The use case is somewhat marginal, of course, so the real-world benefit is typically small despite the speedup. It is also a Cython extension, so someone would have to provide Windows and Linux wheels (I'm on a Mac).
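A minimal sketch of the first suggestion, separating comment removal from parsing so that any parser can be plugged in. This is not extruct's actual code: `loads_with_comment_fallback` and its parameters are hypothetical, and the only jstyleson name assumed is the `dispose` function mentioned above. The import is guarded so the sketch still runs when jstyleson is not installed.

```python
import json

try:
    # Comment-removal function from jstyleson (optional third-party package).
    from jstyleson import dispose as strip_comments
except ImportError:
    strip_comments = None


def loads_with_comment_fallback(text, loads=json.loads, strip=strip_comments):
    """Parse strictly first; on failure, strip comments and retry.

    `loads` can be any parser that raises ValueError on bad input, so the
    same parser is used on both the fast path and the fallback path.
    """
    try:
        return loads(text)
    except ValueError:
        if strip is None:
            raise  # no comment stripper available, surface the original error
        return loads(strip(text))
```

Passing `loads=ujson.loads` (or another compatible parser) would keep the fallback path from silently going back to the built-in json.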