j0k3r / graby

Graby helps you extract article content from web pages
MIT License

Add `VideoGame` JsonLD type to exceptions #244

Closed: zyuhel closed this 2 years ago

zyuhel commented 3 years ago

Some sites, such as Gamespot.com, include `VideoGame` JsonLD objects on the page. Because these objects come last, they overwrite the correct datetime with an incorrect one.

Example: https://www.gamespot.com/articles/nvidia-rtx-3090-3080-3070-gpu-all-specs-prices-and-release-dates-detailed/1100-6481673/

P.S. Maybe it would be a good idea to make this list customizable?

coveralls commented 2 years ago

Coverage increased (+0.006%) to 95.042% when pulling 3e0cfd59814608577fba38549a027dbbaed50d6f on zyuhel:patch-1 into d395eebb7c75ddaa379d3a48d00c0afad6816b34 on j0k3r:master.

j0k3r commented 2 years ago

Thanks @zyuhel! Sorry for the year-long delay 😅 I've updated your PR and added an option to `ContentExtractor` called `json_ld_ignore_types`.
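
For readers following along, the problem and the fix discussed in this thread can be illustrated with a small language-neutral sketch. This is not Graby's code (Graby is written in PHP), and it assumes a simplified model in which each JsonLD block is parsed in page order and a later `datePublished` overwrites an earlier one. The function name `extract_date`, the `ignore_types` parameter, and the sample dates are hypothetical and exist only for illustration.

```python
import json

# Hypothetical ignore list: the thread only confirms that `VideoGame` is added.
DEFAULT_IGNORE_TYPES = frozenset({"VideoGame"})


def extract_date(json_ld_blocks, ignore_types=DEFAULT_IGNORE_TYPES):
    """Return the last datePublished found, skipping objects of ignored types."""
    date = None
    for raw in json_ld_blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # malformed block: skip it
        if not isinstance(data, dict):
            continue
        types = data.get("@type")
        if not isinstance(types, list):
            types = [types]
        # Skip the whole object when any of its types is on the ignore list.
        if any(isinstance(t, str) and t in ignore_types for t in types):
            continue
        if "datePublished" in data:
            date = data["datePublished"]  # later blocks overwrite earlier ones
    return date


# Made-up sample data: an article block followed by a VideoGame block.
blocks = [
    '{"@type": "NewsArticle", "datePublished": "2020-09-01"}',
    '{"@type": "VideoGame", "datePublished": "2013-01-01"}',
]
```

With the `VideoGame` type ignored, `extract_date(blocks)` keeps the article's date. Passing an empty ignore list reproduces the reported behavior, where the trailing `VideoGame` object wins.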