postlight / parser

📜 Extract meaningful content from the chaos of a web page
https://reader.postlight.com
Apache License 2.0

Lemonde.fr headings and "Décryptages" are missing in parsed content #510

Open kit-cat opened 5 years ago

kit-cat commented 5 years ago

Expected Behavior

All headings and "Décryptages" in Le Monde articles should be present in the parsed content.

Current Behavior

Headings and "Décryptages" are missing from the parsed content.

Steps to Reproduce

1. Use the following Le Monde article: https://www.lemonde.fr/pixels/article/2019/11/08/e-sport-tout-comprendre-a-league-of-legends-dont-les-mondiaux-vont-se-conclure-a-guichets-fermes-a-paris_6018436_4408996.html
2. Feed it to the parser.
3. Look for "Décryptages" and "Qu’est-ce que l’e-sport ?". This text will not be found in the parsed article.
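
The check in the last step can be automated. Below is a minimal sketch, not part of the parser itself: `findMissing`, `checkArticle`, and the `ParseFn` type are hypothetical names introduced for illustration. It assumes the parse result exposes the article body as a `content` string, and it normalizes typographic apostrophes and non-breaking spaces, since the heading "Qu’est-ce que l’e-sport ?" may contain both.

```typescript
// Hypothetical helpers for illustration; none of these names come from the parser.

// Assumed shape of a parse function: resolves to an object with a `content` field.
type ParseFn = (url: string) => Promise<{ content: string | null }>;

// Replace HTML tags with spaces so text in adjacent elements does not run together.
function stripTags(html: string): string {
  return html.replace(/<[^>]*>/g, " ");
}

// Make comparison tolerant of curly apostrophes, non-breaking spaces, and case.
function normalizeText(s: string): string {
  return s
    .normalize("NFC")
    .replace(/[\u2018\u2019]/g, "'") // curly apostrophes -> straight
    .replace(/\s+/g, " ")            // collapse whitespace, including U+00A0
    .trim()
    .toLowerCase();
}

// Return the expected phrases that do NOT appear in the parsed content.
function findMissing(content: string | null, expected: string[]): string[] {
  const text = normalizeText(stripTags(content || ""));
  return expected.filter((phrase) => !text.includes(normalizeText(phrase)));
}

// Parse a URL with the supplied parse function and report the missing phrases.
async function checkArticle(
  parse: ParseFn,
  url: string,
  expected: string[]
): Promise<string[]> {
  const result = await parse(url);
  return findMissing(result.content, expected);
}
```

To run this against the article above, pass the parser's own parse function as `parse` (for example `(url) => Parser.parse(url)`, assuming the package is installed and imported as `Parser`) together with the two phrases from step 3. A non-empty result lists the phrases that were dropped. HTML entities such as `&nbsp;` are not decoded in this sketch.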