grangier / python-goose

HTML Content / Article Extractor, web scraping lib in Python
Apache License 2.0

No Text Extracted for articles from domain http://www.clarin.com #213

Open sathappanspm opened 9 years ago

sathappanspm commented 9 years ago

Hi, I tried extracting the content of articles from http://www.clarin.com, but goose was unable to extract any content from any article under the clarin.com domain (for example, http://www.clarin.com/politica/Luego-Cristina-Lorenzetti-apertura-judicial_0_1313868802.html). Goose always returns null content, even though it is able to extract the title.
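
A minimal reproduction sketch, assuming the usual python-goose usage (`Goose().extract(url=...)`, with `title` and `cleaned_text` on the returned article). The helper names `classify_extraction` and `reproduce` are hypothetical and not part of goose. They only label which fields came back non-empty, so the "title but no body" symptom is easy to report.

```python
def classify_extraction(title, cleaned_text):
    """Label an extraction result by which fields came back non-empty."""
    has_title = bool(title and title.strip())
    has_body = bool(cleaned_text and cleaned_text.strip())
    if has_title and has_body:
        return "ok"
    if has_title:
        return "title only"  # the symptom described in this issue
    if has_body:
        return "body only"
    return "nothing extracted"


def reproduce(url):
    """Run goose on a URL and classify the result (needs network access)."""
    # Imported lazily so classify_extraction works without python-goose installed.
    # Assumption: python-goose exposes Goose at the package top level.
    from goose import Goose

    article = Goose().extract(url=url)
    return classify_extraction(article.title, article.cleaned_text)
```

Calling `reproduce(...)` with the clarin.com article URL above would be expected to return `"title only"` if the behaviour matches this report.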

harikt commented 9 years ago

Same for this URL: http://www.theguardian.com/football/blog/2015/apr/02/theo-walcotts-toils-loom-large-during-raheem-sterling-contract-stand-off