No Text Extracted for articles from domain http://www.clarin.com

grangier / python-goose

Html Content / Article Extractor, web scraping lib in Python

Apache License 2.0


Open sathappanspm opened 9 years ago

sathappanspm commented 9 years ago

Hi, I tried extracting the content of articles from http://www.clarin.com, but goose was unable to extract any content from any article under the clarin.com domain (for example, http://www.clarin.com/politica/Luego-Cristina-Lorenzetti-apertura-judicial_0_1313868802.html). Goose always returns null content, even though it is able to extract the title.

harikt commented 9 years ago