martino / goose

HTML Content / Article Extractor in Java, open sourced from http://gravity.com

Blogspot #1

Open martino opened 13 years ago

martino commented 13 years ago

https://gist.github.com/786204

martino commented 13 years ago

The problem is in documentCleaner.clean, which removes the div that contains the blog content! I think it happens in the convertDivsToParagraphs function.
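
To illustrate the kind of behaviour one might want from that step, here is a minimal sketch of a conservative div-to-paragraph pass. It is not goose's actual code: the class name `DivToParagraphSketch`, the helper names, and the `BLOCK_TAGS` list are all hypothetical, and it uses the JDK's built-in DOM parser on well-formed XHTML rather than whatever parser goose uses. The assumption is that a div should only be turned into a paragraph when it has no block-level children, and that a wrapper div (such as the one around a blog post) is left in place rather than dropped.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DivToParagraphSketch {
    // Hypothetical list of tags treated as block-level for this sketch.
    private static final Set<String> BLOCK_TAGS =
            Set.of("div", "p", "ul", "ol", "table", "blockquote", "pre");

    // Parses well-formed XHTML with the JDK's DOM parser.
    static Document parse(String xhtml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Input is not well-formed XHTML", e);
        }
    }

    // True if any direct child element is one of the block-level tags.
    static boolean hasBlockChild(Element el) {
        NodeList children = el.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE
                    && BLOCK_TAGS.contains(child.getNodeName())) {
                return true;
            }
        }
        return false;
    }

    // Renames divs without block-level children to <p>; wrapper divs are kept.
    // Returns the number of divs that were converted.
    static int convertLeafDivs(Document doc) {
        // Snapshot the matches first so renaming does not affect the iteration.
        NodeList divs = doc.getElementsByTagName("div");
        List<Element> snapshot = new ArrayList<>();
        for (int i = 0; i < divs.getLength(); i++) {
            snapshot.add((Element) divs.item(i));
        }
        int converted = 0;
        for (Element div : snapshot) {
            if (hasBlockChild(div)) {
                continue; // keep the wrapper div and its content untouched
            }
            doc.renameNode(div, null, "p");
            converted++;
        }
        return converted;
    }

    public static void main(String[] args) {
        Document doc = parse("<html><body><div><div>Blog text</div></div>"
                + "<div>Links</div></body></html>");
        System.out.println("converted: " + convertLeafDivs(doc));
        System.out.println("divs left: " + doc.getElementsByTagName("div").getLength());
    }
}
```

The point of the sketch is only that the pass renames nodes and never deletes them, so a div wrapping the post body survives cleaning. Whether this matches what convertDivsToParagraphs is meant to do would need to be checked against the goose source.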