google / corpuscrawler

Crawler for linguistic corpora

Portuguese: question about the corpus result #49

Open ghost opened 5 years ago

ghost commented 5 years ago

I was analyzing the output file and noticed that the text for each news article contains only the title, the headline, and the first paragraph. Is that correct? I'm using the crawler for the "pt" language.

brawer commented 5 years ago

Please don’t hesitate to make changes to improve the current state! Your pull requests would certainly be welcome.