scraperwiki / magic-summary-tool

ScraperWiki tool to summarise stuff about any table of data

German (and other language) stopwords in word cloud #32

Closed: frabcus closed this issue 11 years ago

frabcus commented 11 years ago

A German word cloud shows "die", "der", "und" and so on as the most common words.

Can we maybe use NLTK or something to get stopwords in lots of languages?

See for example: https://beta.scraperwiki.com/dataset/dmcrboi/view/acgwqgi (by @ictocw)

frabcus commented 11 years ago

This turned out nicely in the end: nltk.corpus.stopwords.words(), called with no arguments, returns the stopwords for every language NLTK has. I passed that list as JSON from a short Python script to the JavaScript.
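For anyone wanting to reproduce this, here is a minimal sketch of what such a script might look like. It is not the script that was committed: the function names load_all_stopwords and stopwords_to_json are made up for illustration, and the lower-casing, de-duplication and sorting are assumptions about what is convenient for the JavaScript side. It also assumes NLTK is installed and that its stopwords corpus can be downloaded if it is missing.

```python
import json


def load_all_stopwords():
    """Return the stopwords for every language in NLTK's stopwords corpus."""
    # Imported here so the JSON helper below works without NLTK installed.
    import nltk
    from nltk.corpus import stopwords

    try:
        # With no arguments, words() concatenates every language file.
        return stopwords.words()
    except LookupError:
        # The corpus is not on disk yet: fetch it once, then retry.
        nltk.download("stopwords")
        return stopwords.words()


def stopwords_to_json(words):
    """Serialise words as a JSON array (hypothetical helper, not from the repo)."""
    # Assumption: lower-case, de-duplicate and sort so the output is stable.
    unique = sorted({w.strip().lower() for w in words if w.strip()})
    # ensure_ascii=False keeps words such as "für" readable in the output.
    return json.dumps(unique, ensure_ascii=False)
```

Printing stopwords_to_json(load_all_stopwords()) writes a single JSON array to stdout, which the JavaScript could load and merge with its existing English list.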

I've left the extra English stopwords in too for now.