Add SUBTLEX, support Turkish, expand Greek

These are changes I was working on at the same time, and they interact with each other.

We now use SUBTLEX, a set of academic wordlists built from subtitles, as a data source. Sometimes its source data is OpenSubtitles, just like Hermit Dave's, and sometimes it's different. I set up the builder to read SUBTLEX in English (US and UK), Chinese (mostly Simplified), Dutch, and German. This should help improve our Chinese data in particular.
Greek was on the language list but it was quite neglected. I added Wikipedia and Twitter data for Greek.
Similarly, I found that we had enough data to bring Turkish up to three sources. I added Wikipedia, Twitter, and OpenSubtitles for Turkish, and added a special case for tokenization that handles Turkish case-folding.
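
For context on why Turkish needs a special case: Turkish has dotted and dotless forms of "i" in both cases (İ/i and I/ı), so the default Unicode lowercasing gives the wrong result for Turkish text. The following is a minimal sketch of the idea, not necessarily the code in this change; the function name `turkish_casefold` is hypothetical and only for illustration.

```python
import unicodedata


def turkish_casefold(text: str) -> str:
    """Lowercase text using Turkish rules for dotted and dotless i.

    Hypothetical helper for illustration; not taken from the wordfreq source.
    """
    # Compose "I" + combining dot above (U+0307) into the single character "İ"
    text = unicodedata.normalize("NFC", text)
    # Map the two Turkish capitals by hand before the generic lowercasing:
    # "İ" (U+0130) becomes "i", and "I" becomes dotless "ı" (U+0131)
    text = text.replace("\u0130", "i").replace("I", "\u0131")
    return text.lower()


# The default lowercasing turns "I" into "i", which is wrong for Turkish
print("ISPARTA".lower())            # isparta
print(turkish_casefold("ISPARTA"))  # ısparta
```

Handling the two capitals before calling `lower()` also avoids the stray combining dot that Python otherwise produces for "İ", since `"İ".lower()` is "i" followed by U+0307.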

rspeer / wordfreq