#15 checking parser test

Hi @mikzolot, I'm just checking the parser part you built.

There seem to be some edge cases:

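'Canadian-American' is parsed into 'canadianamerican'.
Also, 'stextstylefracnsumyibary' seems to be the result of deleting hyphens or other non-letter characters (the fragments textstyle, frac, sum and bar look like LaTeX command names), though I couldn't trace it back to the text.
etc.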

I guess it all boils down to this line: article = re.sub(r"[^a-z\s]", "", article)
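To make the hyphen case concrete, here is a minimal sketch that applies the quoted substitution in isolation. It assumes the text is lowercased before the substitution (I have not confirmed this in the parser), and the function names `strip_non_letters` and `replace_non_letters` are hypothetical, used only for illustration. The second function shows one possible alternative, replacing each non-letter with a space instead of deleting it.

```python
import re


def strip_non_letters(article: str) -> str:
    """Apply the quoted pattern: delete everything except a-z and whitespace."""
    # Assumption: the parser lowercases the text before this step.
    article = article.lower()
    return re.sub(r"[^a-z\s]", "", article)


def replace_non_letters(article: str) -> str:
    """Hypothetical alternative: turn each non-letter into a space instead."""
    article = article.lower()
    article = re.sub(r"[^a-z\s]", " ", article)
    # Collapse the runs of whitespace the replacement may have created.
    return " ".join(article.split())


print(strip_non_letters("Canadian-American"))    # canadianamerican
print(replace_non_letters("Canadian-American"))  # canadian american
```

With the space variant, hyphenated compounds end up as two tokens rather than one merged token. Whether that is the desired behaviour depends on how the tokens are used later.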

I couldn't find out exactly what happens with instances like "California.", "opposed;" etc., where the word is followed by another character rather than a space, but it seems that the parser just skips them.
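A quick way to check the trailing-punctuation cases is to run the quoted pattern on them directly. This is only a sketch of the regex in isolation, again assuming the text is lowercased first; `clean` is a hypothetical helper name, not a function from the parser.

```python
import re

# Same character class as the quoted line: anything that is not a-z or whitespace.
NON_LETTERS = re.compile(r"[^a-z\s]")


def clean(text: str) -> str:
    # Assumption: lowercasing happens before the substitution.
    # Without it, capital letters would be deleted as well.
    return NON_LETTERS.sub("", text.lower())


for sample in ["California.", "opposed;", "California. It"]:
    print(repr(sample), "->", repr(clean(sample)))
```

Run on its own, the substitution removes the trailing character and keeps the word ('california', 'opposed'). So if these words really are missing from the output, the cause may be in a different step of the parser, which I have not verified.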

lisja / KIK-LG211_HALM
