Add tests for tokenizers

The test details are:

10 word tokenizer tests (8 pass, 2 fail). The incorrect tokenizations are:
   a. "Dr." ==> "Dr", "." (expected: "Dr.")
   b. "3:00" ==> "3", ":", "00" (expected: "3:00")
3 custom regular expression tokenizer tests. Compared to the NLTK tests, the tests for regular expressions with named groups and back references are skipped.
Simple sentence splitter test.
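
For reference, the two failing cases can be restated as a small, self-contained Python check. This is only an illustrative sketch, not this project's tokenizer: the pattern, the short abbreviation list, and the name `tokenize_words` are all hypothetical, chosen just to show the expected output for "Dr." and "3:00".

```python
import re

# Hypothetical pattern, NOT the one used by the project's tokenizer.
# Alternatives are tried left to right, so the special cases come first.
WORD_PATTERN = re.compile(
    r"""
    (?:Dr|Mr|Mrs|Ms)\.      # a few known abbreviations, kept with their period
    | \d+(?::\d+)+          # clock-style numbers such as 3:00
    | \w+                   # ordinary words and numbers
    | [^\w\s]               # any other single non-space character
    """,
    re.VERBOSE,
)


def tokenize_words(text):
    """Return the tokens found in text, in order of appearance."""
    # The pattern has no capturing groups, so findall returns whole matches.
    return WORD_PATTERN.findall(text)


print(tokenize_words("Dr. Smith arrives at 3:00."))
# ['Dr.', 'Smith', 'arrives', 'at', '3:00', '.']
```

A fixed abbreviation list is the simplest way to keep "Dr." together, but it is an assumption of this sketch; the actual fix may well take a different approach.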
Open Questions
Do we have tokenizer implementations that accept regexes containing named groups or back references? If not, are there any plans to implement them?
Also, NLTK does not actually support back references. If we handle them, should we implement real support, or just signal that they are unsupported, as NLTK does? :-(

Related to this, cl-ppcre supports named groups and back references; see http://weitz.de/cl-ppcre/#*allow-named-registers*. (After all, it's an Edi Weitz library!)
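
To make the two features under discussion concrete, here is a minimal sketch using Python's standard `re` module (neither NLTK nor cl-ppcre). The pattern and the helper name `tokenize_with_groups` are hypothetical. The named group `quote` captures an opening quote character and the back reference `(?P=quote)` requires the same character to close the token, which a regex without back references cannot express in a single alternative.

```python
import re

# Hypothetical pattern combining a named group with a back reference.
# (?P<quote>["']) captures the opening quote; (?P=quote) must match the
# same character again, so "it's fine" is not cut short at the apostrophe.
PATTERN = re.compile(r"""(?P<quote>["']).*?(?P=quote)|\w+|[^\w\s]""")


def tokenize_with_groups(text):
    """Return whole-match tokens, ignoring what the groups captured."""
    # finditer + group(0) is used instead of findall, because findall
    # returns the captured groups rather than the full match when the
    # pattern contains capturing groups.
    return [match.group(0) for match in PATTERN.finditer(text)]


print(tokenize_with_groups("""say "it's fine" or 'no'"""))
# ['say', '"it\'s fine"', 'or', "'no'"]
```

Taking `group(0)` from each match is one way for a regex tokenizer to tolerate capturing groups; whether our tokenizers should do the same is part of the open question above.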