A few changes that speed up the tokenizer, roughly 2-3x in the cases I've checked:
- Remove redundant white-space tokenization in BasicTokenizer.
- Convert basic-tokenized tokens to UTF32 in one call in FullTokenizer, and modify WordPieceTokenizer to accept UTF32 as input.
- Only call sub.string() once in WordPieceTokenizer.
- Remove input validation in WhitespaceTokenizer, which may be called many times.
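To illustrate the "call sub.string() once" point: greedy longest-match WordPiece repeatedly tests candidate substrings against the vocabulary, so materializing each candidate string once (and reusing it for the `##`-prefixed lookup) avoids redundant conversions in the inner loop. This is a minimal sketch of that pattern, not the project's actual code; the function name, vocab type, and `[UNK]` handling are illustrative:

```python
def wordpiece(token, vocab, max_chars=100):
    """Greedy longest-match-first WordPiece (sketch).

    Each candidate substring is built once per (start, end) pair and
    reused for the vocabulary lookup, rather than being re-derived on
    every comparison.
    """
    if len(token) > max_chars:
        return ["[UNK]"]
    pieces = []
    start = 0
    while start < len(token):
        end = len(token)
        cur_piece = None
        while start < end:
            substr = token[start:end]  # built once per candidate
            if start > 0:
                substr = "##" + substr  # continuation-piece prefix
            if substr in vocab:
                cur_piece = substr
                break
            end -= 1
        if cur_piece is None:
            return ["[UNK]"]  # no piece matched: whole token is unknown
        pieces.append(cur_piece)
        start = end
    return pieces
```

For example, with a toy vocab `{"un", "##aff", "##able"}`, `wordpiece("unaffable", ...)` yields `["un", "##aff", "##able"]`.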
I have not found any differences in the tokenizations I've tried. The changes in BasicTokenizer should be fine as long as WhitespaceTokenizer.tokenize commutes with the operations that follow it (lowercasing, NFD normalization, splitting on punctuation), since there is already a final call to WhitespaceTokenizer.tokenize. The change in WordPieceTokenizer/FullTokenizer is fine as long as WhitespaceTokenizer.tokenize(WhitespaceTokenizer.tokenize(x)) = WhitespaceTokenizer.tokenize(x) in every case (which seems reasonable).
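The idempotence assumption can be sanity-checked with a minimal stand-in for whitespace tokenization (the function here is an illustrative strip-and-split, not the project's actual implementation):

```python
def whitespace_tokenize(text):
    """Minimal whitespace tokenizer: strip, then split on whitespace runs."""
    text = text.strip()
    if not text:
        return []
    return text.split()

# Idempotence check: re-tokenizing already-split tokens changes nothing,
# i.e. tokenize(join(tokenize(x))) == tokenize(x).
for sample in ["hello  world", "  a\tb\nc ", "", "one"]:
    once = whitespace_tokenize(sample)
    twice = whitespace_tokenize(" ".join(once))
    assert once == twice, (sample, once, twice)
```

Under this model the property holds trivially, since splitting on whitespace runs and rejoining with single spaces is a fixed point of the tokenizer.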