Closed: ekaf closed this 2 months ago
@stevenbird, this package doesn't disturb anything, and it is needed for testing the new Punkt Tokenizer.
@stevenbird, the index was not rebuilt after merging this PR. As a consequence, the plaintext corpus reader fails to initialize a sent_tokenizer, so nltk can't even start.
This package replaces the pickled Punkt models with PunktParameters stored in tab files.
It seems that nltk.data loads YAML and JSON in a safe way, but the Tab format may be preferable, as it is more concise, easier to read, and probably even safer.
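To illustrate the safety argument, here is a rough sketch of how tokenizer parameters could round-trip through plain tab-separated text instead of pickle. This is not the actual NLTK loader: the function names, the file names (`abbrev_types.txt`, `ortho_context.tab`), and the exact layout are assumptions made up for the example. The point is only that loading parses strings and integers, so nothing in the file can execute code on load, unlike unpickling.

```python
import os
import tempfile

# Hypothetical file names, chosen for this sketch only.
ABBREV_FILE = "abbrev_types.txt"
ORTHO_FILE = "ortho_context.tab"


def save_params(directory, abbrev_types, ortho_context):
    """Write a set of strings and a str -> int mapping as plain text files."""
    os.makedirs(directory, exist_ok=True)
    with open(os.path.join(directory, ABBREV_FILE), "w", encoding="utf-8") as f:
        for abbrev in sorted(abbrev_types):
            f.write(f"{abbrev}\n")
    with open(os.path.join(directory, ORTHO_FILE), "w", encoding="utf-8") as f:
        for token, flags in sorted(ortho_context.items()):
            # One record per line: token, a tab, then an integer flag value.
            f.write(f"{token}\t{flags}\n")


def load_params(directory):
    """Read the files back; only str and int values are ever constructed."""
    with open(os.path.join(directory, ABBREV_FILE), encoding="utf-8") as f:
        abbrev_types = {line for line in f.read().splitlines() if line}
    ortho_context = {}
    with open(os.path.join(directory, ORTHO_FILE), encoding="utf-8") as f:
        for line in f.read().splitlines():
            if not line:
                continue
            token, flags = line.split("\t")
            ortho_context[token] = int(flags)  # raises ValueError on bad data
    return abbrev_types, ortho_context


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        save_params(tmp, {"dr", "e.g", "mr"}, {"the": 6, "london": 16})
        print(load_params(tmp))
```

A malformed line fails with a `ValueError` rather than doing anything surprising, and the files can be inspected or diffed by eye. The sketch assumes tokens never contain tabs or newlines.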