gramener / gramex-nlg

Natural Language Generation for Gramex applications.

spaCy infix separators #19

Open jaidevd opened 4 years ago

jaidevd commented 4 years ago

Datasets may have column names that are hyphenated or separated by underscores, as in the iris dataset.

Initialize the spaCy tokenizer with modified infix rules to handle this.
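The sketch below shows one way to do this. It assumes the goal is to keep such column names as single tokens. It adapts the "modifying existing rule sets" pattern from the spaCy documentation and is not the gramex-nlg implementation. With spaCy's default English rules, underscores are not infixes, so `petal_width` should already stay whole. Hyphens between letters are split, however, so that one rule is dropped here. The `tokenize` helper is hypothetical and exists only for illustration.

```python
import spacy
from spacy.lang.char_classes import (
    ALPHA,
    ALPHA_LOWER,
    ALPHA_UPPER,
    CONCAT_QUOTES,
    LIST_ELLIPSES,
    LIST_ICONS,
)
from spacy.util import compile_infix_regex

# A blank English pipeline: only the tokenizer, no trained model download needed.
nlp = spacy.blank("en")

# Rebuild the infix rules, leaving out the rule that splits on hyphens
# between letters, so "sepal-length" is not broken into three tokens.
infixes = (
    LIST_ELLIPSES
    + LIST_ICONS
    + [
        r"(?<=[0-9])[+\-\*^](?=[0-9-])",  # arithmetic operators between digits
        r"(?<=[{al}{q}])\.(?=[{au}{q}])".format(
            al=ALPHA_LOWER, au=ALPHA_UPPER, q=CONCAT_QUOTES
        ),  # period between a lowercase and an uppercase letter
        r"(?<=[{a}]),(?=[{a}])".format(a=ALPHA),  # comma between letters
        # Hyphen-between-letters rule intentionally omitted (assumption: keep names whole).
        r"(?<=[{a}0-9])[:<>=/](?=[{a}])".format(a=ALPHA),  # : < > = / before a letter
    ]
)
nlp.tokenizer.infix_finditer = compile_infix_regex(infixes).finditer


def tokenize(text):
    """Hypothetical helper: return the token strings spaCy produces for `text`."""
    return [token.text for token in nlp(text)]


print(tokenize("average sepal-length by species"))
```

Rebuilding the infix list explicitly is clearer than filtering the default patterns by string matching. The trade-off is that the list must be kept in sync if spaCy's defaults change between versions.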