Closed: jowagner closed this issue 1 year ago
Add an option to train a case-insensitive model, e.g. by lowercasing all data in get_item_atoms() or as part of the tokeniser.
The new default as of commit 068828170413104ad0f0e9f8eba18652af36f219 is to use both lowercase and truecase ngrams. Each of the two can be switched off with a command line option, see train.py --help.
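
For illustration only, here is a minimal sketch of how both ngram variants could be produced side by side, with a switch for each. It is not the code from the commit: the names char_ngrams, extract_atoms, use_truecase and use_lowercase are hypothetical, and it assumes character ngrams, which may not match what get_item_atoms() actually does.

```python
def char_ngrams(text, n):
    """Return all character ngrams of length n in text, in order."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def extract_atoms(text, n=3, use_truecase=True, use_lowercase=True):
    """Hypothetical stand-in for get_item_atoms(), not the project's code.

    Each atom is tagged with 'T' (truecase) or 'L' (lowercase) so that the
    two variants stay distinct features even when the strings are equal.
    """
    atoms = []
    if use_truecase:
        # Keep the original casing of the input.
        atoms.extend(('T', gram) for gram in char_ngrams(text, n))
    if use_lowercase:
        # Lowercase the whole input before extracting ngrams.
        atoms.extend(('L', gram) for gram in char_ngrams(text.lower(), n))
    return atoms


if __name__ == '__main__':
    # Default: both variants are produced.
    print(extract_atoms('AbC', n=2))
    # Truecase switched off: a case-insensitive feature set.
    print(extract_atoms('AbC', n=2, use_truecase=False))
```

With the truecase variant switched off, inputs that differ only in case give identical atoms, which is the case-insensitive behaviour originally requested.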