optimaize / language-detector

Language Detection Library for Java
Apache License 2.0

No way to change default n-gram size from 3 to something else #102

Open lngdet opened 5 years ago

lngdet commented 5 years ago

The n-gram size seems to be fixed at 3. How can it be changed to a user-specified value?

james-s-w-clark commented 4 years ago

The NgramExtractors class defines:

    private static final NgramExtractor STANDARD = NgramExtractor
            .gramLengths(1, 2, 3)
            .filter(StandardNgramFilter.getInstance())
            .textPadding(' ');

so you should be able to build your own extractor the same way, passing different values to gramLengths. Note that, as far as I can tell, the bundled models only contain 1-grams, 2-grams, and 3-grams.
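To make concrete what the gram lengths control, here is a minimal standalone sketch in plain Java. It is not the library's NgramExtractor: the class NgramSketch and its extract method are hypothetical names made up for illustration, it applies no filter (so nothing like StandardNgramFilter is emulated), and the way it pads the text is an assumption that may differ from what the library does internally.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only; a hypothetical helper, not part of language-detector. */
final class NgramSketch {

    /**
     * Pads the text with one padding character on each side, then returns every
     * substring whose length is one of gramLengths. Grams are ordered by start
     * position, and within one position by the order the lengths were passed.
     */
    static List<String> extract(String text, char padding, int... gramLengths) {
        String padded = padding + text + padding;
        List<String> grams = new ArrayList<>();
        for (int start = 0; start < padded.length(); start++) {
            for (int len : gramLengths) {
                // Skip non-positive lengths and grams that would run past the end.
                if (len > 0 && start + len <= padded.length()) {
                    grams.add(padded.substring(start, start + len));
                }
            }
        }
        return grams;
    }
}
```

Calling NgramSketch.extract("ab", ' ', 1, 2, 3, 4) also yields the single 4-gram " ab ", which is the kind of gram a longer gramLengths list would add. With the library itself, the analogous step would presumably be the builder chain quoted above with different arguments to gramLengths. If the bundled profiles really stop at 3-grams, longer grams would have nothing to match against unless the profiles are rebuilt with the same lengths.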