mthebaud / predict4all

Accurate, fast, lightweight, multilingual, free and open-source next word prediction library
Apache License 2.0

English language support #1

Status: open. mthebaud opened this issue 4 years ago.

mthebaud commented 4 years ago

English should be the next language implemented in Predict4All. Supporting English is mostly a matter of data and a few small implementations, since its structure is similar to French. The only English-specific case that is likely to matter is the apostrophe, which might need some tweaks to be handled well: most of the "may have to" items in the following list stem from this point.
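To illustrate the apostrophe issue, here is a minimal Java sketch of one possible handling: splitting English contractions into a stem and a clitic in the Penn Treebank style ("don't" becomes "do" + "n't", "it's" becomes "it" + "'s"). The class EnglishApostropheSplitter is hypothetical and not part of Predict4All; whether Predict4All should split contractions or keep them as single tokens is a design choice this issue leaves open.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of Predict4All: splits English clitics Penn Treebank-style. */
public class EnglishApostropheSplitter {
    // Negation is tested first so that "don't" becomes "do" + "n't" rather than "don" + "'t".
    private static final Pattern NEGATION = Pattern.compile("(?i)^(\\p{L}+)(n't)$");
    // Other common clitics: 's, 're, 've, 'll, 'd, 'm
    private static final Pattern CLITIC = Pattern.compile("(?i)^(\\p{L}+)('s|'re|'ve|'ll|'d|'m)$");

    public static List<String> split(String word) {
        // Normalize the typographic apostrophe (U+2019) to the ASCII one so both spellings match.
        String w = word.replace('\u2019', '\'');
        List<String> parts = new ArrayList<>();
        Matcher m = NEGATION.matcher(w);
        if (!m.matches()) {
            m = CLITIC.matcher(w);
        }
        if (m.matches()) {
            parts.add(m.group(1)); // stem, e.g. "do", "it", "they"
            parts.add(m.group(2)); // clitic, e.g. "n't", "'s", "'re"
        } else {
            parts.add(w); // no known clitic: keep the word whole (e.g. "o'clock")
        }
        return parts;
    }

    public static void main(String[] args) {
        for (String w : new String[] {"don't", "it's", "they're", "o'clock", "can't"}) {
            System.out.println(w + " -> " + split(w));
        }
    }
}
```

This rule-based approach cannot tell the possessive "John's" from the contraction "John's" (John is), and it splits "can't" into "ca" + "n't" as the Penn Treebank does; how such cases should feed the prediction model would still have to be decided.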

A good starting point would be to create the org.predict4all.nlp.language.english package based on org.predict4all.nlp.language.french.

Keep in mind that any language-specific code should be placed behind interfaces: if something previously implemented for French should behave differently in English, expose it through the LanguageModel. Never use if(language instanceof FrenchLanguageModel) ;-)
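A minimal Java sketch of the intended pattern: generic code asks the language model for the behaviour it needs instead of testing the concrete class. SketchLanguageModel, apostropheInsideWord() and the two implementations are hypothetical names invented for this example; they are not the actual LanguageModel API of Predict4All, and the French behaviour shown is an assumption made for illustration.

```java
/**
 * Hypothetical sketch, not the Predict4All API: language-specific behaviour is exposed
 * through an interface method instead of an instanceof check on the concrete class.
 */
public class LanguageDispatchSketch {
    /** Hypothetical interface; the real Predict4All LanguageModel has a different set of methods. */
    interface SketchLanguageModel {
        String getId();

        /** Whether an apostrophe may appear inside a word token (e.g. English "don't"). */
        boolean apostropheInsideWord();
    }

    static final class FrenchSketch implements SketchLanguageModel {
        public String getId() { return "fr"; }

        // Assumption for illustration: French elisions such as "l'arbre" are split at the apostrophe.
        public boolean apostropheInsideWord() { return false; }
    }

    static final class EnglishSketch implements SketchLanguageModel {
        public String getId() { return "en"; }

        public boolean apostropheInsideWord() { return true; }
    }

    /** Generic tokenizer helper: asks the model, never checks which language class it is. */
    static boolean isWordChar(char c, SketchLanguageModel model) {
        if (c == '\'') {
            return model.apostropheInsideWord();
        }
        return Character.isLetterOrDigit(c);
    }

    public static void main(String[] args) {
        SketchLanguageModel[] models = {new FrenchSketch(), new EnglishSketch()};
        for (SketchLanguageModel m : models) {
            System.out.println(m.getId() + ": apostrophe is a word char = " + isWordChar('\'', m));
        }
    }
}
```

Adding a third language then means writing one more implementation, with no change to isWordChar().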

These are the steps to implement English prediction:

These are the steps to implement English correction rules:

mthebaud commented 4 years ago

As suggested by JYA: this description applies to prediction only! Adapting a correction model could be more complex. A good resource for building models: universaldependencies.org
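Universal Dependencies treebanks are distributed in the CoNLL-U format: one token per line with 10 tab-separated fields (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC), comment lines starting with #, and a blank line after each sentence. As a rough sketch, assuming the goal is to extract word sequences for training, the hypothetical ConlluFormReader below (not part of Predict4All) keeps the surface form of multiword tokens (ID ranges such as 3-4) and skips empty nodes (decimal IDs such as 8.1).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of Predict4All: reads surface word forms from CoNLL-U text. */
public class ConlluFormReader {

    public static List<List<String>> readSentences(Reader input) {
        List<List<String>> sentences = new ArrayList<>();
        List<String> current = new ArrayList<>();
        int skipUntil = 0; // last word ID already covered by a multiword token
        try (BufferedReader reader = new BufferedReader(input)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    // Blank line: end of sentence (word IDs restart at 1 in the next one)
                    if (!current.isEmpty()) {
                        sentences.add(current);
                        current = new ArrayList<>();
                    }
                    skipUntil = 0;
                    continue;
                }
                if (line.startsWith("#")) {
                    continue; // sentence-level comment such as "# text = ..."
                }
                String[] cols = line.split("\t");
                if (cols.length != 10) {
                    continue; // malformed line; a stricter reader would report it
                }
                String id = cols[0];
                String form = cols[1];
                if (id.contains(".")) {
                    continue; // empty node (e.g. "8.1"): no surface form
                }
                if (id.contains("-")) {
                    // Multiword token (e.g. "3-4"): keep its surface form, skip the words it covers
                    skipUntil = Integer.parseInt(id.substring(id.indexOf('-') + 1));
                    current.add(form);
                    continue;
                }
                if (Integer.parseInt(id) <= skipUntil) {
                    continue; // syntactic word already covered by the multiword token
                }
                current.add(form);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (!current.isEmpty()) {
            sentences.add(current); // the input may not end with a blank line
        }
        return sentences;
    }

    public static void main(String[] args) {
        String sample = "# text = Il parle du chat.\n"
                + "1\tIl\til\tPRON\t_\t_\t2\tnsubj\t_\t_\n"
                + "2\tparle\tparler\tVERB\t_\t_\t0\troot\t_\t_\n"
                + "3-4\tdu\t_\t_\t_\t_\t_\t_\t_\t_\n"
                + "3\tde\tde\tADP\t_\t_\t5\tcase\t_\t_\n"
                + "4\tle\tle\tDET\t_\t_\t5\tdet\t_\t_\n"
                + "5\tchat\tchat\tNOUN\t_\t_\t2\tobl\t_\t_\n"
                + "6\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_\n";
        // prints [[Il, parle, du, chat, .]]
        System.out.println(readSentences(new StringReader(sample)));
    }
}
```

Keeping "du" rather than "de" + "le" is a deliberate choice here, on the assumption that a prediction model should learn what the user actually types; a correction model might prefer the syntactic words.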