Open sam-writer opened 4 years ago
This component should bundle the biggest KenLM model that PyPI will still allow. We could also add instructions telling users to fetch a larger one with curl -O 'https://master.dl.sourceforge.net/project/openccg/data/gigaword4.5g.kenlm.bin'
(or even wrap that download in a helper, something like the following):
from replacy_kenlm_scorer import KenLMScorer
klm = KenLMScorer.download_gigaword()
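A minimal sketch of what such a download helper could do under the hood, assuming it simply caches the file locally. The function name `download_model`, the cache directory, and the `opener` parameter are all hypothetical and are not part of replaCy or any existing package; only the URL comes from this issue.

```python
import shutil
import urllib.request
from pathlib import Path

# URL taken from the issue text above.
GIGAWORD_URL = (
    "https://master.dl.sourceforge.net/project/openccg/data/gigaword4.5g.kenlm.bin"
)


def download_model(url=GIGAWORD_URL, cache_dir=None, opener=urllib.request.urlopen):
    """Download a KenLM binary once and return the local path (hypothetical helper)."""
    # Assumed default cache location; any writable directory would do.
    if cache_dir is None:
        cache_dir = Path.home() / ".cache" / "replacy_kenlm_scorer"
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)

    # Name the local file after the last path segment of the URL.
    target = cache_dir / url.rsplit("/", 1)[-1]
    if target.exists():
        # Already downloaded, so skip the network entirely.
        return target

    # Stream to a temporary file first so an interrupted download
    # never leaves a truncated model at the final path.
    tmp = target.with_name(target.name + ".part")
    with opener(url) as response, open(tmp, "wb") as out:
        shutil.copyfileobj(response, out)
    tmp.replace(target)
    return target
```

A classmethod like `KenLMScorer.download_gigaword()` could then call a helper of this kind and pass the returned path to the scorer's constructor. The `opener` argument exists only so the download step can be swapped out in tests.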
KenLMScorer is fantastic. Just so useful. However, it isn't core to replaCy and should be a separately installable custom pipeline component, one that we still expect most people to use. Think of what en_core_web_sm is for spaCy: a separate installation, but referenced in all the docs.

I think what using our current pipeline should look like, after extraction, is: