tarsqi / ttk

Tarsqi Toolkit
Apache License 2.0

Replace the TreeTagger with spaCy #108

Open marcverhagen opened 3 years ago

marcverhagen commented 3 years ago

There is nothing wrong with the TreeTagger, but it does increase the complexity of installation and maintenance, and it probably comes with a small performance hit.

Use spaCy instead. This raises some questions about tokenization and pluggability. For full pluggability we could keep the option to plug in the TreeTagger and/or the tokenizer, and add spaCy as an extra alternative (probably the default). I am leaning against doing any work to keep non-Python components around, but it may be good to have a built-in way for people to add custom components (this should be a separate issue though).
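
To make the pluggability idea concrete, here is a rough sketch of how a tagger registry might look. None of this is existing TTK code: `TAGGERS`, `register_tagger`, `tag` and the `whitespace` fallback are hypothetical names made up for illustration, and the spaCy component assumes the `en_core_web_sm` model is installed. The only spaCy calls used are `spacy.load` and the `text`, `tag_` and `lemma_` token attributes.

```python
from typing import Callable, Dict, List, Tuple

# (text, part-of-speech tag, lemma) for one token
TaggedToken = Tuple[str, str, str]
Tagger = Callable[[str], List[TaggedToken]]

# Hypothetical registry of tagger components, keyed by name.
TAGGERS: Dict[str, Tagger] = {}
_SPACY_MODELS: dict = {}  # cache so a model is loaded only once


def register_tagger(name: str) -> Callable[[Tagger], Tagger]:
    """Decorator that adds a tagger function to the registry."""
    def decorator(func: Tagger) -> Tagger:
        TAGGERS[name] = func
        return func
    return decorator


@register_tagger("spacy")
def spacy_tagger(text: str) -> List[TaggedToken]:
    # Imported lazily so the other components work without spaCy installed.
    import spacy
    model = "en_core_web_sm"  # assumption: this model has been downloaded
    if model not in _SPACY_MODELS:
        _SPACY_MODELS[model] = spacy.load(model)
    doc = _SPACY_MODELS[model](text)
    return [(tok.text, tok.tag_, tok.lemma_) for tok in doc]


@register_tagger("whitespace")
def whitespace_tagger(text: str) -> List[TaggedToken]:
    # Trivial stand-in for a custom component: splits on whitespace only.
    return [(tok, "UNK", tok.lower()) for tok in text.split()]


def tag(text: str, tagger: str = "spacy") -> List[TaggedToken]:
    """Run the named tagger; spaCy is the default, as proposed above."""
    if tagger not in TAGGERS:
        raise ValueError(f"unknown tagger: {tagger!r}")
    return TAGGERS[tagger](text)
```

With something like this, a TreeTagger wrapper would just be one more function registered under its own name, and `tag(text, tagger="whitespace")` shows how a custom component would be selected.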

marcverhagen commented 3 years ago

For now we just have a branch where spaCy is integrated. It needs more testing and it needs work:

See docs/notes/ttk-spacy.md for more.