[Feature] Sentence Segmentation

Problem

We would like the system to work on arbitrarily long texts. However, the longer the text, the worse the SignWriting translation output.

Description

Instead of sending the entire text to the translation model at once, the text should be segmented into individual sentences, each translated separately. This gives up cross-sentence context, a problem for contextualized translation that we will have to resolve in the future, but it allows quick progress with existing modeling techniques. Shorter inputs also need smaller context windows, so the encoder-decoder models can run faster.

Sentence splitting should be visible to the user, with sentence boundaries indicated on the source text.

[x] Split sentences on device using Intl.Segmenter
[x] Display sentence boundaries on hover
[ ] Use segmented sentences for machine translation
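
A minimal sketch of how the checklist items could fit together, assuming a runtime that ships `Intl.Segmenter`. The names `SentenceSpan`, `splitSentences`, `translateBySentence`, and the `translate` callback are hypothetical and not taken from the codebase. The character offsets are kept so that boundaries can be drawn on the source text.

```typescript
// Hypothetical shape for one segmented sentence (not from the project).
interface SentenceSpan {
  text: string;  // sentence text with surrounding whitespace trimmed
  start: number; // offset of the segment in the source string (UTF-16 code units)
  end: number;   // exclusive end offset of the segment, including trailing whitespace
}

// Split text into sentences on device using Intl.Segmenter.
function splitSentences(text: string, locale: string = 'en'): SentenceSpan[] {
  const segmenter = new Intl.Segmenter(locale, { granularity: 'sentence' });
  const spans: SentenceSpan[] = [];
  for (const { segment, index } of segmenter.segment(text)) {
    const trimmed = segment.trim();
    if (trimmed.length === 0) continue; // skip whitespace-only segments
    spans.push({ text: trimmed, start: index, end: index + segment.length });
  }
  return spans;
}

// Assumption: the translation model is reachable through some async callback.
type TranslateFn = (sentence: string) => Promise<string>;

// Translate each sentence independently; results keep the source order.
async function translateBySentence(
  text: string,
  translate: TranslateFn,
  locale: string = 'en',
): Promise<string[]> {
  const sentences = splitSentences(text, locale);
  return Promise.all(sentences.map((s) => translate(s.text)));
}
```

Translating sentences independently is what drops cross-sentence context, so this sketch inherits the limitation described above.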

Alternatives

Alternatively, we could train more robust machine translation models by concatenating sentences to create synthetic long-context training data.
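
As a rough illustration of that idea, the sketch below concatenates consecutive sentence-level pairs into longer synthetic examples. `ParallelExample` and `concatExamples` are hypothetical names, and joining both sides with a single space is an assumption. The right separator for the SignWriting side may differ.

```typescript
// Hypothetical sentence-level training pair (not from the project).
interface ParallelExample {
  source: string; // spoken-language sentence
  target: string; // corresponding translation output
}

// Build longer synthetic examples by joining groups of consecutive pairs.
function concatExamples(
  examples: ParallelExample[],
  groupSize: number,
): ParallelExample[] {
  if (!Number.isInteger(groupSize) || groupSize < 1) {
    throw new RangeError('groupSize must be a positive integer');
  }
  const result: ParallelExample[] = [];
  for (let i = 0; i < examples.length; i += groupSize) {
    const group = examples.slice(i, i + groupSize); // last group may be shorter
    result.push({
      // Assumption: a single space is an acceptable separator on both sides.
      source: group.map((e) => e.source).join(' '),
      target: group.map((e) => e.target).join(' '),
    });
  }
  return result;
}
```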
