sign / translate

Effortless Real-Time Sign Language Translation
https://sign.mt

[Feature] Sentence Segmentation #148

Open AmitMY opened 4 months ago

AmitMY commented 4 months ago

Problem

We would like the system to work on arbitrarily long texts. However, the longer the text, the worse the SignWriting translation output.

Description

Instead of sending the entire text to the translation model at once, the text should be segmented into individual sentences, each translated separately. This creates a problem for context-dependent translation, which we will have to resolve in the future, but it allows quick progress with existing modeling techniques. It also requires smaller context windows, so the encoder-decoder models can run faster.

Sentence splitting should be transparent to the user and indicated on the source text.
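
To make the proposal concrete, here is a minimal sketch of what segmentation with source offsets could look like. It is not the project's implementation: the names `Segment`, `segment_sentences`, and `translate_by_sentence` are hypothetical, the `translate` callable stands in for whatever model call is used, and the boundary rule (terminal punctuation followed by whitespace) is a naive assumption that will mis-split abbreviations such as "Dr. Smith". The character offsets are kept so that each sentence could be marked on the source text.

```python
import re
from typing import Callable, List, NamedTuple


class Segment(NamedTuple):
    start: int  # offset of the first character in the source text
    end: int    # offset one past the last character
    text: str   # the sentence itself, without surrounding whitespace


# Assumption: a sentence ends at ., ! or ? followed by whitespace or end of text.
_BOUNDARY = re.compile(r"[.!?]+(?=\s|$)")


def _append(segments: List[Segment], text: str, start: int, end: int) -> None:
    """Add text[start:end] as a segment, trimming whitespace and fixing offsets."""
    chunk = text[start:end]
    stripped = chunk.strip()
    if not stripped:
        return
    offset = start + (len(chunk) - len(chunk.lstrip()))
    segments.append(Segment(offset, offset + len(stripped), stripped))


def segment_sentences(text: str) -> List[Segment]:
    """Split text into sentences, keeping offsets into the original string."""
    segments: List[Segment] = []
    cursor = 0
    for match in _BOUNDARY.finditer(text):
        _append(segments, text, cursor, match.end())
        cursor = match.end()
    # Whatever follows the last boundary is treated as a final sentence.
    _append(segments, text, cursor, len(text))
    return segments


def translate_by_sentence(text: str, translate: Callable[[str], str]) -> List[str]:
    """Translate each sentence independently (no cross-sentence context)."""
    return [translate(segment.text) for segment in segment_sentences(text)]
```

Because every `Segment` satisfies `text[start:end] == segment.text`, a front end could use the offsets to highlight sentence boundaries in the input box. A production version would likely swap the regular expression for a proper sentence segmenter.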

Alternatives

Alternatively, we could train more robust machine translation models by concatenating sentences to create synthetic long-context training data.
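
As a rough illustration of this alternative, the sketch below builds synthetic long-context examples by concatenating randomly chosen sentence pairs. Everything here is an assumption rather than an existing pipeline: the function name `make_long_context_pairs`, the `(source, target)` tuple format, and in particular the use of a single space to join the target side, which may not be the right separator for SignWriting output.

```python
import random
from typing import List, Sequence, Tuple

Pair = Tuple[str, str]  # (source sentence, target translation)


def make_long_context_pairs(
    pairs: Sequence[Pair],
    max_sentences: int = 4,
    num_examples: int = 1000,
    seed: int = 0,
    separator: str = " ",  # assumption: the same separator works for both sides
) -> List[Pair]:
    """Concatenate 2..max_sentences random pairs into longer training examples."""
    if not pairs:
        raise ValueError("pairs must not be empty")
    if max_sentences < 2:
        raise ValueError("max_sentences must be at least 2")

    rng = random.Random(seed)  # seeded so the synthetic data is reproducible
    examples: List[Pair] = []
    for _ in range(num_examples):
        count = rng.randint(2, max_sentences)
        sample = [rng.choice(pairs) for _ in range(count)]
        source = separator.join(src for src, _ in sample)
        target = separator.join(tgt for _, tgt in sample)
        examples.append((source, target))
    return examples
```

Sentences sampled this way are unrelated to each other, so the result only teaches the model to handle length, not discourse context. Sampling consecutive sentences from real documents, where available, would be a closer match to actual long inputs.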