sign / translate

Effortless Real-Time Sign Language Translation
https://sign.mt

Sign Segmentation: Include Sign Segmentation Model #72

Open AmitMY opened 1 year ago

AmitMY commented 1 year ago

Problem

Given a pose sequence, we would like to perform two types of segmentation.

  • Sentence segmentation: every sentence should then be translated independently.
  • Sign segmentation: every sign in a sentence should be transcribed to SignWriting independently.

Description

We currently have a segmentation model, https://github.com/sign-language-processing/transcription/tree/main/pose_to_segments, which works reasonably well for sentences but not at all well for signs.

We should perhaps look into developing an autoregressive model like https://arxiv.org/pdf/2301.02214.pdf

That way, we could also perform segmentation live.

Alternatives

Use the existing model, which is bi-directional and would therefore need to be re-run on the whole sequence every time new frames arrive.
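To make the cost of this alternative concrete, here is a rough sketch that counts how many frames each approach would have to process in a live setting. It assumes the bi-directional model is re-run on the full sequence after every new frame, with no windowing or caching, and that an autoregressive model sees each frame exactly once. The function names and the 25 fps figure are illustrative, not taken from the project.

```python
def frames_processed_rerun(n_frames: int) -> int:
    """Total frames fed to a model that is re-run on the whole
    sequence after each new frame arrives: 1 + 2 + ... + n."""
    return sum(range(1, n_frames + 1))


def frames_processed_incremental(n_frames: int) -> int:
    """Total frames fed to a model that consumes each frame once."""
    return n_frames


# Hypothetical example: 10 seconds of video at 25 fps.
n = 10 * 25
print(frames_processed_rerun(n))        # 31375
print(frames_processed_incremental(n))  # 250
```

Under these assumptions the re-run cost grows quadratically with sequence length, which is the main argument for an autoregressive model if segmentation is to run live.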

Additional context

No response

AmitMY commented 1 year ago

We are done with our segmentation model: https://arxiv.org/abs/2310.13960

We should integrate it by doing:

  1. Removing free camera support - user should be able to stop and restart the camera

  2. We perform pose estimation and segmentation. Segments are stored as an array of arrays: sentences, and within them, signs.

  3. Segments are shown in multiple ways:

    • SignWriting (future work)
    • images (first frame + last frame + arrow of movement, if any), ideally like this image

  4. When hovering over a segment, the video plays in a loop of only that segment
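A minimal sketch of how the array-of-arrays storage and the hover loop might fit together. It assumes each sign is a `(start_frame, end_frame)` pair with an exclusive end; that representation, the helper names, and the frame numbers are all hypothetical and not taken from the project's code.

```python
from typing import List, Tuple

# Assumption: a sign is a (start_frame, end_frame) pair, end exclusive.
Sign = Tuple[int, int]
Sentence = List[Sign]


def sentence_span(sentence: Sentence) -> Sign:
    """Frame range covering every sign in one sentence."""
    return sentence[0][0], sentence[-1][1]


def looped_frame(segment: Sign, tick: int) -> int:
    """Frame to display at playback tick `tick` while looping one segment."""
    start, end = segment
    return start + tick % (end - start)


# Outer list: sentences. Inner lists: the signs within each sentence.
segments: List[Sentence] = [
    [(0, 12), (15, 30)],
    [(40, 55), (58, 70), (72, 90)],
]

print(sentence_span(segments[1]))          # (40, 90)
print(looped_frame(segments[0][1], 15))    # 15 (wraps back to the start)
```

The same `looped_frame` helper works for a whole sentence by passing the result of `sentence_span`, so hovering a sentence or a single sign can share one playback path.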