nzilbb / labbcat-server

Server components for LaBB-CAT
GNU Affero General Public License v3.0

Is it possible to integrate a MediaPipe-like library to analyse and annotate videos? #45

Open fishfree opened 3 months ago

fishfree commented 3 months ago

Because LaBB-CAT already supports video-audio synchronization, adding support for video annotation would make it a comprehensive multi-modal corpus management system.

robertfromont commented 3 months ago

LaBB-CAT doesn't currently directly integrate with mediapipe. However, there are two mechanisms with which such video annotation can be stored in LaBB-CAT right now, associated with the corresponding transcript/media:

Associated data files

If the output of the annotator is a data file in some arbitrary format (e.g. a timestamped list of coordinates), the file can still be stored in LaBB-CAT, associated with the transcript, by uploading it as a 'media' file from the transcripts page using the transcript's 'media' icon. If the file is not a recognised media type, a link for it will appear on the transcript page (a sketch of generating such a file with MediaPipe appears after the demo example below).

There is an example of this on LaBB-CAT's demo instance:

  1. open: https://labbcat.canterbury.ac.nz/demo/transcript?transcript=AP2505_Nelson.eaf
  2. in the top right corner, click the f0 link

You will receive a text data file related to the recording (in this case, a fundamental frequency track produced by the 'Reaper' 3rd party tool). LaBB-CAT doesn't understand or process such files, but stores them associated with their transcript for any other processing (or record-keeping) you might want to do.
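
For a concrete idea of how such a data file might be produced in the first place, here is a minimal sketch using MediaPipe's Face Mesh solution (assuming Python with the `mediapipe` and `opencv-python` packages installed; the file names are illustrative, not anything LaBB-CAT requires). It writes one timestamped row per face landmark per frame:

```python
# Minimal sketch: extract timestamped face-landmark coordinates with MediaPipe.
# Assumes: pip install mediapipe opencv-python
# "AP2505_Nelson.mp4" and "AP2505_Nelson_landmarks.csv" are illustrative names.
import csv
import cv2
import mediapipe as mp

cap = cv2.VideoCapture("AP2505_Nelson.mp4")
fps = cap.get(cv2.CAP_PROP_FPS)

with mp.solutions.face_mesh.FaceMesh() as face_mesh, \
     open("AP2505_Nelson_landmarks.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["time_s", "landmark", "x", "y", "z"])
    frame_idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # MediaPipe expects RGB input; OpenCV reads frames as BGR.
        results = face_mesh.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.multi_face_landmarks:
            t = frame_idx / fps  # timestamp in seconds
            for i, lm in enumerate(results.multi_face_landmarks[0].landmark):
                writer.writerow([f"{t:.3f}", i, lm.x, lm.y, lm.z])
        frame_idx += 1

cap.release()
```

The resulting CSV is exactly the kind of unrecognised media type that LaBB-CAT will store and link from the transcript page, as with the f0 file above.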

Alternative media tracks

If the output of the annotator is a video file with landmarks visually highlighted, it's possible to create an additional 'media track' for annotated video files - e.g. a track called face-detection. Then annotated videos created by external tools can be uploaded to LaBB-CAT from the transcripts page using the transcript's 'media' 🔊 icon, selecting the face-detection track on the upload form.

Once this is done, the face-detection version of the video appears as a tickable alternative option at the top right of the transcript page, and ticking it plays back the annotated video instead of the original.

This can be seen here on LaBB-CAT's demo instance:

  1. open: https://labbcat.canterbury.ac.nz/demo/transcript?transcript=AP2505_Nelson.eaf
  2. in the top right corner, tick the AP2505_Nelson_face option
  3. (click the zoom icon 🔍 to expand the video to make the landmarks more obvious)
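
For the second mechanism, the annotated video itself might be produced along these lines — again a minimal sketch with MediaPipe and OpenCV (file names illustrative), which overlays the face mesh on each frame and writes a new video for upload to the face-detection track:

```python
# Minimal sketch: render MediaPipe face landmarks onto a copy of the video.
# Assumes: pip install mediapipe opencv-python
# The output file would be uploaded to the face-detection media track
# via the transcript's 'media' icon.
import cv2
import mediapipe as mp

mp_face_mesh = mp.solutions.face_mesh
mp_drawing = mp.solutions.drawing_utils

cap = cv2.VideoCapture("AP2505_Nelson.mp4")
fps = cap.get(cv2.CAP_PROP_FPS)
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
out = cv2.VideoWriter(
    "AP2505_Nelson_face.mp4",
    cv2.VideoWriter_fourcc(*"mp4v"),
    fps, (width, height))

with mp_face_mesh.FaceMesh() as face_mesh:
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        results = face_mesh.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.multi_face_landmarks:
            for face in results.multi_face_landmarks:
                # Overlay the landmark mesh on the original (BGR) frame.
                mp_drawing.draw_landmarks(
                    frame, face,
                    connections=mp_face_mesh.FACEMESH_TESSELATION)
        out.write(frame)

cap.release()
out.release()
```

Note that OpenCV's mp4v output may not play in all browsers, so re-encoding to H.264 (e.g. with ffmpeg) may be needed before uploading.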
fishfree commented 3 months ago

@robertfromont Thank you for the detailed instructions, Robert! One remaining question: I cannot find the f0 link mentioned in your description ("in the top right corner, click the f0 link"), so I cannot figure out what you mean by the first method.

robertfromont commented 3 months ago

Sorry, overzealous restrictions for read-only users were preventing the link from appearing for you. It should be visible now.

fishfree commented 3 months ago

@robertfromont Thank you! I got it. I have a suggestion: parse a video-annotation tier from .eaf files into LaBB-CAT and display it synchronized with the corresponding media track. Is this possible out of the box, or via a workaround?