yuchen-lea / org-media-note

Taking interactive notes when watching videos or listening to audios in org-mode.

GNU General Public License v3.0

238 stars, 35 forks

Is it possible to support audio subtitle files in textgrid format? #61

Open Odysseus6000 opened 1 month ago

Odysseus6000 commented 1 month ago

Is it possible to support audio subtitle files in textgrid format?

yuchen-lea commented 1 month ago

From a technical perspective, it's not difficult to achieve.

The more challenging part lies in the product side. From what I have briefly researched, this seems to be a use case mainly focused on linguistics, and I'm not entirely sure what the specific needs are in that context.

For instance, do you need to export a timestamp or an AB-loop timestamp, and should the hierarchy structure be preserved?

If you could provide an actual TextGrid demo along with the corresponding org-mode conversion result, it would be extremely helpful!

Odysseus6000 commented 1 month ago

TextGrid files come in two formats, both of which can be created and opened with the Praat software. They contain the subtitles of the audio file and the CMUBET phonemes (which can be converted to the corresponding International Phonetic Alphabet symbols). These correspond to “words” and “phones” respectively. The attached file is a TextGrid file in one of the formats, with AB-loop timestamps, and can be used as a basis.

I want to export AB-loop timestamps to org-mode. For example, if the attached TextGrid file is imported into org-mode, I would like to get the following content:

Words Column: AS THE SONG PICKS UP EMOTION sp SO SHOULD sp THE MACHINE GET GRANDER sp sil IN ITS PROCESS sp THEY WANTED US

Phones Column: AE1 Z DH AH0 S AO1 NG P IH1 K S AH1 P IH0 M OW1 SH AH0 N S OW1 SH UH1 D DH IY0 M AH0 SH IY1 N G IH1 T G R AE1 N D ER0 sil IH0 N IH1 T S P R AA1 S EH2 S DH EY1 W AO1 N IH0 D AH1

There is a Python package, praatio, that can be used to process TextGrid files. The attached test.TextGrid file can be opened with Emacs.

test.TextGrid.zip

Thank you very much.

yuchen-lea commented 1 month ago

I have released the initial version of the feature on the textgrid branch. Feel free to pull and try it out.

Usage:

Import from the interface. The path of the TextGrid file must be selected manually, while the media path is fetched automatically from the current heading or the MPV playback path.

All tiers will be converted.

For interval tiers: they will be converted into ab-loop links.
For point tiers: they will be converted into timestamp links.
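
The tier-to-link mapping above can be sketched roughly as follows. This is only an illustration, not the code on the textgrid branch: the `Entry` type, the helper names, and the `audio:` link syntax are all assumptions made for the example, and the real script may format links and timestamps differently.

```python
from typing import NamedTuple, Optional


class Entry(NamedTuple):
    """Hypothetical stand-in for one TextGrid entry; end is None for a point."""
    start: float
    end: Optional[float]
    label: str


def fmt_time(seconds: float) -> str:
    """Format seconds as H:MM:SS, truncating any fractional part."""
    total = int(seconds)
    return f"{total // 3600}:{total % 3600 // 60:02d}:{total % 60:02d}"


def to_link(media: str, entry: Entry) -> str:
    """Build an org-style link; the 'audio:' link syntax is assumed here."""
    stamp = fmt_time(entry.start)
    if entry.end is not None:
        # Interval tier entry: append the end time to form an A-B loop.
        stamp += "-" + fmt_time(entry.end)
    # Point tier entry: stamp stays a single timestamp.
    return f"[[audio:{media}#{stamp}][{stamp}]] {entry.label}"
```

An interval entry yields a link with a start-end range, while a point entry yields a link with a single timestamp.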

Prerequisites:

Python and praatio should be installed.

Basic Concept:

Calling org-media-note-insert-note-from-textgrid first runs a Python script (its path is stored in org-media-note-textgrid-python-script and defaults to textgrid.py under org-media-note). The script converts the TextGrid file into a specific intermediate format, which is then processed by search and replace.

I have tested it on macOS with the file you provided. If there are any issues, feel free to report them.

Odysseus6000 commented 1 month ago

Very cool! When I use org-media-note on the textgrid branch and run the command org-media-note-show-interface, I get:

expand-file-name: Symbol's function definition is void. straight--repos-dir?

Probably because I'm not using straight to manage packages.

Is it possible to add the following function: after entering a time period such as 0:00:1-0:00:4, automatically insert the words and phones from the TextGrid file for that time period into org-mode? The content displayed in org-mode would be as follows:

Words: UP EMOTION sp SO SHOULD sp THE MACHINE GET GRANDER

Phones: P IH0 M OW1 SH AH0 N S OW1 SH UH1 D DH IY0 M AH0 SH IY1 N G IH1 T G R AE1 N D ER0

yuchen-lea commented 1 month ago

expand-file-name: Symbol's function definition is void. straight--repos-dir? Probably because I'm not using straight to manage packages.

You are right. I forgot to check whether the straight--repos-dir function exists. It is used to automatically set org-media-note-textgrid-python-script to the absolute path of "textgrid.py". You can either set this variable manually or pull the latest code to see whether the issue has been fixed.

Is it possible to add the following function: after inputting the time period such as 0:00:1-0:00:4, automatically insert the contents of the words and phones in the textgrid file for the corresponding time period into the org-mode. The content displayed in the org-mode is as follows: Words: UP EMOTION sp SO SHOULD sp THE MACHINE GET GRANDER Phones: P IH0 M OW1 SH AH0 N S OW1 SH UH1 D DH IY0 M AH0 SH IY1 N G IH1 T G R AE1 N D ER0

If I understand you correctly, you want to insert content only from a specific time range, rather than all content? Can your requirement be met with the current import and merge functionality? The delimiter is set to a space.
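
For reference, selecting content by time range could look something like the sketch below. It is not part of org-media-note: the function names, the (start, end, label) tuple layout, the overlap rule, and the sample timings are all assumptions made for illustration.

```python
def parse_time(text: str) -> float:
    """Convert 'H:MM:SS' (or 'MM:SS', or plain seconds) to seconds."""
    seconds = 0.0
    for part in text.strip().split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def labels_in_range(intervals, time_range: str, delimiter: str = " ") -> str:
    """Join the labels of intervals that overlap the 'start-end' range."""
    start_text, end_text = time_range.split("-")
    start, end = parse_time(start_text), parse_time(end_text)
    # Assumed rule: keep an interval if any part of it lies inside the range.
    picked = [label for (t0, t1, label) in intervals if t0 < end and t1 > start]
    return delimiter.join(picked)


# Hypothetical timings, invented for the example (not taken from test.TextGrid).
words = [
    (0.0, 0.5, "AS"),
    (0.5, 1.0, "THE"),
    (1.0, 1.6, "UP"),
    (1.6, 2.4, "EMOTION"),
    (4.0, 4.5, "IN"),
]
print("Words:", labels_in_range(words, "0:00:1-0:00:4"))  # Words: UP EMOTION
```

The same function could be applied to the phones tier to produce the matching "Phones:" line, with the delimiter kept as a space.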

Odysseus6000 commented 4 weeks ago

Yes! I want to insert content only from a specific time range. Which key is H?

yuchen-lea commented 4 weeks ago

It's the Hyper key; check the README. Or you could call org-media-note-merge-item directly.

BTW, does importing the whole content work well?

Odysseus6000 commented 3 weeks ago

Importing the whole content works well!