nuqayah / deen-projects

Discussion of Islamic projects and tools that should be developed (see issues).

Tool for annotating audio segments #4

Open mustafa0x opened 4 years ago

mustafa0x commented 4 years ago

Example use case: annotate the start and end of each ayah in a Qur'anic MP3 so that a user can listen to specific ayahs a set number of times (e.g. muqri.com).

Similar tools exist, but their UX needs improving so that annotation is as painless as possible. In particular, marking both the start and the end of every segment is unnecessary; marking a single point midway between two adjacent segments is sufficient.
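
Since each boundary mark closes one segment and opens the next, N marks are enough to describe N + 1 segments. A minimal Python sketch of that conversion, assuming times in seconds and a known total duration; `boundaries_to_segments` is a hypothetical helper, not part of wavesurfer or any existing tool:

```python
from typing import List, Tuple


def boundaries_to_segments(boundaries: List[float], duration: float) -> List[Tuple[float, float]]:
    """Expand single boundary marks into (start, end) segments.

    Each boundary is one mark placed between two adjacent segments, e.g. in the
    pause between two ayahs. The first segment starts at 0 and the last one ends
    at the end of the audio, so N marks yield N + 1 segments.
    """
    # Drop duplicate marks and marks outside the audio, then order them in time.
    marks = sorted({b for b in boundaries if 0 < b < duration})
    edges = [0.0] + marks + [duration]
    # Each pair of consecutive edges forms one segment.
    return list(zip(edges, edges[1:]))
```

For example, marks at 77.5 and 10.5 in a 100-second file give three segments: (0.0, 10.5), (10.5, 77.5) and (77.5, 100.0). Because the annotator only ever stores the marks, dragging one mark moves the end of one segment and the start of the next together, so gaps or overlaps between segments cannot arise.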

This tool should also be able to import a list of silences generated by ffmpeg and use them as initial segment marks.
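
One way to get such a list is ffmpeg's silencedetect filter, for example `ffmpeg -i surah.mp3 -af silencedetect=noise=-30dB:d=0.5 -f null -`, which logs each detected silence to stderr as a `silence_start` line followed by a `silence_end` line. The -30dB threshold and 0.5 s minimum duration are assumptions that would need tuning per reciter. The sketch below, with a hypothetical `silence_midpoints` helper, parses that log and places one initial mark at the midpoint of each silence, matching the "midpoint" idea above:

```python
import re
from typing import List, Optional

# Assumed shape of silencedetect log lines, e.g.
#   [silencedetect @ 0x55d5c8] silence_start: 10.2
#   [silencedetect @ 0x55d5c8] silence_end: 11.5 | silence_duration: 1.3
_START = re.compile(r"silence_start:\s*([-+0-9.eE]+)")
_END = re.compile(r"silence_end:\s*([-+0-9.eE]+)")


def silence_midpoints(ffmpeg_log: str) -> List[float]:
    """Return the midpoint (in seconds) of every complete silence in the log."""
    marks: List[float] = []
    start: Optional[float] = None
    for line in ffmpeg_log.splitlines():
        m = _START.search(line)
        if m:
            start = float(m.group(1))
            continue
        m = _END.search(line)
        # Ignore an end without a preceding start; an unmatched start is dropped.
        if m and start is not None:
            end = float(m.group(1))
            marks.append(round((start + end) / 2, 3))
            start = None
    return marks
```

Lines unrelated to silencedetect (progress output, stream info) are skipped. A leading silence before the first ayah also yields a mark, which the annotator can simply delete.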

wavesurfer should be used for this. The following plugins are related to this problem:

A related tool is waveform-playlist (demo); however, wavesurfer is more suitable.

mr-islam commented 4 years ago

Salam, just wanted to update this issue to reflect some efforts that have been made:

mr-islam commented 4 years ago

As salamu alaykum, I'm nearly done with some urgent changes on Layl, and now I'd like to get back to this!

I've been thinking about this project in the back of my mind and have some points I'd like to clarify about the tool's overall structure, beyond the tech stack, to make sure I don't go off in some weird direction 😅:

  1. The audio that needs to be annotated: there should be a dropdown listing the qurra (reciters), then another dropdown to select a surah; the audio is then loaded from a static host and its waveform generated.
  2. Silence generation from ffmpeg: we can generate these beforehand, store them, and load them when the user loads the relevant audio. Depending on the quality of ffmpeg's output (I have no idea), the user will either end up mostly confirming what ffmpeg generated (great!) or having to drag segment handles around the whole time (in which case, maybe starting from scratch would have been easier?).
  3. Format of the output: is a simple JSON output with segment timings like this good enough: {"0":10.89365,"1":77.498744,"2":99.56801}? I don't think there's any other data to collect. Maybe we can check with other websites that use annotations like this (quran.com and its mobile apps, Green Tech's Quran, quranwbw.com, etc.) about what data and format they'd like. That would help make this annotator a universal place they can all point to whenever users request audio for qari X: "if you want qari X, please annotate it on this website."
  4. Receiving the user's output: big question here. Without involving a whole backend, what's the easiest way for us to receive the annotations users produce? Maybe hacking some API like Slack's into a receiving endpoint 😂. A lazy way would be to have users download a file and send it to us on Telegram or by email, but making users do this manually is terrible UX, especially on mobile.
  5. Interpreting the output: thinking about the bigger picture, when a user annotates Surah al-Baqarah recited by a qari, do we wait for X more users to annotate the same audio, so that the data is mutawatir (independently corroborated by many), before we act on it? Could we average the data from several users, excluding outliers of course?
  6. Updating the site over time: once users have finished annotating certain surahs, we would simply remove those recordings from the site, right? That way no one duplicates effort and people focus on what still needs to be done. A bit of manual maintenance here, but nothing too involved.
  7. Naming: to close on a lighter note, I'm a big fan of the Nuqaya naming scheme of clear, one-word Arabic names. I chose تراصف (tarāṣuf) on a whim (and after consulting the dictionary 😂). What do you think?
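
Points 3 and 5 can be sketched together: reading several users' submissions in the proposed {"0": t0, "1": t1, ...} format and combining them per boundary. This is one possible approach, not something settled above; the helper names and the `min_users` and `tolerance` defaults are hypothetical. For each boundary it takes the median, drops values more than `tolerance` seconds away as outliers, and averages the rest:

```python
import json
from collections import Counter
from statistics import median_low
from typing import Dict, List, Optional


def parse_submission(text: str) -> List[float]:
    """Parse {"0": t0, "1": t1, ...} into boundary times ordered by numeric index."""
    data: Dict[str, float] = json.loads(text)
    # Sort keys numerically: as strings, "10" would sort before "2".
    return [float(data[k]) for k in sorted(data, key=int)]


def aggregate(submissions: List[str], min_users: int = 3,
              tolerance: float = 0.25) -> Optional[List[float]]:
    """Combine several users' boundaries for the same audio.

    Returns None until at least `min_users` submissions share the most common
    boundary count. Per boundary, values more than `tolerance` seconds from the
    median are treated as outliers; the remaining values are averaged.
    """
    parsed = [parse_submission(s) for s in submissions]
    if not parsed:
        return None
    # Only compare submissions that marked the same number of boundaries.
    n, _ = Counter(len(p) for p in parsed).most_common(1)[0]
    usable = [p for p in parsed if len(p) == n]
    if len(usable) < min_users:
        return None
    result: List[float] = []
    for i in range(n):
        values = [p[i] for p in usable]
        # median_low always returns an actual submitted value, so `kept` is never empty.
        mid = median_low(values)
        kept = [v for v in values if abs(v - mid) <= tolerance]
        result.append(round(sum(kept) / len(kept), 3))
    return result
```

Keyed by index, the object format carries the same information as a plain JSON array of times; either works as long as consumers order the entries numerically.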

جزاكم الله خيرا (may Allah reward you with good), and إن شاء الله (God willing) the effort you put into clarifying these points will be worth it for whatever small thing I can build.