Is there a Documentation?

lowerquality / gentle

gentle forced aligner

https://lowerquality.com/gentle/

MIT License

Is there a Documentation? #262

Open · florianmunich opened this issue 4 years ago

florianmunich commented 4 years ago

Is there any documentation for Gentle? Anything that goes further than how to install it?

natelawrence commented 4 years ago

Hello, Florian,

I'm only a fellow user, but I can tell you this much:

You can refer to the Gentle ReadMe here: https://github.com/lowerquality/gentle/blob/master/README.md

Personally, I use Docker Desktop on Windows together with Kitematic, which gives Docker a GUI that I use to install versions of Gentle, start and stop them, uninstall them, etc.

Please write back if you decide to try Docker and run into difficulties getting started.

Let's get you up and running.

natelawrence commented 4 years ago

I apologize for misunderstanding your request. Let me look around to see if any list of options exists.

EDIT: This is the closest thing I've found to a list of command-line options: https://github.com/lowerquality/gentle/blob/master/align.py
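For anyone who prefers the command line, here is a hedged Python sketch that assembles an `align.py` invocation. It assumes the script takes a positional audio file and transcript file plus `-o`, `--nthreads`, `--conservative` and `--disfluency` flags, as its argparse setup appears to define; check `python3 align.py --help` in your own checkout before relying on those names. `build_align_command` is a hypothetical helper, not part of Gentle.

```python
import shlex
from typing import List, Optional


def build_align_command(
    audio: str,
    transcript: str,
    output: Optional[str] = None,
    nthreads: Optional[int] = None,
    conservative: bool = False,
    disfluency: bool = False,
) -> List[str]:
    """Return an argv list for Gentle's align.py (flag names are assumptions)."""
    cmd = ["python3", "align.py"]
    if nthreads is not None:
        cmd += ["--nthreads", str(nthreads)]
    if conservative:
        cmd.append("--conservative")
    if disfluency:
        cmd.append("--disfluency")
    if output is not None:
        # Without -o, the aligned JSON is assumed to be printed to stdout.
        cmd += ["-o", output]
    # Positional arguments go last: the media file, then the plain-text transcript.
    cmd += [audio, transcript]
    return cmd


if __name__ == "__main__":
    cmd = build_align_command("audio.mp3", "words.txt", output="out.json", conservative=True)
    print(shlex.join(cmd))
    # From inside a Gentle checkout you could then run: subprocess.run(cmd, check=True)
```

Returning a list rather than a single string lets you pass it straight to `subprocess.run` without worrying about shell quoting of file names with spaces.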

Other than that, the concept is simple.

Supply a piece of media with audibly spoken English.
Supply a transcript of the words spoken.
Allow Gentle to attempt to align the text of the words to the audio of the words.
Enjoy the successful matches and grit your teeth in pain at the failures. ;-)
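The last step, sorting the successes from the failures, is easy to script against Gentle's JSON output. This sketch assumes the output has a top-level `words` list whose entries carry a `case` field (such as `"success"` or `"not-found-in-audio"`) and, for aligned words, `start`/`end` times in seconds. The sample data is made up for illustration, and `summarize` is a hypothetical helper.

```python
import json
from collections import Counter
from typing import Dict, List, Tuple

# A tiny, made-up example shaped like (what is assumed to be) Gentle's JSON output.
SAMPLE_OUTPUT = """
{
  "transcript": "hello brave new world",
  "words": [
    {"word": "hello", "case": "success", "start": 0.10, "end": 0.42},
    {"word": "brave", "case": "not-found-in-audio"},
    {"word": "new", "case": "success", "start": 0.80, "end": 0.95},
    {"word": "world", "case": "success", "start": 0.95, "end": 1.40}
  ]
}
"""


def summarize(alignment: Dict) -> Tuple[Counter, List[str]]:
    """Count words per alignment case and list the words that were not aligned."""
    words = alignment.get("words", [])
    counts = Counter(w.get("case", "unknown") for w in words)
    failures = [w["word"] for w in words if w.get("case") != "success"]
    return counts, failures


if __name__ == "__main__":
    counts, failures = summarize(json.loads(SAMPLE_OUTPUT))
    print(dict(counts))  # {'success': 3, 'not-found-in-audio': 1}
    print(failures)      # ['brave']
```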

Gentle's input page presents two options as checkboxes:

Include disfluencies: (Gentle will inject "um", "uh", "eh", "ah", etc. into your transcript if they are detected in the audio.)
Conservative: (Gentle will only return timing data for words that it is 'confident' about.)
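The same two options can also be requested from a running Gentle server (for example, the Docker container) over HTTP. The Gentle README's example POSTs the `audio` and `transcript` files as multipart form data to `/transcriptions?async=false` on port 8765; passing `conservative` and `disfluency` as extra query parameters is an assumption based on how the server appears to read its request arguments, so confirm it against `serve.py` in your version. `transcription_url` is a hypothetical helper that only builds the URL and sends nothing.

```python
from urllib.parse import urlencode


def transcription_url(
    host: str = "localhost",
    port: int = 8765,
    conservative: bool = False,
    disfluency: bool = False,
) -> str:
    """Build the URL for a synchronous alignment request (option names assumed)."""
    params = {"async": "false"}  # wait for the finished alignment instead of polling
    # Assumption: the server only checks whether these keys are present,
    # so the value itself should not matter; "true" is used for readability.
    if conservative:
        params["conservative"] = "true"
    if disfluency:
        params["disfluency"] = "true"
    return f"http://{host}:{port}/transcriptions?{urlencode(params)}"


if __name__ == "__main__":
    print(transcription_url(conservative=True))
    # POST to this URL as multipart/form-data with two file fields:
    # "audio" (the media file) and "transcript" (the plain-text transcript).
```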

jkurlandski01 commented 3 years ago

With apologies, I want to point out that the summary of the 'conservative' option which natelawrence provides above does not describe the behavior I'm seeing. That's why I think more complete documentation would be useful.

In fact, in my experience the 'conservative' option tends to be more robust than leaving it off. That is, a word is sometimes marked as 'not-found-in-audio' with 'conservative' turned off, but as 'success' with 'conservative' turned on. I did a little debugging, and it seemed to me that the difference arises because an OOV (out-of-vocabulary) token is accepted in 'conservative' mode. But I have to admit that I haven't explored the difference in depth.
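One way to explore this further is to align the same media and transcript twice, once with 'conservative' and once without, and list the words whose `case` changed. This sketch assumes each transcript word in the output carries a `startOffset` (its character position in the transcript), which is used to match words across the two runs; `compare_runs` and the sample data are hypothetical.

```python
from typing import Dict, List, Tuple


def compare_runs(default_run: Dict, conservative_run: Dict) -> List[Tuple[str, str, str]]:
    """List (word, default case, conservative case) for words whose case differs.

    Words are matched by "startOffset", assumed present for every transcript word;
    entries without it (e.g. words not in the transcript) are ignored.
    """
    def by_offset(run: Dict) -> Dict[int, Dict]:
        return {w["startOffset"]: w for w in run.get("words", []) if "startOffset" in w}

    default_words = by_offset(default_run)
    conservative_words = by_offset(conservative_run)
    changed = []
    # Only compare words that appear in both runs, in transcript order.
    for offset in sorted(default_words.keys() & conservative_words.keys()):
        a, b = default_words[offset], conservative_words[offset]
        case_a, case_b = a.get("case", "unknown"), b.get("case", "unknown")
        if case_a != case_b:
            changed.append((a["word"], case_a, case_b))
    return changed


if __name__ == "__main__":
    # Made-up outputs mimicking the behavior described above.
    default_run = {"words": [
        {"word": "Gentle", "startOffset": 0, "case": "success"},
        {"word": "Kitematic", "startOffset": 7, "case": "not-found-in-audio"},
    ]}
    conservative_run = {"words": [
        {"word": "Gentle", "startOffset": 0, "case": "success"},
        {"word": "Kitematic", "startOffset": 7, "case": "success"},
    ]}
    print(compare_runs(default_run, conservative_run))
    # [('Kitematic', 'not-found-in-audio', 'success')]
```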

Also, turning on 'disfluency' injects disfluencies into the aligned output, not into the transcript used as input.
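To check this on your own files, you can compare the tokens in the aligned output against the transcript you supplied. This sketch assumes injected disfluencies show up as extra entries in the output's `words` list; the filler set is taken from the examples earlier in this thread and is not Gentle's official list, and `find_disfluencies` is a hypothetical helper.

```python
import re
from typing import Dict, List

# Filler words mentioned earlier in this thread; Gentle's own list may differ.
FILLERS = {"um", "uh", "eh", "ah"}


def find_disfluencies(alignment: Dict, transcript: str) -> List[Dict]:
    """Return output entries that are fillers but do not occur in the input transcript."""
    transcript_tokens = set(re.findall(r"[a-z']+", transcript.lower()))
    found = []
    for w in alignment.get("words", []):
        token = w.get("word", "").lower()
        # A filler that was never in the transcript must have been added by Gentle.
        if token in FILLERS and token not in transcript_tokens:
            found.append(w)
    return found


if __name__ == "__main__":
    # Made-up output for the transcript "So yes."
    output = {"words": [
        {"word": "So", "case": "success"},
        {"word": "um", "start": 0.5, "end": 0.7},
        {"word": "yes", "case": "success"},
    ]}
    print(find_disfluencies(output, "So yes."))
```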