lowerquality / gentle

gentle forced aligner
https://lowerquality.com/gentle/
MIT License

Using Gentle for transcription only not working #125

Closed pietrop closed 7 years ago

pietrop commented 7 years ago

Hello. First things first: this is a great project! I am super excited to see a decent open source STT that actually produces good results. I'd love to get more involved in adding documentation, for example on how to train a new model for a different language, and to get a better sense of how the code base is structured.

TL;DR

I would like to be able to use Gentle to transcribe audio from the terminal and what I've tried so far does not seem to work.

For example, I would like to run

python align.py examples/data/lucier.mp3

without providing text for alignment, and get a transcription back. This currently does not work on the master branch because txtfile is a required argument. Passing an empty text file returns {}.

Background

I managed to integrate Gentle with the app I am working on by following @strob's instructions and using the local server that is started when launching version 0.9.1.

What I tried

These are some of the issues I came across when trying to figure this out.

On branch master

curl  -F 'audio=@examples/data/lucier.mp3'  'http://localhost:8765/transcriptions?async=false'

This doesn't work and neither does

python align.py examples/data/lucier.mp3 

On 0.9.1

Using the code from the 0.9.1 tag in the GitHub repo, I get the same results as in the examples above with the master branch code.

On the OS X 0.9.1 app's local server

However, when I launch the local server of the 0.9.1 desktop app, the same request works and returns the transcription as JSON.

 curl  -F 'audio=@examples/data/lucier.mp3'  'http://localhost:8765/transcriptions?async=false'

Conclusion

I was looking through the code base to see how to make python align.py examples/data/lucier.mp3 recognise that the txtfile param is not present and return a transcription, instead of raising an error that the parameter is missing.

What I described above makes me think that the code in the repo tagged 0.9.1 is not consistent with what ships in the packaged 0.9.1 OS X app. Or am I getting something wrong?

I do understand that Gentle is primarily an aligner and that transcription is a bonus, but since it is capable of producing a transcription, I think it would be far more flexible if it simply transcribed when the txtfile param is absent. This would open up many possibilities for using it as an open source STT component that works offline.
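The fallback proposed here can be sketched with argparse. This is only an illustration of the idea, not Gentle's actual align.py: the names build_parser and choose_mode are hypothetical, and the real script's argument handling may differ.

```python
import argparse


def build_parser():
    """Hypothetical parser: txtfile becomes an optional positional argument."""
    parser = argparse.ArgumentParser(
        description="Align a transcript to audio, or transcribe if none is given."
    )
    parser.add_argument("audiofile", help="path to the audio file")
    # nargs="?" makes the positional optional; it is None when omitted.
    parser.add_argument(
        "txtfile",
        nargs="?",
        default=None,
        help="transcript to align (omit to transcribe instead)",
    )
    return parser


def choose_mode(args):
    """Return 'align' when a transcript was supplied, otherwise 'transcribe'."""
    return "align" if args.txtfile is not None else "transcribe"


# Parsing an explicit argument list, as if it came from the command line.
audio_only = build_parser().parse_args(["examples/data/lucier.mp3"])
print(choose_mode(audio_only))
```

With this shape, the caller would dispatch on the returned mode rather than failing when the second argument is missing.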

Let me know if I was not clear on anything, and what you suggest would be the best way to figure this out?

strob commented 7 years ago

align.py aligns. For transcription, you'll want to use transcriber.py:

python gentle/transcriber.py examples/data/lucier.mp3  lucier.json

However, using Python directly is subject to change, and I don't want to support a Python API until there's a real strategy for python setup.py install, language model paths, etc.
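Given that caveat, one way to get a transcription from a Python program without importing Gentle's modules is to shell out to the command above. This is a minimal sketch: the helper names are hypothetical, it assumes it is run from the root of a Gentle checkout, and it assumes the current interpreter is the one Gentle is set up for. It makes no assumption about the structure of the JSON that is written.

```python
import json
import subprocess
import sys


def transcriber_command(audio_path, output_path, script="gentle/transcriber.py"):
    """Build the command line shown above, using the current interpreter."""
    return [sys.executable, script, audio_path, output_path]


def transcribe_to_json(audio_path, output_path):
    """Run the transcriber script and return whatever JSON it wrote."""
    # check=True raises CalledProcessError if the script exits non-zero.
    subprocess.run(transcriber_command(audio_path, output_path), check=True)
    with open(output_path, "r", encoding="utf-8") as handle:
        return json.load(handle)
```

A call such as transcribe_to_json("examples/data/lucier.mp3", "lucier.json") mirrors the command line above. Because the script's arguments may change, keeping the command in one small function limits what needs updating.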

The HTTP API will be more stable. As of f3f351fed310e56592f5621eb3a716d93575b89e, the curl command you've given works:

curl  -F 'audio=@examples/data/lucier.mp3'  'http://localhost:8765/transcriptions?async=false'

It also works on the demo server:

curl  -F 'audio=@examples/data/lucier.mp3'  'http://gentle-demo.lowerquality.com/transcriptions?async=false'
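The same request can be made from Python using only the standard library. The sketch below mirrors the curl call (a multipart form upload with a field named audio, posted to /transcriptions?async=false). The function names are hypothetical, the server is assumed to be running at the given base URL, and the response is returned as parsed JSON without assuming anything about its fields.

```python
import json
import os
import urllib.request
import uuid


def build_multipart(field_name, filename, payload):
    """Encode one file as a multipart/form-data body.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; '
        f'filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n"
        "\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def transcribe(audio_path, base_url="http://localhost:8765"):
    """POST the audio file to the transcription endpoint and parse the reply."""
    with open(audio_path, "rb") as handle:
        payload = handle.read()
    body, content_type = build_multipart(
        "audio", os.path.basename(audio_path), payload
    )
    request = urllib.request.Request(
        f"{base_url}/transcriptions?async=false",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    # async=false means the server responds only once it has finished.
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling transcribe("examples/data/lucier.mp3") corresponds to the local curl command. Passing a different base_url points the same code at another server.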

Please make sure you're on the latest git version of Gentle when filing issues.

pietrop commented 7 years ago

Ok, that's great! I'll try it out thanks!

I am very interested in being able to use it through Python, rather than starting a local server. Could you elaborate on what you mean by this?

using python directly is subject to change, and I don't want to support a Python API until there's a real strategy for python setup.py install, language model paths, etc.

@strob what would need to be in place to support a good strategy to keep this going forward?