prosodylab / Prosodylab-Aligner

Python interface for forced audio alignment using HTK and SoX
http://prosodylab.org/tools/aligner/
MIT License

Word and Silence combined #41

Closed aubert-creation closed 9 years ago

aubert-creation commented 9 years ago

With the French model: error

kylebgorman commented 9 years ago

This is almost surely not a bug, simply an indication that the alignment is difficult here. This is probably because the file is over 11 minutes long! Try chopping it up into smaller pieces first: I usually have one input file per breath unit.

aubert-creation commented 9 years ago

Thanks for your reply.

But is it possible to split my WAV file automatically? I want to run forced alignment on many speeches by François Hollande or Barack Obama and, using the word timing information, split the speech to extract single words and build a dictionary. If I have to split the audio manually before forced alignment, it is useless. Currently I use EasyAlign, a Praat plugin, and it works with an 11-minute WAV file. But I want a more powerful forced alignment tool; I want perfectly precise timing.

Any ideas?

Sorry for my very bad English.

aubert-creation commented 9 years ago

After forced alignment, I split the speech into words with ffmpeg, but my forced alignment is not accurate.

kylebgorman commented 9 years ago

I am simply suggesting that you provide the aligner the same information you provide EasyAlign, namely, approximate utterance boundaries. The way you communicate this information to this tool is to segment long sound files into shorter ones: 11-minute chunks are far too long, one minute might work, and utterance-sized units will definitely work. If you like the EasyAlign workflow, then do this first step in Praat, and then use a Praat script to split the files into shorter audio files, then run the aligner.

Perfect precision is simply not possible by automated means. If you need very precise timing information, then you should manually automated alignments. Most people find this manual correcting much, much faster than aligning everything by hand.
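
For readers who would rather script the splitting step than do it in Praat, here is a minimal sketch in Python using only the standard-library `wave` module. It assumes you already have approximate utterance boundaries in seconds (for example, exported from Praat); the function name `split_wav`, the boundary list, and the output naming scheme are hypothetical and are not part of Prosodylab-Aligner.

```python
import wave


def split_wav(src_path, boundaries, out_prefix):
    """Write one WAV file per (start, end) pair; times are in seconds.

    `boundaries` is assumed to come from elsewhere (e.g. utterance
    boundaries marked by hand); this function does not detect them.
    """
    out_paths = []
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        for i, (start, end) in enumerate(boundaries):
            first = round(start * rate)
            count = round(end * rate) - first
            # Jump to the first frame of the chunk and read it.
            src.setpos(first)
            frames = src.readframes(count)
            out_path = f"{out_prefix}_{i:03d}.wav"
            with wave.open(out_path, "wb") as dst:
                # Same channels, sample width and rate as the source;
                # the frame count in the header is fixed up on close.
                dst.setparams(params)
                dst.writeframes(frames)
            out_paths.append(out_path)
    return out_paths
```

A call such as `split_wav("speech.wav", [(0.0, 4.2), (4.2, 9.8)], "speech")` would write `speech_000.wav` and `speech_001.wav`; the file name and times here are placeholders.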

kylebgorman commented 9 years ago

edit: “you should manually automated alignments” -> “you should manually correct automated alignments”

K

aubert-creation commented 9 years ago

ok, thanks.

PS: I split my WAV file into single sentences, but the results are poor. I am going to try training a new French model on a large amount of François Hollande speech data (> 10 hours), which may give more accurate results.

aubert-creation commented 9 years ago

OK. All my videos are taken from YouTube or Dailymotion, with subtitles. The subtitle files are in SRT format, so I have timing information for every sentence spoken. So I'm going to use ffmpeg to split my videos, and then use Prosodylab-Aligner to train a model or use the French model.
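
The SRT-driven splitting described above could be sketched roughly as follows. This is an illustration under stated assumptions, not a tested pipeline: the helper names `parse_srt` and `ffmpeg_commands` are hypothetical, the 16 kHz mono output is an assumed target format, and subtitle timings from video sites are often only approximate, so the resulting chunks may still need checking. The code only builds the ffmpeg argument lists; running them (for example with `subprocess.run`) and writing each cue's text to the transcript file the aligner expects are left to the reader.

```python
import re

# Matches an SRT timing line such as "00:01:05,250 --> 00:01:07,000".
CUE_TIME = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*"
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


def parse_srt(text):
    """Return a list of (start_seconds, end_seconds, text) cues."""
    cues = []
    text = text.replace("\r\n", "\n").strip()
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", text):
        lines = block.splitlines()
        for n, line in enumerate(lines):
            match = CUE_TIME.search(line)
            if match:
                h1, m1, s1, ms1, h2, m2, s2, ms2 = map(int, match.groups())
                start = h1 * 3600 + m1 * 60 + s1 + ms1 / 1000
                end = h2 * 3600 + m2 * 60 + s2 + ms2 / 1000
                # Everything after the timing line is the subtitle text.
                words = " ".join(l.strip() for l in lines[n + 1:])
                cues.append((start, end, words))
                break
    return cues


def ffmpeg_commands(video_path, cues, out_prefix):
    """Build one ffmpeg argument list per cue (audio only, mono)."""
    commands = []
    for i, (start, end, _words) in enumerate(cues):
        commands.append([
            "ffmpeg", "-i", video_path,
            "-ss", f"{start:.3f}", "-to", f"{end:.3f}",
            "-vn",            # drop the video stream
            "-ac", "1",       # mono
            "-ar", "16000",   # assumed sample rate; adjust to your model
            f"{out_prefix}_{i:04d}.wav",
        ])
    return commands
```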

nbara commented 8 years ago

@aubert-creation Hi there, have you managed to train a good French model? Would you be open to sharing it? I'm in the same situation as you, and it would save me A LOT of time if I did not have to train the model myself...