prosodylab / Prosodylab-Aligner

Python interface for forced audio alignment using HTK and SoX
http://prosodylab.org/tools/aligner/
MIT License

ValueError Interval #66

Closed benjisympa closed 4 years ago

benjisympa commented 6 years ago

Good Morning, thank you for your work.

I ran the software but got an error. Have you seen this before?

(py3) maurice@client-172-18-65-151 ~/Prosodylab-Aligner master $ python -m aligner -a data
Traceback (most recent call last):
  File "/Users/maurice/anaconda3/envs/py3/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/Users/maurice/anaconda3/envs/py3/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/Users/maurice/Prosodylab-Aligner/aligner/__main__.py", line 134, in <module>
    size = MLF(aligned).write(args.align)
  File "/Users/maurice/anaconda3/envs/py3/lib/python3.6/site-packages/textgrid/textgrid.py", line 788, in __init__
    self.read(f, samplerate)
  File "/Users/maurice/anaconda3/envs/py3/lib/python3.6/site-packages/textgrid/textgrid.py", line 830, in read
    phon.add(pmin, pmax, line[2])
  File "/Users/maurice/anaconda3/envs/py3/lib/python3.6/site-packages/textgrid/textgrid.py", line 433, in add
    self.addInterval(interval)
  File "/Users/maurice/anaconda3/envs/py3/lib/python3.6/site-packages/textgrid/textgrid.py", line 441, in addInterval
    i = bisect_left(self.intervals, interval)
  File "/Users/maurice/anaconda3/envs/py3/lib/python3.6/site-packages/textgrid/textgrid.py", line 208, in __lt__
    raise (ValueError(self, other))
ValueError: (Interval(0.00000, 155.7200012, sil), Interval(155.72000, 155.89000, S))

Thank you very much.

kylebgorman commented 6 years ago

Hi,

I have seen this sort of issue before (it's in the textgrid library here: https://github.com/kylebgorman/textgrid) but I thought we'd dealt with it before. It happens during creation of a textgrid when two intervals appear to overlap in the textgrid. In this case they don't really overlap, it's just that the representation of floating point numbers is very approximate.

For this reason in textgrid.py (near the top) we define separate precisions for TextGrid inputs and for MLF (i.e., from Prosodylab-Aligner) inputs. I suspect if you set

DEFAULT_MLF_PRECISION = 5

to a lower value (say 3), the issue will go away? (You will have to do this wherever textgrid.py is installed for Python 3.) Please try it out and report back, and if this helps, I can commit the fix.
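For readers following along, here is a minimal sketch of why a lower precision could make the two boundaries agree. It uses only the standard `decimal` module and the two boundary values from the ValueError above; the `quantize` helper is hypothetical and is not the textgrid library's own rounding code. It also assumes both boundaries actually pass through the rounding step. In the traceback the sil end time is printed with more digits than the other values, so that may not hold.

```python
from decimal import Decimal


def quantize(seconds: str, precision: int) -> Decimal:
    """Round a time in seconds to `precision` decimal places (hypothetical helper)."""
    return Decimal(seconds).quantize(Decimal(1).scaleb(-precision))


# Boundary values copied from the ValueError in the report.
sil_end = "155.7200012"
s_start = "155.72000"

for precision in (10, 5, 3):
    a = quantize(sil_end, precision)
    b = quantize(s_start, precision)
    # The intervals only "overlap" while the rounded end exceeds the rounded start.
    print(precision, a, b, "overlap" if a > b else "ok")
```

If the overlap persists after changing the constant, that would suggest the offending value is not being rounded at all, rather than being rounded too finely.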

benjisympa commented 6 years ago

Hi, thank you for your answer. I modified the file in /Users/maurice/anaconda3/envs/py3/lib/python3.6/site-packages/textGrid, but in ValueError: (Interval(0E-10, 155.7200012000, sil), Interval(155.7200000000, 155.8900000000, S)) the second number is still bigger than the third (155.7200012000 > 155.7200000000). I tried changing the precision to 3, 5 and 10, but it had no impact; the output here is with 10. I don't know what sil or S are, but maybe I need to round the sil value? I don't think I need the extra precision of the trailing 12 in 155.7200012000.

kylebgorman commented 6 years ago

Yeah, we need some way to round down the sil's second value. It really ought to be exactly 155.72 with all trailing zeros, and I'm not sure why it isn't. (Do you know what value it corresponds to in the MLF file? Those are, IIRC, in units of 100 nanoseconds, and we attempt to convert from that to seconds.) Perhaps there's some issue there.
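As a rough illustration of the unit conversion mentioned here, the sketch below converts integer counts of 100 ns units to seconds with exact decimal arithmetic. The function name and constant are hypothetical, and the input 1557200012 is an assumption: it is simply the count that would produce the 155.7200012 s boundary seen in the error. This is not the conversion code used by the aligner or by textgrid.

```python
from decimal import Decimal

# One HTK time unit is 100 ns, so there are 10 million units per second.
HTK_UNITS_PER_SECOND = 10_000_000


def htk_to_seconds(units: int) -> Decimal:
    """Convert an integer count of 100 ns units to seconds without float error."""
    return Decimal(units) / Decimal(HTK_UNITS_PER_SECOND)


clean = htk_to_seconds(1557200000)     # a boundary on the 10 ms grid
offending = htk_to_seconds(1557200012)  # 12 units (1.2 microseconds) later

print(clean, offending, offending - clean)
```

Under that assumption the stray digits would come from the MLF itself rather than from floating-point conversion, which is why inspecting the file is the natural next step.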

benjisympa commented 6 years ago

Yes, I think that's the problem, but I don't know the MLF file; I just have the audio and the transcription of the first episode of The Big Bang Theory. Maybe we can add rounding directly in the code?

kylebgorman commented 6 years ago

Just FYI: the MLF file is the temporary file generated by the shell call to HVite during the final alignment. It's stored in a temporary directory. If you want, you can hack the aligner to log the location of the temporary file and to not delete it at the end; then you can inspect it if you wish.
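A generic sketch of that kind of hack, using only the standard library, might look like the following. The `run_in_tempdir` function and its `keep` flag are hypothetical and do not correspond to any function in Prosodylab-Aligner; the point is only the pattern of logging the directory and skipping cleanup on request.

```python
import logging
import os
import shutil
import tempfile

logging.basicConfig(level=logging.INFO)


def run_in_tempdir(work, keep=False):
    """Run work(tmpdir) in a fresh temporary directory.

    The directory path is logged, and the directory is left on disk
    when keep=True so its contents can be inspected afterwards.
    """
    tmpdir = tempfile.mkdtemp(prefix="aligner_")
    logging.info("temporary directory: %s", tmpdir)
    try:
        return work(tmpdir), tmpdir
    finally:
        if not keep:
            shutil.rmtree(tmpdir, ignore_errors=True)


def write_marker(tmpdir):
    """Stand-in for the real work: write one file into the directory."""
    path = os.path.join(tmpdir, "marker.txt")
    with open(path, "w") as handle:
        handle.write("intermediate output\n")
    return path


marker, kept_dir = run_in_tempdir(write_marker, keep=True)
print(os.path.exists(marker))  # the file survives because keep=True
shutil.rmtree(kept_dir)        # clean up once inspection is done
```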

benjisympa commented 6 years ago

Ok thanks, I've modified the script. I have 2 empty folders and one file, is it what you thought ? :

(py3) maurice@client-172-18-65-151 ~/Prosodylab-Aligner master $ more tmp/tmpkr9z6j3y/HERest.cfg
CEPLIFTER = 22
ENORMALIZE = T
NUMCEPS = 12
NUMCHANS = 20
PREEMCOEF = 0.97
TARGETKIND = MFCC_D_A_0
TARGETRATE = 100000.0
USEHAMMING = T
WINDOWSIZE = 250000.0
(py3) maurice@client-172-18-65-151 ~/Prosodylab-Aligner master $ ls tmp/tmpkr9z6j3y/
000 001 HERest.cfg

kylebgorman commented 6 years ago

The file in question is hidden; it's called .aligned.mlf (so use ls -a to find it).

Actually on looking more carefully, it looks like it'll be in your TextGrids output directory. Sorry for the mis-direction there!

benjisympa commented 6 years ago

Sorry for the delay in answering. I don't know where the TextGrids output directory is, but there is nothing in the GitHub project folder where I cloned the project and put the data.

kylebgorman commented 6 years ago

It is the same as the input directory for the .lab and .wav files.

benjisympa commented 6 years ago

Great, I found it!

more data/.aligned.mlf

#!MLF!#
"/var/folders/hg/gc5hq9ln10v4htbsj2_k15j40000gp/T/tmpxze4evhf/audio/TheBigBangTheory.Season01.Episode01.en.upper.lab"
0 1557200012 sil sil
1557200000 1558900000 S SO
1558900000 10409999854 OW1
10409999854 10410699854 sp
10410700000 13195299915 sil sil
.
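To make the mismatch in this file easier to see, here is a small sketch that parses the five label lines above and reports every place where one interval's end differs from the next interval's start. It then snaps each time to the nearest multiple of 100000 units (10 ms), which is the TARGETRATE shown in HERest.cfg earlier in the thread. The snapping step is only an assumption about one possible workaround; it is not the rounding change that was later contributed to the project, and all function names here are hypothetical.

```python
# Label lines copied from the .aligned.mlf output above (times in 100 ns units).
MLF_LINES = """\
0 1557200012 sil sil
1557200000 1558900000 S SO
1558900000 10409999854 OW1
10409999854 10410699854 sp
10410700000 13195299915 sil sil
""".splitlines()

FRAME = 100_000  # 10 ms in 100 ns units, matching TARGETRATE = 100000.0


def parse(lines):
    """Return (start, end, label) tuples from HTK-style label lines."""
    return [(int(f[0]), int(f[1]), f[2]) for f in (line.split() for line in lines)]


def boundary_mismatches(intervals):
    """List (label, next_label, end - next_start) where neighbours do not meet.

    A positive difference is an overlap; a negative one is a gap.
    """
    return [
        (label, nxt, end - start)
        for (_, end, label), (start, _, nxt) in zip(intervals, intervals[1:])
        if end != start
    ]


def snap(t):
    """Round a time to the nearest frame boundary using integer arithmetic."""
    return (t + FRAME // 2) // FRAME * FRAME


intervals = parse(MLF_LINES)
mismatches = boundary_mismatches(intervals)
snapped = [(snap(s), snap(e), label) for s, e, label in intervals]

print(mismatches)
print(boundary_mismatches(snapped))
```

On this data the first mismatch is the 12-unit overlap between sil and S that triggers the ValueError; the second is a 146-unit gap between sp and the final sil.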

kylebgorman commented 6 years ago

Sorry to let this linger, but a contributor just added a new way of rounding that I hope may help with your issue. Please try it locally when you get a chance. -K

benjisympa commented 6 years ago

No problem. I tried gentle/Kaldi and it seems to work, so that's good enough for my purposes. I can pull and try if you want. Thanks.