lingjzhu / charsiu

Charsiu: A neural phonetic aligner.
MIT License

TextGrid file isn't according to spec #32

Closed (skol101 closed this issue 1 year ago)

skol101 commented 1 year ago

Could you check whether the .textgrid file that is produced conforms to the spec?

I'm using this https://github.com/nltk/nltk_contrib/blob/95d1806e2f4e89e960b76a685b1fba2eaa7d5142/nltk_contrib/textgrid.py to test the generated textgrid files.

lingjzhu commented 1 year ago

Hi, we use praatio to save the textgrid. In most cases it should not cause any problems. I have tested the textgrid files in Praat: they open and display normally. What error messages are you getting?

skol101 commented 1 year ago

Yes, you're correct. I initially called Textgrid(file), which results in an error. After switching to the .load(file) method, everything is fine. Though I wonder why the lab files (created using Kaldi) show timestamps in ms, while Praat shows them in seconds?


         0    1500000 pau
   1500000    4500000 pau
   4500000    6050000 pau
   6050000   13450000 s
  13450000   13600000 ih
  13600000   14400001 k
  14400001   15150000 s
  15150000   17100000 s
  17100000   17250000 p
  17250000   32550001 uw
  32550001   32700000 n
  32700000   32850001 z
  32850001   33000000 ah
  33000000   33150001 v
  33150001   33850000 f
  33850000   34050000 r
  34050000   34349999 eh
  34349999   34500000 sh
  34500000   35300000 s
  35300000   35450001 n
  35450001   35599999 ow
  35599999   37049999 p
  37049999   37200000 iy
  37200000   37449999 z
  37449999   37950001 pau
  37950001   48649998 f
  48649998   48800001 ay
  48800001   48950000 v
  48950000   49099998 th
  49099998   49250002 ih
  49250002   49400001 k
  49400001   49699998 s
  49699998   49850001 l
  49850001   50050001 ae
  50050001   50200000 b
  50200000   51350002 z
  51350002   51500001 ah
  51500001   51650000 v
  51650000   51799998 b
  51799998   51950002 l
  51950002   57900000 uw
  57900000   59200001 ch
  59200001   59349999 iy
  59349999   59499998 z
  59499998   59850001 pau
  59850001   62249999 ae
  62249999   62399998 n
  62399998   62550001 d
  62550001   62700000 m
  62700000   62849998 ey
  62849998   66500001 b
  66500001   66650000 iy
  66650000   66799998 ax
  66799998   67900000 s
  67900000   68049998 n
  68049998   68200002 ae
  68200002   68400002 k
  68400002   73099999 f
  73099999   78550000 ao
  78550000   78699999 r
  78699999   78899999 hh
  78899999   79050002 er
  79050002   79200001 b
  79200001   79349999 r
  79349999   79499998 ah
  79499998   79650002 dh
  79650002   79800000 er
  79800000   79949999 b
  79949999   80100002 aa
  80100002   80249996 b
  80249996   81750002 pau
  81750002   83549995 pau 
lingjzhu commented 1 year ago

I think the timestamps from Kaldi might not be in ms. They look like sample counts to me. You can verify this by dividing them by the sampling rate and checking whether the result matches the duration of the audio. If these were ms, the sounds would be far too long.
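
A quick way to run that check is to convert the final end time from the listing above under a few candidate units and compare each result with the known length of the recording. This is only a minimal sketch: the 16 kHz sampling rate is an assumption (the thread does not state it), the helper names are made up for illustration, and the "100 ns units" candidate is included only because HTK-style label files use that convention, which may or may not apply to these files.

```python
# Sanity-check candidate time units for the last end time in a .lab file.
# Assumptions (not stated in the thread): a 16 kHz sampling rate, and the
# possibility that the file follows the HTK label convention of 100 ns units.

ASSUMED_SAMPLE_RATE_HZ = 16_000  # hypothetical; replace with the real rate


def parse_lab_line(line):
    """Split a 'start end label' line into (int, int, str)."""
    start, end, label = line.split()
    return int(start), int(end), label


def candidate_durations(end_time, sample_rate_hz=ASSUMED_SAMPLE_RATE_HZ):
    """Return the duration in seconds implied by each candidate unit."""
    return {
        "milliseconds": end_time / 1_000,
        "samples": end_time / sample_rate_hz,
        "100 ns units": end_time / 10_000_000,
    }


# Last line of the listing posted above.
last_line = "  81750002   83549995 pau"
_, end_time, _ = parse_lab_line(last_line)

for unit, seconds in candidate_durations(end_time).items():
    print(f"{unit:>12}: {seconds:12.2f} s")
```

Whichever candidate matches the actual duration of the audio file is the likely unit; dividing every start and end value by the same factor then gives times in seconds that can be compared with the Praat TextGrid.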

skol101 commented 1 year ago

Cheers, I'll check your suggestion.