shiyuzh2007 / ASR

Apache License 2.0

What is the format of src_path and dst_path? #4

Closed sigpro closed 5 years ago

sigpro commented 5 years ago

Thanks for your wonderful work. I would like to know the format of the two files: are they the same as Kaldi's? I used Kaldi's feat.scp, but it breaks when processing the ark file. Thanks.

shiyuzh2007 commented 5 years ago

1) The src_path file is in the format of Kaldi's scp, e.g.

20040527_210939_A901153_B901154-A-000281-000641 /mnt/lustre/xushuang2/syzhou/tensor2tensor/hkust_ci_phone/src_data/train_dim80/feats.ark.left3_sub3:48
20040527_210939_A901153_B901154-A-000974-001262 /mnt/lustre/xushuang2/syzhou/tensor2tensor/hkust_ci_phone/src_data/train_dim80/feats.ark.left3_sub3:153711

2) The dst_path file is the transcript, one utterance per line, with the same utterance IDs, e.g.

20040527_210939_A901153_B901154-A-000281-000641 喂 你 好 能 听 到
20040527_210939_A901153_B901154-A-000974-001262 呃 可 以 呀

3) Please make sure the features are stacked with 3 frames to the left and downsampled to a 30 ms frame rate, as described in the paper.
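For readers checking their own files against the three points above, here is a minimal Python sketch. It is not code from this repository: the function names are hypothetical, the features are plain lists of floats rather than Kaldi matrices, and several details are assumptions, namely a 10 ms input frame shift (so keeping every 3rd frame gives 30 ms), padding the start by repeating the first frame, and ordering each stacked vector from oldest to newest frame.

```python
def parse_scp_line(line):
    """Split a Kaldi-style scp line into (utt_id, ark_path, byte_offset)."""
    utt_id, location = line.strip().split(maxsplit=1)
    # The byte offset follows the last colon of the ark path.
    ark_path, offset = location.rsplit(":", 1)
    return utt_id, ark_path, int(offset)


def parse_transcript_line(line):
    """Split a transcript line into (utt_id, list_of_tokens)."""
    utt_id, *tokens = line.strip().split()
    return utt_id, tokens


def stack_left_and_subsample(frames, left=3, step=3):
    """Stack each frame with its `left` predecessors, then keep every `step`-th.

    `frames` is a list of per-frame feature lists. With a 10 ms input frame
    shift (an assumption), step=3 yields a 30 ms output frame rate.
    """
    if not frames:
        return []
    # Assumption: pad the beginning by repeating the first frame.
    padded = [frames[0]] * left + list(frames)
    stacked = []
    for t in range(0, len(frames), step):
        window = padded[t:t + left + 1]  # original frames t-left .. t
        stacked.append([value for frame in window for value in frame])
    return stacked


if __name__ == "__main__":
    scp = ("20040527_210939_A901153_B901154-A-000281-000641 "
           "/mnt/lustre/xushuang2/syzhou/tensor2tensor/hkust_ci_phone/"
           "src_data/train_dim80/feats.ark.left3_sub3:48")
    print(parse_scp_line(scp))

    text = "20040527_210939_A901153_B901154-A-000281-000641 喂 你 好 能 听 到"
    print(parse_transcript_line(text))

    # Seven one-dimensional frames, so the stacking is easy to read.
    toy = [[float(i)] for i in range(7)]
    print(stack_left_and_subsample(toy))
```

Each output frame has (left + 1) times the input dimension, so a quick sanity check on real data is that the stacked dimension is four times the original and the frame count drops to about a third.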

sigpro commented 5 years ago

Thanks.