I'm using gentle to align Chinese audio with its corresponding text. However, the start and end timestamps returned by gentle don't seem to be valid. For example, in the image below, the text means "relieve students' burdens of homework and after-class remedial teaching during the period of compulsory education". Speaking it should take about 3 to 5 seconds, but gentle reports a span of only 0.53 seconds, which is definitely wrong.
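To make the mismatch concrete, here is a minimal sketch of the kind of sanity check that exposes it. It assumes the alignment result is a list of entries with "word", "case", "start", and "end" fields; the function name, the sample data, and the 0.1 seconds-per-character threshold are hypothetical choices for illustration, not part of gentle.

```python
def flag_short_spans(words, min_sec_per_char=0.1):
    """Return (text, duration) pairs whose aligned span is implausibly short.

    `words` is assumed to be a list of dicts with "word", "case",
    "start", and "end" keys. The threshold is an arbitrary guess.
    """
    flagged = []
    for entry in words:
        # Skip entries that were not aligned (they carry no timestamps).
        if entry.get("case") != "success":
            continue
        duration = entry["end"] - entry["start"]
        n_chars = max(len(entry["word"]), 1)
        if duration / n_chars < min_sec_per_char:
            flagged.append((entry["word"], round(duration, 2)))
    return flagged


# Made-up sample data, not real gentle output.
sample = [
    {"word": "hypothetical long phrase", "case": "success", "start": 1.00, "end": 1.53},
    {"word": "ok", "case": "success", "start": 2.0, "end": 2.4},
    {"word": "missing", "case": "not-found-in-audio"},
]

print(flag_short_spans(sample))
```

With this sample, only the first entry is flagged: 0.53 seconds spread over 24 characters is far below the threshold, which is the same pattern as in the result above.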
Could anybody tell me what the problem is? Are there any tricks I'm missing for processing the results returned by gentle? Any advice would be appreciated. Thanks!