mravanelli / SincNet

SincNet is a neural architecture for efficiently processing raw audio samples.
MIT License

TIMIT (.wrd, .txt, .phn) file interpretations (numbers in front of the line) #42

Closed andriuslauraitis closed 5 years ago

andriuslauraitis commented 5 years ago

Hello,

I want to find out more details about the TIMIT database, in particular the .TXT, .PHN and .WRD files. For example, in folder train/dr1/FCJF0:

In file SI1657.TXT I have the following: 0 45466 Or borrow some money from someone and go home by bus?

Question: What do the numbers '0' and '45466' refer to? Perhaps a time duration in milliseconds?

File SI1657.WRD:

    2120 3533 or
    3533 8200 borrow
    8200 12291 some
    12291 15325 money
    15325 18435 from
    18435 25984 someone
    25984 28960 and
    28960 31000 go
    31000 34599 home
    34599 36200 by
    36200 43480 bus

Question: What do the numbers in the first two columns refer to?

File SI1657.PHN (a fragment):

    0 2120 h#
    2120 2725 q

Question: What do the numbers (0, 2120 and 2120, 2725) refer to?

Another question: Would SincNet work if no .phn (phonetic) files are provided with the dataset?

Best regards, Andrius L.

mravanelli commented 5 years ago

Those files contain labels at the sentence (.txt), word (.wrd), and phoneme (.phn) level. The first number on each line is the starting sample and the second is the ending sample. In this repository, SincNet is used for speaker identification, and the only label we use is the speaker identity (e.g., "FCJF0").
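For readers who want to inspect these files themselves, here is a minimal sketch of how such label lines could be parsed. It assumes each line has the form "start end label", as in the excerpts quoted above. The names `Segment` and `parse_timit_labels` are hypothetical and do not come from this repository.

```python
from typing import List, NamedTuple


class Segment(NamedTuple):
    start: int   # starting sample index of the segment
    end: int     # ending sample index of the segment
    label: str   # sentence text, word, or phoneme symbol


def parse_timit_labels(text: str) -> List[Segment]:
    """Parse the contents of a .txt, .wrd or .phn file into segments."""
    segments = []
    for line in text.splitlines():
        # Split only twice so that a full sentence (.txt) stays in one piece.
        parts = line.split(maxsplit=2)
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        start, end, label = parts
        segments.append(Segment(int(start), int(end), label.strip()))
    return segments


# Example input copied from the SI1657.WRD excerpt in this thread.
wrd_text = """2120 3533 or
3533 8200 borrow
8200 12291 some"""
segments = parse_timit_labels(wrd_text)
for seg in segments:
    print(seg.label, seg.start, seg.end)
```

The same function should handle the single line of a .txt file, since everything after the second number is kept as the label.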

Best,

Mirco

andriuslauraitis commented 5 years ago

By "starting sample", do you mean the time (in seconds or milliseconds) at which the word starts being spoken, measured from the start of the recording? So, taking file SI1657.WRD again, can we say that the word 'borrow' was spoken from time 3533 to time 8200 (milliseconds)?

Perhaps when I prepare my own data for SincNet I don't need to specify these two numbers at all? But I noticed that 'TIMIT_preparation.py' reads from the .wrd file: wrd_file=wav_file.replace(".wav",".wrd")

Best, Andrius L.

mravanelli commented 5 years ago

They are not milliseconds; they are sample indices (e.g., from sample number 3533 to sample number 8200). In our case, the .wrd file is only used to remove the silences at the beginning and at the end of each sentence.
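As an illustration of both points, the sketch below converts sample indices to seconds and trims the leading and trailing silence using the first and last word boundaries. It relies on TIMIT audio being sampled at 16 kHz, and it treats the end index as an exclusive slice bound, which is an assumption. The function names are hypothetical, and this is not the code in TIMIT_preparation.py.

```python
from typing import List, Sequence, Tuple

TIMIT_SAMPLE_RATE = 16000  # Hz; TIMIT recordings are sampled at 16 kHz


def samples_to_seconds(sample_index: int,
                       sample_rate: int = TIMIT_SAMPLE_RATE) -> float:
    """Convert a sample index into a time offset in seconds."""
    return sample_index / sample_rate


def trim_edge_silence(signal: Sequence[float],
                      words: List[Tuple[int, int, str]]) -> Sequence[float]:
    """Keep only the samples between the first word start and last word end."""
    if not words:
        return signal  # nothing to trim against
    first_start = words[0][0]
    last_end = words[-1][1]  # assumed to be an exclusive end index
    return signal[first_start:last_end]


# Word boundaries taken from the SI1657.WRD excerpt in this thread.
words = [(2120, 3533, "or"), (3533, 8200, "borrow"), (36200, 43480, "bus")]

# Placeholder signal with the length given in SI1657.TXT (45466 samples).
signal = [0.0] * 45466
trimmed = trim_edge_silence(signal, words)

borrow_start_s = samples_to_seconds(3533)
borrow_end_s = samples_to_seconds(8200)
print(len(trimmed), borrow_start_s, borrow_end_s)
```

Under these assumptions the word 'borrow' spans roughly 0.22 s to 0.51 s of the recording. For your own data, a trimming step like this can be skipped if the recordings do not have long silences at the edges.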

andriuslauraitis commented 5 years ago

OK. So, is there a reference somewhere in the TIMIT corpus for these sample numbers? Or are these sample numbers calculated automatically (in some way) from the audio file?

Best, Andrius L.