tarekeldeeb / DeepSpeech-Quran

A TensorFlow implementation of Baidu's DeepSpeech architecture
Mozilla Public License 2.0

Eval Threshold, amount_Threshold and the "Imam_Tarteel" data mixing mechanism clarification #19

Closed smalissa closed 3 years ago

smalissa commented 3 years ago

@tarekeldeeb @aibrahim- @omerasif-itu

Assalam Alaikum. Could anyone please help me answer these questions? I understand the work done on the Tarteel data and the Imams data, I understand the workflow, and I could run the code, but I am unclear on these points:

  1. In the second phase, I understand the goal of the phase and why 70% of the total data is enough for you, but I cannot clearly understand what the Eval Threshold is and what it is used for. Also, what does Eval Threshold = 0.15 mean?
  Also, I need to understand explicitly what these numbers mean:
           amount_thr = {
               '100p': 9999999,
               '70p':   799000,
               '50p':   560000,
               '30p':   260000,
               '5sec':  160000
           }

I know this step builds a CSV file with 70% of the tarteel_users, but I need to understand explicitly what these numbers mean.

  2. In the Imam + filtered users dataset that is used: I need to know how the Imam data and the Tarteel data were mixed. Was it done by appending the Tarteel_user train file to the Imam train file, and likewise for the dev and test files, or is there a specific mechanism?

Thank you in advance.

tarekeldeeb commented 3 years ago

Long Aya recordings make the training very slow. You should trim off the long files. In the code snippet you mention, you select the cutoff threshold by file size in bytes. To be more human-readable, a few selected percentages are mapped to byte values.

I hope that is clear.
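
The size-based trimming described above can be sketched roughly as follows. This is only a minimal illustration, not the repository's actual code: it assumes a DeepSpeech-style CSV with the columns wav_filename, wav_filesize and transcript, and the helper name filter_by_size, the sample rows, and the choice of "less than or equal" as the comparison are all hypothetical.

```python
import csv
import io

# Thresholds copied from the snippet quoted in this thread (file size in bytes).
amount_thr = {
    '100p': 9999999,
    '70p':   799000,
    '50p':   560000,
    '30p':   260000,
    '5sec':  160000,
}


def filter_by_size(rows, label):
    """Keep only rows whose wav_filesize does not exceed the chosen cutoff.

    Hypothetical helper: whether the real code uses <= or < is an assumption.
    """
    limit = amount_thr[label]
    return [row for row in rows if int(row['wav_filesize']) <= limit]


# Made-up sample rows in the DeepSpeech CSV layout (assumed here).
sample_csv = """wav_filename,wav_filesize,transcript
a.wav,120000,x
b.wav,500000,y
c.wav,900000,z
"""

rows = list(csv.DictReader(io.StringIO(sample_csv)))
kept = filter_by_size(rows, '70p')
print([row['wav_filename'] for row in kept])
```

With the '70p' cutoff, the 900000-byte recording is dropped because it exceeds 799000 bytes, while the two shorter recordings are kept.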

smalissa commented 3 years ago

Thank you, but it is still not very clear. Could you please explain it in a simpler way? I know the idea and the goal of the trimming step, but I cannot understand in detail what these values mean. For example, in '100p': 9999999, what does the value 100 indicate, and what about the p symbol? Also, what does "eval_threshold = 0.15" mean, and what does it indicate?

Also, what about the second question above, about the data mixing?

smalissa commented 3 years ago

@tarekeldeeb @aibrahim- @omerasif-itu

Assalam Alaikum. Could you please clarify these questions:

  1. In the Imam + filtered users dataset that is used: I need to know how the Imam data and the Tarteel data were mixed. Was it done by appending the Tarteel_user train file to the Imam train file, and likewise for the dev and test files, or is there a specific mechanism?

  2. I know the idea and the goal of trimming off the long files, but I need to understand clearly what these values mean:

For example, in '100p': 9999999, what does the value 100 indicate, and what about the p symbol? Also, what does "eval_threshold = 0.15" mean, and what does it indicate?

Thank you in advance.

tarekeldeeb commented 3 years ago

100p is shorthand for 100% of the data. You are also provided with 70%, etc. The numerical value is the file size in bytes. Remember that the data is lossless, so the longer the recording, the larger the file size.
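
To make the link between file size and recording length concrete, here is a rough conversion sketch. It assumes 16 kHz, 16-bit, mono PCM WAV files (the usual DeepSpeech input format, but an assumption about this dataset) and ignores the small WAV header; the function name approx_seconds is hypothetical. Under that assumption, the '5sec' entry of 160000 bytes works out to 5 seconds.

```python
# Assumed audio format: 16 kHz, 16-bit (2 bytes per sample), mono PCM.
SAMPLE_RATE_HZ = 16000
BYTES_PER_SAMPLE = 2
CHANNELS = 1

# Thresholds copied from the snippet quoted in this thread (file size in bytes).
amount_thr = {
    '100p': 9999999,
    '70p':   799000,
    '50p':   560000,
    '30p':   260000,
    '5sec':  160000,
}


def approx_seconds(filesize_bytes):
    """Approximate duration of an uncompressed recording from its size.

    The WAV header (a few dozen bytes) is ignored, so this is an estimate.
    """
    bytes_per_second = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS
    return filesize_bytes / bytes_per_second


for label, size in amount_thr.items():
    print(f"{label}: up to about {approx_seconds(size):.1f} s")
```

Read this way, each label is simply a maximum recording length expressed in bytes, and the percentage names describe roughly how much of the data falls under that cutoff.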