prashant45 closed this issue 5 years ago.
Hello,
The number of training samples per epoch is batch_size * steps_per_epoch (in fit_generator), i.e. [--batch-size] * [--steps] (in chimeranet-train.py).
Likewise, the number of validation samples per epoch is batch_size * validation_steps (in fit_generator), i.e. [--batch-size] * [--validation-steps] (in chimeranet-train.py). See also https://stackoverflow.com/a/43459357 .
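A minimal sketch of that arithmetic, assuming nothing beyond samples = batch_size × steps. The helper name samples_per_epoch and the validation_steps value of 100 are hypothetical, not part of ChimeraNet or Keras.

```python
def samples_per_epoch(batch_size: int, steps: int) -> int:
    """Samples drawn from a Keras generator in one epoch: one batch per step."""
    return batch_size * steps


# Values from this thread: batch size 8 and the default --steps of 7200 // 8.
batch_size = 8
steps = 7200 // 8  # 900 steps
print(samples_per_epoch(batch_size, steps))  # 7200

# The same formula gives the validation count; 100 is a hypothetical --validation-steps.
validation_steps = 100
print(samples_per_epoch(batch_size, validation_steps))  # 800
```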
Hi,
I understand the steps_per_epoch parameter of the Keras fit_generator function.
My question was about the default value [--steps] = 7200 // 8. How did you choose or calculate 7200 as the total number of samples? I know 8 is your batch size.
Based on your function, https://github.com/arity-r/ChimeraNet/blob/389fb54ad9b68c77ab99875c4babc443af68904e/data_generator.py#L57-L66
you randomly select a vocal file and a melody file, load only 0.5 seconds of each (your default for [--duration]), and mix them to create one training/validation sample.
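To make the procedure described above concrete, here is a sketch of "pick a random vocal and melody, crop a random window, mix". It is not the repository's generate_one(): generate_one_sketch and random_crop are hypothetical names, the 0.5 to 2.0 gain range and the 16 kHz sample rate are assumptions, and synthetic noise stands in for real audio files.

```python
import numpy as np


def random_crop(signal: np.ndarray, sr: int, duration: float, rng: np.random.Generator) -> np.ndarray:
    """Return a randomly placed window of `duration` seconds from a 1-D signal."""
    n = int(sr * duration)
    start = int(rng.integers(0, len(signal) - n + 1))
    return signal[start:start + n]


def generate_one_sketch(vocals, melodies, sr, duration, rng):
    """Hypothetical stand-in for generate_one(): pick a random vocal and melody
    track, crop both to `duration` seconds, and mix them at a random relative gain."""
    vocal = random_crop(vocals[rng.integers(len(vocals))], sr, duration, rng)
    melody = random_crop(melodies[rng.integers(len(melodies))], sr, duration, rng)
    gain = rng.uniform(0.5, 2.0)  # assumed mixing range, not taken from the repo
    melody = gain * melody
    return vocal + melody, vocal, melody


# Usage with synthetic noise in place of real audio files (assumption).
rng = np.random.default_rng(0)
sr = 16000  # assumed sample rate
vocals = [rng.standard_normal(sr * 10) for _ in range(3)]
melodies = [rng.standard_normal(sr * 10) for _ in range(3)]
mixture, vocal, melody = generate_one_sketch(vocals, melodies, sr, 0.5, rng)
print(mixture.shape)  # (8000,)
```

Because the file, both offsets and the gain are all drawn at random, two calls almost never produce the same mixture.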
Does the while loop in train_generator or validation_generator create a unique sample from the dataset with generate_one() during an epoch?
How can I calculate the number of unique samples in my dataset? For instance, what if I have 10 melody files and 10 vocal files, each 2 minutes long, and set [--duration] = 1?
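One way to put a number on "unique samples" for this example is to count distinct crop windows. This sketch assumes all files have the same length, ignores the random mixing gain, and uses an assumed 16 kHz sample rate; the helper names are hypothetical. It counts either every sample offset or only back-to-back, non-overlapping windows.

```python
def windows_per_file(file_seconds: float, window_seconds: float, sr: int, overlapping: bool) -> int:
    """Number of distinct crop windows in one file.

    overlapping=True counts every sample offset; False counts back-to-back,
    non-overlapping windows only.
    """
    file_len = int(file_seconds * sr)
    win_len = int(window_seconds * sr)
    if overlapping:
        return max(file_len - win_len + 1, 0)
    return file_len // win_len


def vocal_melody_pairs(num_vocal, num_melody, file_seconds, window_seconds, sr, overlapping):
    """Distinct (vocal window, melody window) pairs, ignoring the mixing gain."""
    per_file = windows_per_file(file_seconds, window_seconds, sr, overlapping)
    return (num_vocal * per_file) * (num_melody * per_file)


# 10 vocal + 10 melody files, 2 minutes each, --duration 1, assumed 16 kHz audio.
print(vocal_melody_pairs(10, 10, 120, 1, 16000, overlapping=False))  # 1440000
print(vocal_melody_pairs(10, 10, 120, 1, 16000, overlapping=True))   # 362521980800100 (about 3.6e14)
```

Even the conservative non-overlapping count gives 1.44 million vocal/melody pairs, which is why the number of possible samples is effectively unbounded.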
I am sorry if it's a dumb question, but I am new to speech data.
I chose 7200 as the number of samples for no particular reason. With 7200 samples of 0.5 seconds each, one epoch covers 1 hour of audio.
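The reasoning above (7200 × 0.5 s = 3600 s) can be run in either direction: from --steps to audio per epoch, or from a target amount of audio back to --steps. A sketch with hypothetical helper names, assuming every sample is exactly --duration seconds long:

```python
def audio_seconds_per_epoch(batch_size: int, steps: int, duration: float) -> float:
    """Total length of training audio the model sees in one epoch."""
    return batch_size * steps * duration


def steps_for_target(target_seconds: float, batch_size: int, duration: float) -> int:
    """Inverse: steps needed so that one epoch covers `target_seconds` of audio."""
    return int(target_seconds // (batch_size * duration))


# Defaults from this thread: batch size 8, --steps 7200 // 8, --duration 0.5 s.
print(audio_seconds_per_epoch(8, 7200 // 8, 0.5) / 3600)  # 1.0 (hours)
print(steps_for_target(3600, 8, 0.5))                      # 900
```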
> Is your while loop of train_generator or validation_generator creating a unique sample using generate_one() from the dataset, during an epoch?
Yes, unless the function happens to choose the same vocal and melody files, pick the same ranges, and mix them at the same power level.
> How can I calculate the number of unique samples in my dataset, for instance if I have 10 melody, vocal files each 2 min long and I set [--duration] = 1?
Almost infinitely many, although some will be similar depending on how the function mixes vocal and melody.
I use fit_generator because I can generate infinitely many samples. I can only choose the number of samples, not calculate it.
Actually, I don't know whether I'm doing it right. I hope this helps.
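Keras's fit_generator expects a generator that loops over its data indefinitely, so the epoch length is whatever steps_per_epoch says. The sketch below shows that pattern without calling Keras: itertools.islice plays the role of Keras pulling a fixed number of batches. batch_generator and make_sample are hypothetical names, and make_sample is a placeholder rather than the repository's sample function.

```python
import itertools

import numpy as np


def batch_generator(make_sample, batch_size):
    """Yield (inputs, targets) batches forever, as fit_generator expects.

    The generator never runs out, so only steps_per_epoch decides where an epoch ends.
    """
    while True:
        inputs, targets = zip(*(make_sample() for _ in range(batch_size)))
        yield np.stack(inputs), np.stack(targets)


rng = np.random.default_rng(0)


def make_sample():
    # Hypothetical placeholder: a 1000-sample "mixture" and its "vocal" target.
    vocal = rng.standard_normal(1000)
    melody = rng.standard_normal(1000)
    return vocal + melody, vocal


steps_per_epoch, batch_size = 900, 8
gen = batch_generator(make_sample, batch_size)
# Keras pulls exactly steps_per_epoch batches per epoch; islice mimics that here.
total = sum(x.shape[0] for x, _ in itertools.islice(gen, steps_per_epoch))
print(total)  # 7200
```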
Hi,
Thanks for the clarification. Yes, the loop is much clearer now.
In general, though, I believe the network should see unique samples from the dataset during an epoch, although it probably wouldn't matter if two samples overlapped by a couple of milliseconds.
With your loop, some data in the dataset might never be seen by the network, or two samples might overlap by more than 50%.
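The first concern can be quantified under a simplifying assumption: treat each generated sample as one uniform draw, with replacement, from W equally likely windows. After N draws the expected fraction of windows never drawn is (1 - 1/W)^N, which is about e^(-N/W). The helper names and W = 1200 (10 files × 120 one-second windows) are illustrative, and the real generator's file-then-offset sampling is only approximately uniform.

```python
import numpy as np


def expected_unseen_fraction(num_windows: int, num_draws: int) -> float:
    """Expected share of windows never drawn after num_draws uniform draws with replacement."""
    return (1 - 1 / num_windows) ** num_draws


def simulated_unseen_fraction(num_windows: int, num_draws: int, seed: int = 0) -> float:
    """Monte Carlo check of the formula above."""
    rng = np.random.default_rng(seed)
    drawn = rng.integers(0, num_windows, size=num_draws)
    return 1 - len(np.unique(drawn)) / num_windows


W = 1200  # e.g. 10 files x 120 non-overlapping 1-second windows
print(expected_unseen_fraction(W, W))  # roughly exp(-1), about 0.37
print(simulated_unseen_fraction(W, W))
```

So an epoch with as many random draws as there are windows leaves roughly a third of them unseen, which is the effect described above.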
Anyways, thanks for the help. :D
Hi,
I have a question about the generate_test_data function in data_generator.py. How do you calculate the number of steps (samples) = 7200 in your training script using this function? A Keras generator requires you to know the number of samples/steps per epoch.
How can I use this to calculate the number of samples for a different dataset? And how can I calculate it for a validation set?
Any help in understanding this would be appreciated.
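The answer in this thread says 7200 was picked rather than derived. If you would rather derive steps from a dataset, one alternative is to enumerate non-overlapping windows per split and size the epoch so each window is visited once. This is a different approach from the repository's random sampling. The sketch assumes you count windows of one source (for example the vocal files) and pair each with a random melody window. The helper names and file lengths are hypothetical.

```python
import math
from typing import List, Tuple


def enumerate_windows(file_seconds: List[float], window_seconds: float) -> List[Tuple[int, float]]:
    """List (file index, start time) for every non-overlapping window in a dataset."""
    windows = []
    for i, length in enumerate(file_seconds):
        count = int(length // window_seconds)
        windows.extend((i, k * window_seconds) for k in range(count))
    return windows


def steps_for_split(file_seconds: List[float], window_seconds: float, batch_size: int) -> int:
    """Steps per epoch that visit each window once (a final partial batch rounds up)."""
    return math.ceil(len(enumerate_windows(file_seconds, window_seconds)) / batch_size)


train_files = [120.0] * 10  # hypothetical: ten 2-minute training files
val_files = [90.0, 45.5]    # hypothetical validation files
print(steps_for_split(train_files, 1.0, 8))  # 150
print(steps_for_split(val_files, 1.0, 8))    # 17
```

The same function serves both splits: pass the training file lengths for --steps and the validation file lengths for --validation-steps. Shuffling the window list at the start of each epoch keeps the order random while guaranteeing full coverage.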