philipperemy / deep-speaker

Deep Speaker: an End-to-End Neural Speaker Embedding System.
MIT License

pre-training optimization for RAM #90

Closed lhang33 closed 3 years ago

lhang33 commented 3 years ago

I have tested the softmax pre-training phase. I see that the process keeps all the input data in memory, which takes about 30GB of RAM. I wonder if there is a way to optimize it? For example, split the mega file kx_train.npy into pieces and then read them with a generator (a rough sketch is below). If this problem is solved, we could do pre-training on a much larger dataset (maybe 5000 speakers, etc.) and the performance might still improve.
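A minimal sketch of what that splitting could look like, assuming the labels live in a companion array (here called ky_train.npy, which is an assumption) and that chunk sizes and filenames are purely illustrative:

```python
# Hypothetical sketch: split the big training arrays into smaller chunk files
# so a generator can later load them one at a time instead of all at once.
import numpy as np

# mmap_mode='r' lets numpy read slices lazily from disk instead of
# loading the full ~30GB array into RAM.
kx = np.load('kx_train.npy', mmap_mode='r')
ky = np.load('ky_train.npy', mmap_mode='r')  # assumed label file

chunk_size = 10_000  # samples per chunk file (illustrative value)
for i, start in enumerate(range(0, len(kx), chunk_size)):
    end = start + chunk_size
    np.save(f'kx_train_chunk_{i:04d}.npy', np.asarray(kx[start:end]))
    np.save(f'ky_train_chunk_{i:04d}.npy', np.asarray(ky[start:end]))
```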

philipperemy commented 3 years ago

@lhang33 yeah, it's using vanilla Keras, and Keras is not really optimized here: it requires roughly twice the amount of data in memory when using the .fit() function. You are totally right. Using a generator will solve the problem. When I did it, I had a machine with 32GB of memory and I was also using 32GB of swap, so it was okay for me, but I was reaching the limits of my system. There are many ways to improve it, especially when we want to use more speakers (or more data per speaker).
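One possible way to wire such a generator into Keras, sketched here with keras.utils.Sequence and the hypothetical chunk files from the comment above (this is not the repo's actual training code, just an illustration under those assumptions):

```python
# Minimal sketch: a keras.utils.Sequence that loads one pre-split chunk file
# per step, so model.fit() never needs the whole kx_train.npy array in memory.
import glob
import numpy as np
from tensorflow.keras.utils import Sequence

class ChunkedNpySequence(Sequence):
    def __init__(self, pattern_x, pattern_y):
        # Matching chunk files must sort into the same order.
        self.x_files = sorted(glob.glob(pattern_x))
        self.y_files = sorted(glob.glob(pattern_y))

    def __len__(self):
        # One step per chunk file.
        return len(self.x_files)

    def __getitem__(self, idx):
        # Only one chunk is resident in RAM at any time.
        x = np.load(self.x_files[idx])
        y = np.load(self.y_files[idx])
        return x, y

# Illustrative usage:
# model.fit(ChunkedNpySequence('kx_train_chunk_*.npy', 'ky_train_chunk_*.npy'),
#           epochs=...)
```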