seorim0 / DNN-based-Speech-Enhancement-in-the-frequency-domain

DNN-based SE in the frequency domain using PyTorch. You can test some state-of-the-art networks using T-F masking or spectral mapping methods.
MIT License

Data process time #6

Closed: WaterBoiledPizza closed this issue 2 years ago

WaterBoiledPizza commented 2 years ago

Hello. I am curious how long your model takes to process data. My model (DCCRN, LSTM: real, rnn_layers = 2, rnn_units = 256) takes about 1 second to process its input, whether that is a single frame or 30 seconds of audio. I think 1 second is rather long, especially for real-time processing. Is there any way to optimize the model to reduce the processing time? Thank you.

seorim0 commented 2 years ago

Hi! When I ran frame-by-frame processing in my environment (CPU), it took about 3-4 seconds to process a 4-second clip. In other words, processing one frame takes roughly one frame's duration.
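To put those numbers in real-time terms, here is a minimal sketch that computes the real-time factor (processing time divided by audio duration) for the 3-4 second / 4-second-clip figures above. The helper names are hypothetical, and the 16 kHz sample rate and 256-sample hop are assumptions for illustration only, not values taken from this repository.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Processing time divided by audio duration; below 1.0 keeps up with real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def frame_budget_ms(hop_length: int, sample_rate: int) -> float:
    """Time available per frame (in ms) before the next frame arrives."""
    if hop_length <= 0 or sample_rate <= 0:
        raise ValueError("hop_length and sample_rate must be positive")
    return 1000.0 * hop_length / sample_rate


# Figures quoted in this thread: 3-4 s of processing for a 4 s clip on CPU.
rtf_low = real_time_factor(3.0, 4.0)
rtf_high = real_time_factor(4.0, 4.0)
print(f"real-time factor: {rtf_low:.2f} to {rtf_high:.2f}")

# ASSUMPTION: 16 kHz audio and a 256-sample hop, chosen only as an example.
print(f"per-frame budget: {frame_budget_ms(256, 16000):.1f} ms")
```

A real-time factor close to 1.0 leaves almost no headroom, so each frame has to finish within roughly its own hop duration for streaming use.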

The simplest way to reduce the processing time is to reduce the model's complexity (fewer kernels and LSTM units, fewer layers, etc.). This may change the enhancement performance, so it requires experimental tuning. Also, if you're running a DNN-based model on mobile or other small devices, you'll get much faster processing times with PyTorch Mobile or TensorFlow Lite.
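As a rough illustration of how much the LSTM size matters, the sketch below counts the trainable parameters of a stacked, unidirectional LSTM using the layout of PyTorch's `nn.LSTM` (two weight matrices and two bias vectors per layer, four gates each). The function name is hypothetical, and the input size of 256 is an assumption; the actual input size of the LSTM in DCCRN depends on the encoder configuration.

```python
def lstm_param_count(input_size: int, hidden_size: int, num_layers: int) -> int:
    """Parameter count of a unidirectional stacked LSTM with biases and no projection.

    Per layer: weight_ih (4H x I), weight_hh (4H x H), bias_ih (4H), bias_hh (4H).
    """
    total = 0
    layer_input = input_size
    for _ in range(num_layers):
        total += 4 * (layer_input * hidden_size
                      + hidden_size * hidden_size
                      + 2 * hidden_size)
        # Every layer after the first takes the previous layer's hidden state as input.
        layer_input = hidden_size
    return total


# ASSUMPTION: input_size=256 is a placeholder, not the real DCCRN feature size.
baseline = lstm_param_count(input_size=256, hidden_size=256, num_layers=2)
smaller = lstm_param_count(input_size=256, hidden_size=128, num_layers=2)
print(f"rnn_units=256: {baseline:,} parameters")
print(f"rnn_units=128: {smaller:,} parameters")
print(f"ratio: {smaller / baseline:.2f}")
```

Parameter count is only a proxy for runtime, so any reduction like this still needs to be timed and evaluated on your own data.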

WaterBoiledPizza commented 2 years ago

Thank you for the tips, I will try that out.