wrz1999 opened this issue 1 year ago
Hi, thank you for the interest. Unfortunately, the code contained in audio_process_lib is under NDA; however, what is implemented in the file is simply a multichannel STFT and a beamformer that extracts the beamspace at different directions. You can find all the related info in the paper "A Neural Beamspace-Domain Filter for Real-Time Multi-Channel Speech Enhancement" (Symmetry), i.e., the Fixed Beamforming Module (Sec. 2.3). If you find the time to implement it, I'll be glad to merge it into the repo!
Best, Luca
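For reference, here is a minimal sketch of what such a module could look like, since the original audio_process_lib cannot be shared: a multichannel STFT followed by fixed delay-and-sum beamformers at D look directions. Everything below (array geometry, sample rate, STFT parameters, the delay-and-sum design) is an illustrative assumption, not the NDA'd code:

```python
import numpy as np
import scipy.signal as sps

def multichannel_stft(x, fs=16000, n_fft=512, hop=256):
    """x: (mics, samples) -> (mics, freqs, frames) complex STFT."""
    _, _, X = sps.stft(x, fs=fs, nperseg=n_fft, noverlap=n_fft - hop, axis=-1)
    return X

def delay_and_sum_weights(mic_pos, angles_deg, freqs, c=343.0):
    """Fixed delay-and-sum weights, shape (freqs, directions, mics).
    mic_pos: (mics, 2) planar coordinates in meters (assumed geometry)."""
    angles = np.deg2rad(angles_deg)
    u = np.stack([np.cos(angles), np.sin(angles)], axis=-1)      # (D, 2) unit look vectors
    delays = mic_pos @ u.T / c                                   # (mics, D) propagation delays
    # Phase terms that time-align the mics toward each look direction
    # (the sign convention here is a modeling choice of this sketch).
    phase = 2j * np.pi * freqs[:, None, None] * delays.T[None]   # (F, D, M)
    return np.exp(phase) / mic_pos.shape[0]

def beamspace(X, W):
    """For each direction d: Y[d, f, t] = sum_m W[f, d, m] * X[m, f, t]."""
    return np.einsum('fdm,mft->dft', W, X)

# Example: 4-mic square array, 8 look directions spanning 0-360 degrees.
mic_pos = 0.05 * np.array([[1, 1], [1, -1], [-1, -1], [-1, 1]])  # meters
angles = np.arange(0, 360, 45)
x = np.random.randn(4, 16000)                  # 1 s of 4-channel audio
X = multichannel_stft(x)                       # (4, 257, frames)
freqs = np.fft.rfftfreq(512, d=1 / 16000)      # (257,)
W = delay_and_sum_weights(mic_pos, angles, freqs)
Y = beamspace(X, W)                            # (8, 257, frames)
```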
Is the beamformed signal read in 'data_lib.py' fixed-beam audio obtained by selecting D directions?
Is it possible to use MVDR instead of this beamforming method?
Exactly. The beamspace_matrix in data_lib.py is simply a Microphones × Directions matrix that, applied to the multichannel STFT, gives you the beamspace representation. It might be worth checking Sec. 3 here (https://ieeexplore.ieee.org/abstract/document/10096891/); it is a different method from the one implemented here, but it has a similar input pipeline.
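A hedged sketch of that matrix application (the name beamspace_matrix comes from data_lib.py; the shapes and the frequency-independent assumption are mine, not from the repo):

```python
import numpy as np

n_mics, n_dirs, n_freqs, n_frames = 4, 8, 257, 100

# Multichannel STFT: (mics, freqs, frames), complex.
X = np.random.randn(n_mics, n_freqs, n_frames) \
    + 1j * np.random.randn(n_mics, n_freqs, n_frames)

# beamspace_matrix: (mics, directions). A frequency-independent matrix is
# assumed here; a frequency-dependent one would add a leading freq axis.
B = np.random.randn(n_mics, n_dirs) + 1j * np.random.randn(n_mics, n_dirs)

# Beamspace representation: (directions, freqs, frames), i.e. Y = B^H X.
beamspace = np.einsum('md,mft->dft', B.conj(), X)
```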
Yes, you can use any beamformer you like; the pipeline is independent of that. The beamformer is simply used to extract the beamspace, i.e., the STFT at different angular directions.
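Since MVDR came up above, here is a minimal per-frequency MVDR sketch as one possible drop-in beamformer for extracting the beamspace. The noise covariance estimate and steering vectors are assumed inputs; none of this comes from the repo:

```python
import numpy as np

def mvdr_weights(R_noise, d):
    """R_noise: (freqs, mics, mics) noise covariance; d: (freqs, mics)
    steering vector toward one look direction -> weights (freqs, mics).
    w = R^-1 d / (d^H R^-1 d), computed per frequency bin."""
    Rinv_d = np.linalg.solve(R_noise, d[..., None])[..., 0]  # (F, M)
    denom = np.einsum('fm,fm->f', d.conj(), Rinv_d)          # (F,)
    return Rinv_d / denom[:, None]

def apply_beamformer(w, X):
    """w: (freqs, mics), X: (mics, freqs, frames) -> (freqs, frames)."""
    return np.einsum('fm,mft->ft', w.conj(), X)
```

Computing one such set of weights per look direction, then stacking the outputs, would yield the same Directions × Freqs × Frames beamspace tensor as the fixed beamformer.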
Thank you. How many epochs do you need to train on a generated 4-channel dataset of more than 80 hours?
And what are the angles of the D directions?
What could cause the training loss to be NaN?
I'll answer each point.
Thank you very much. I will modify my code according to your suggestions.
I have solved the problem with the NaNs, and the BFM validation loss for the first epoch is approximately 10000. Is this normal? What value does the loss need to converge to in the end?
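Not from the repo, but for anyone hitting the same NaN issue: a couple of generic PyTorch guards that commonly prevent NaN losses in spectral-domain training (the actual cause in this thread was resolved by the user above):

```python
import torch

EPS = 1e-8

def safe_log(x):
    # Clamp before log so silent frames can't produce -inf / NaN.
    return torch.log(torch.clamp(x, min=EPS))

print(safe_log(torch.zeros(3)))  # finite values instead of -inf

# During training, clipping gradients keeps one bad batch from exploding:
# torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=5.0)
```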
Hello, the 'audio_process_lib' file is not provided in the NeuralBeamspaceDomainFilter project. Can you please provide it?