wrz1999 opened this issue 1 year ago
Hi, thank you for the interest. Unfortunately, the code contained in audio_process_lib is under NDA; however, what is implemented in the file is simply a multichannel STFT and a beamformer that extracts the beamspace at different directions. You can find all the related info in the paper "A Neural Beamspace-Domain Filter for Real-Time Multi-Channel Speech Enhancement" (Symmetry), i.e., the Fixed Beamforming Module (Sec. 2.3). If you find the time to implement it, I'll be glad to merge it into the repo!
Best, Luca
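For reference, here is a minimal sketch of what such a module could look like, since the original audio_process_lib cannot be shared: a multichannel STFT followed by fixed delay-and-sum beamformers at D look directions. Everything below (array geometry, sample rate, STFT parameters, the delay-and-sum design) is an illustrative assumption, not the NDA'd code:

```python
import numpy as np
import scipy.signal as sps

def multichannel_stft(x, fs=16000, n_fft=512, hop=256):
    """x: (mics, samples) -> (mics, freqs, frames) complex STFT."""
    _, _, X = sps.stft(x, fs=fs, nperseg=n_fft, noverlap=n_fft - hop, axis=-1)
    return X

def delay_and_sum_weights(mic_pos, angles_deg, freqs, c=343.0):
    """Fixed delay-and-sum weights, shape (freqs, directions, mics).
    mic_pos: (mics, 2) planar coordinates in meters (assumed geometry)."""
    angles = np.deg2rad(angles_deg)
    u = np.stack([np.cos(angles), np.sin(angles)], axis=-1)      # (D, 2) unit look vectors
    delays = mic_pos @ u.T / c                                   # (mics, D) propagation delays
    # Phase terms that time-align the mics toward each look direction
    # (the sign convention here is a modeling choice of this sketch).
    phase = 2j * np.pi * freqs[:, None, None] * delays.T[None]   # (F, D, M)
    return np.exp(phase) / mic_pos.shape[0]

def beamspace(X, W):
    """For each direction d: Y[d, f, t] = sum_m W[f, d, m] * X[m, f, t]."""
    return np.einsum('fdm,mft->dft', W, X)

# Example: 4-mic square array, 8 look directions spanning 0-360 degrees.
mic_pos = 0.05 * np.array([[1, 1], [1, -1], [-1, -1], [-1, 1]])  # meters
angles = np.arange(0, 360, 45)
x = np.random.randn(4, 16000)                  # 1 s of 4-channel audio
X = multichannel_stft(x)                       # (4, 257, frames)
freqs = np.fft.rfftfreq(512, d=1 / 16000)      # (257,)
W = delay_and_sum_weights(mic_pos, angles, freqs)
Y = beamspace(X, W)                            # (8, 257, frames)
```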
Is the beamformed signal read in 'data_lib.py' fixed-beam audio obtained by selecting D directions?
Is it possible to use MVDR instead of this beamforming method?
Exactly. The beamspace_matrix in data_lib.py is simply a Microphones × Directions matrix that, applied to the multichannel STFT, gives you the beamspace representation. It might be worth checking Sec. 3 here (https://ieeexplore.ieee.org/abstract/document/10096891/); it is a different method from the one implemented here, but it has a similar input pipeline.
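A hedged sketch of that matrix application (the name beamspace_matrix comes from data_lib.py; the shapes and the frequency-independent assumption are mine, not from the repo):

```python
import numpy as np

n_mics, n_dirs, n_freqs, n_frames = 4, 8, 257, 100

# Multichannel STFT: (mics, freqs, frames), complex.
X = np.random.randn(n_mics, n_freqs, n_frames) \
    + 1j * np.random.randn(n_mics, n_freqs, n_frames)

# beamspace_matrix: (mics, directions). A frequency-independent matrix is
# assumed here; a frequency-dependent one would add a leading freq axis.
B = np.random.randn(n_mics, n_dirs) + 1j * np.random.randn(n_mics, n_dirs)

# Beamspace representation: (directions, freqs, frames), i.e. Y = B^H X.
beamspace = np.einsum('md,mft->dft', B.conj(), X)
```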
Yes, you can use any beamformer you like; the pipeline is independent of that. The beamformer is simply used to extract the beamspace, i.e., the STFT at different angular directions.
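Since MVDR came up above, here is a minimal per-frequency MVDR sketch as one possible drop-in beamformer for extracting the beamspace. The noise covariance estimate and steering vectors are assumed inputs; none of this comes from the repo:

```python
import numpy as np

def mvdr_weights(R_noise, d):
    """R_noise: (freqs, mics, mics) noise covariance; d: (freqs, mics)
    steering vector toward one look direction -> weights (freqs, mics).
    w = R^-1 d / (d^H R^-1 d), computed per frequency bin."""
    Rinv_d = np.linalg.solve(R_noise, d[..., None])[..., 0]  # (F, M)
    denom = np.einsum('fm,fm->f', d.conj(), Rinv_d)          # (F,)
    return Rinv_d / denom[:, None]

def apply_beamformer(w, X):
    """w: (freqs, mics), X: (mics, freqs, frames) -> (freqs, frames)."""
    return np.einsum('fm,mft->ft', w.conj(), X)
```

Computing one such set of weights per look direction, then stacking the outputs, would yield the same Directions × Freqs × Frames beamspace tensor as the fixed beamformer.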
Thank you. How many epochs do you need to train on a generated 4-channel dataset of more than 80 hours?
And what are the angles of the D directions?
What could cause the training loss to be NaN?
I'll answer each point.
Thank you very much. I will modify my code according to your suggestions.
I have solved the problem with the NaNs, and the BFM validation loss for the first epoch is approximately 10000. Is this normal? What value does the loss need to converge to in the end?
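Not from the repo, but for anyone hitting the same NaN issue: a couple of generic PyTorch guards that commonly prevent NaN losses in spectral-domain training (the actual cause in this thread was resolved by the user above):

```python
import torch

EPS = 1e-8

def safe_log(x):
    # Clamp before log so silent frames can't produce -inf / NaN.
    return torch.log(torch.clamp(x, min=EPS))

print(safe_log(torch.zeros(3)))  # finite values instead of -inf

# During training, clipping gradients keeps one bad batch from exploding:
# torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=5.0)
```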
Hello, the 'audio_process_lib' file is not provided in the NeuralBeamspaceDomainFilter project. Can you please provide it?