Closed: elchaima1234 closed this 11 months ago
We have encountered this problem as well. The cause could be the STFT itself: it allows the network to concentrate the watermark in a short interval of the audio, which creates a croaking artifact. PESQ stays high overall, but that part of the signal is seriously degraded. You could try to mitigate it by shortening the audio interval used for watermarking, or by removing some of the attacks. We are addressing this problem in our next model.
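One way to check whether the damage really is confined to a short interval is to score the audio frame by frame instead of relying on a single overall number. The sketch below is only an illustration: it uses a plain per-frame SNR (not PESQ, and not anything from the model's code), and the names `segmental_snr` and `worst_frames`, the frame length of 512 samples, and the 20 dB threshold are all assumptions chosen for the example. The input is a synthetic tone with a distortion burst added by hand, standing in for an original and a watermarked signal.

```python
import math

def segmental_snr(reference, degraded, frame_len=512):
    """Return the SNR in dB of each non-overlapping frame."""
    n = min(len(reference), len(degraded))
    eps = 1e-12  # avoids division by zero and log(0) on identical frames
    snrs = []
    for start in range(0, n - frame_len + 1, frame_len):
        ref = reference[start:start + frame_len]
        deg = degraded[start:start + frame_len]
        signal = sum(r * r for r in ref)
        noise = sum((r - d) ** 2 for r, d in zip(ref, deg))
        snrs.append(10.0 * math.log10((signal + eps) / (noise + eps)))
    return snrs

def worst_frames(snrs, threshold_db=20.0):
    """Indices of frames whose SNR falls below the (assumed) threshold."""
    return [i for i, s in enumerate(snrs) if s < threshold_db]

# Synthetic stand-in for real audio: 1 s of a 440 Hz tone at 16 kHz.
sr = 16000
clean = [0.5 * math.sin(2 * math.pi * 440 * t / sr) for t in range(sr)]

# Simulate a localized artifact: a 3 kHz burst covering samples 4096..4607.
marked = list(clean)
for t in range(4096, 4608):
    marked[t] += 0.2 * math.sin(2 * math.pi * 3000 * t / sr)

snrs = segmental_snr(clean, marked)
bad = worst_frames(snrs)
print(bad)  # the burst sits in frame 8 (4096 / 512)
```

With real data, `clean` and `marked` would be the original and watermarked waveforms as sample sequences at the same sampling rate. Frames flagged here are the ones worth listening to first.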
I want to try your model with my own dataset, but I am having difficulties with data preprocessing. What steps should I follow before feeding the data into the model? Currently my PESQ is 4.2, but when I reconstruct the audio file and listen to it, the quality is very poor. During training, PESQ is consistently reported as 4.2, but that does not match what I hear. NB: I use librosa to load and resize the audio and to convert it to an STFT. How can I fix this problem?