mimbres / neural-audio-fp

https://mimbres.github.io/neural-audio-fp
MIT License

Query generation questions #14

Open stdio2016 opened 3 years ago

stdio2016 commented 3 years ago

I tried to reproduce your paper; my code is at https://github.com/stdio2016/pfann . In my code, I generate queries as follows (a rough sketch is given after the list):

  1. Randomly slice an x-second segment from the test music, where x is the query length
  2. Add one noise file to this segment
  3. Apply 2 IRs to this segment by convolution: one for the room reverb, the other for the microphone IR
  4. Save this segment as the query file
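
A minimal sketch of that pipeline, assuming NumPy/SciPy; the function name, the fixed sample rate, and the SNR handling are my illustrative choices, not the actual pfann code:

```python
import numpy as np
from scipy.signal import fftconvolve

def make_query(test_audio, noise, room_ir, mic_ir, query_len, sr=8000, rng=None):
    """Slice one x-second query, add noise, then apply room and mic IRs."""
    if rng is None:
        rng = np.random.default_rng()
    n = int(query_len * sr)
    start = rng.integers(0, len(test_audio) - n)  # random query start time
    seg = test_audio[start:start + n]
    seg = seg + noise[:n]                         # SNR scaling omitted here
    seg = fftconvolve(seg, room_ir)               # room reverb first,
    return fftconvolve(seg, mic_ir)[:n]           # then the microphone IR
```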

It seems that your code does the following (my reading of it is sketched after the list):

  1. Split the test music into 1-second segments
  2. For each segment:
     - Randomly time-shift the segment within +/-0.2 s
     - Add one noise segment to it; the added noises of consecutive segments seem to maintain time order
     - Apply one IR file to the segment
  3. Concatenate these segments and save them as the query file
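
If I read the repo's pipeline correctly, a minimal sketch would look like the following; all names are my placeholders, and the exact noise stride and SNR scaling are guesses rather than the actual implementation:

```python
import numpy as np
from scipy.signal import fftconvolve

def make_query(test_audio, noises, irs, sr=8000, rng=None):
    """Segment-wise query generation as I read it from the repo.

    Assumes `noises` is at least as long as the concatenated segments.
    """
    if rng is None:
        rng = np.random.default_rng()
    seg_len, hop = sr, sr // 2             # 1 s windows with 0.5 s overlap
    max_shift = int(0.2 * sr)              # +/-0.2 s random time offset
    out = []
    for i, start in enumerate(range(0, len(test_audio) - seg_len, hop)):
        shift = int(rng.integers(-max_shift, max_shift + 1))
        s = min(max(start + shift, 0), len(test_audio) - seg_len)
        seg = test_audio[s:s + seg_len]
        seg = seg + noises[i * seg_len:(i + 1) * seg_len]  # noise kept in time order
        ir = irs[rng.integers(len(irs))]   # a different IR drawn per segment
        out.append(fftconvolve(seg, ir)[:seg_len])         # one convolution
    return np.concatenate(out)
```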

My questions:

  1. Why do you apply a different IR to each 1 s segment of the same query file? I do not think the reverb environment would change every second.
  2. I use random slicing to simulate the query start time, while you randomly shift each 1 s segment independently. Isn't uniform time shifting enough?
  3. In your paper, you said "microphone and room impulse response (IR) are sequentially applied by convolution operation." However, I can only find one convolution operation per segment. How do you apply 2 IRs (the microphone and room IRs) in one convolution? Do you merge these two datasets, or preprocess them so that every new IR is a combination of one microphone IR and one room IR?
mimbres commented 3 years ago

@stdio2016 Hi, Yi-feng. Thank you for the questions and for reviewing my code. Your implementation looks awesome! I was really waiting for someone else to implement it in PyTorch. I'm on my way to fork it.

  1. I completely agree with your point: changing the IR every 1 s is not a realistic test for the sequence (2~10 s) search task. The test from my implementation is therefore perhaps slightly more difficult than a real-world one, though there is no guarantee of that. There is no reason for it other than my oversimplification in this implementation. Although the work focused more on 1 s segment-level search, my implementation for evaluating the sequence search task needs improvement on the points you made.

  2. Yes, a uniform-random start time is enough and the most desirable. In my implementation, I first generate a list of 1 s segments using 0.5 s overlapping windows, then apply a +/-0.2 s time offset, as in training. The resulting queries cover only 80% of all possible start times, since each 0.5 s hop is reachable only over a 0.4 s (= 2 × 0.2 s) range. If I had applied a +/-0.25 s time offset, it would be much closer to a uniform-random start time. However, out of laziness, I reused the training data pipeline for the test. As you mentioned, generating sequence queries with independent sampling of the start time is the best way, so I feel your implementation (which I have not reviewed yet) must be more correct.
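
In other words, the coverage follows directly from the hop size and the offset range. A back-of-the-envelope check (not code from this repo):

```python
hop = 0.5     # hop between 1 s windows with 0.5 s overlap, in seconds
offset = 0.2  # random time offset of +/-0.2 s, as in training

# Each nominal start k*hop is smeared over [k*hop - offset, k*hop + offset],
# i.e. a 2*offset window out of every hop, so the reachable fraction is:
coverage = (2 * offset) / hop
print(coverage)  # 0.8 -> only 80% of all possible start times are covered

# With offset = 0.25, coverage reaches 1.0: effectively uniform-random starts.
```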

  3. I merged the 2 IR filters for simplicity. Since convolution is associative, applying the merged filter in a single convolution is equivalent to applying the two IRs sequentially, so I think it was just fine for the test. I believe the data pipeline in this repo exactly reproduces the experiment. However, its known drawback is that the 300+ pre-processed IRs would not provide enough randomness for training. I plan to improve this in the upcoming data pipeline.
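
A quick numerical check of that equivalence; the arrays below are synthetic stand-ins, not the actual microphone/room IR datasets:

```python
import numpy as np
from scipy.signal import fftconvolve

rng = np.random.default_rng(0)

# Synthetic stand-ins for one room IR and one microphone IR.
room_ir = rng.standard_normal(4000) * np.exp(-np.linspace(0, 8, 4000))
mic_ir = rng.standard_normal(256) * np.exp(-np.linspace(0, 16, 256))
x = rng.standard_normal(8000)  # 1 s of audio at 8 kHz

# Convolution is associative: (x * room_ir) * mic_ir == x * (room_ir * mic_ir),
# so each (mic, room) pair can be pre-merged offline into a single new IR.
merged_ir = fftconvolve(room_ir, mic_ir)

sequential = fftconvolve(fftconvolve(x, room_ir), mic_ir)
merged = fftconvolve(x, merged_ir)
print(np.allclose(sequential, merged))  # True, up to floating-point error
```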

I hope this answer supplements the points that were not clear in the paper and this repo. Thanks for reminding me of the existing problem of unrealistic test-set generation. I plan to revise the code to reflect your suggestions, and I will update this answer if I missed any points.