Open stdio2016 opened 3 years ago
@stdio2016 Hi, Yi-feng. Thank you for the questions and for reviewing my code. Your implementation looks awesome! I was really waiting for someone else to implement it in PyTorch. I'm about to fork it.
I completely agree that changing the IR every 1s cannot be a realistic test for the sequence (2~10s) search task. The test in my implementation is perhaps slightly more difficult than the real task, but there are no guarantees. There is no reason for it other than an oversimplifying mistake on my part. Although the work focused more on 1s segment-level search, my implementation for evaluating the sequence search task needs improvement on the points you made.
Yes, a uniform-random start time is enough, and it is the most desirable option. In my implementation, I first generate a list of 1s segments using windows with 0.5s overlap. Then I apply a +/-0.2s time offset, as in training. The resulting queries cover only 80% of all possible start times (a 0.4s range around each 0.5s hop). If I applied a +/-0.25s time offset, it would be much closer to a uniform-random start time. However, out of laziness, I reused the training data pipeline for the test. As you mentioned, generating sequence queries with independently sampled start times is the best way. So I feel your implementation (I haven't reviewed it yet) must be more correct.
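To make the 80% figure concrete, here is a minimal sketch of the arithmetic. It is not code from either repo: the function names `start_time_coverage` and `sample_query_start` are hypothetical, and times are in integer milliseconds just to avoid floating-point noise. It assumes the segment grid has a 0.5s hop and that each segment is shifted by at most the given offset in either direction.

```python
import random


def start_time_coverage(hop_ms: int, max_offset_ms: int) -> float:
    """Fraction of possible start times reachable when segments sit on a
    grid with spacing hop_ms and each is shifted by up to +/- max_offset_ms.

    Each grid point reaches an interval of width 2 * max_offset_ms, and the
    grid repeats every hop_ms, so the covered fraction is the ratio of the
    two, capped at 1.0 once neighbouring intervals touch.
    """
    return min(2 * max_offset_ms, hop_ms) / hop_ms


def sample_query_start(track_ms: int, query_ms: int, rng: random.Random) -> int:
    """Hypothetical alternative: draw a query start time uniformly at random,
    independent of any segment grid, so that the query fits in the track."""
    if query_ms > track_ms:
        raise ValueError("query is longer than the track")
    return rng.randint(0, track_ms - query_ms)  # both ends inclusive


print(start_time_coverage(500, 200))  # +/-0.2s offset on a 0.5s hop
print(start_time_coverage(500, 250))  # +/-0.25s offset on a 0.5s hop

rng = random.Random(0)
print(sample_query_start(track_ms=30_000, query_ms=5_000, rng=rng))
```

With a +/-0.2s offset, the reachable intervals leave a 0.1s gap in every 0.5s hop; widening the offset to +/-0.25s closes the gap, and sampling the start time directly avoids the grid altogether.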
I merged the 2 IR filters for simplicity. I think that was fine for the test. I believe the data pipeline in this repo exactly reproduces the experiment. However, its known drawback is that the 300+ pre-processed IRs do not provide enough randomness for training. I plan to improve this in the upcoming data pipeline.
I hope my answer supplements the points that were not clear in the paper and this repo. Thanks for reminding me of the existing problem of unrealistic test set generation. I plan to revise the code to reflect your feedback. I will update this answer if I missed any points.
I tried to reproduce your paper, and my code is https://github.com/stdio2016/pfann . In my code, I generate queries by:
It seems that your code does these:
My question: