Music videos often have intros with noise, or they are recorded live and the crowd can be heard. This module doesn't handle these cases, and I'm not sure what could be done. I'm creating this issue in case I come up with something, or in case someone wants to suggest an idea.
Edit: It was suggested that I find the individual regions of maximum correlation and then resample each region so that its timing matches, rather than shifting the whole signals. This is because the Cooley-Tukey FFT already processes the signals in chunks of N samples in time, where N is a power of 2.
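To make the suggestion a bit more concrete, here is a minimal sketch of its first half only: estimating, window by window, where one signal best matches the other. It is not what the module does today. The function name `local_offsets` and its parameters (`window`, `hop`, `max_lag`) are hypothetical, and it assumes both signals are mono NumPy arrays at the same sample rate. The correlation is not normalized, so loud passages can dominate the result, and the resampling step is left out entirely.

```python
import numpy as np


def local_offsets(ref, other, window, hop, max_lag):
    """Estimate, for each window of `ref`, the lag (in samples) at which
    `other` correlates best with that window.

    Returns a list of (window_start, lag) pairs. A positive lag means the
    matching content appears later in `other` than in `ref`.
    """
    ref = np.asarray(ref, dtype=float)
    other = np.asarray(other, dtype=float)
    results = []
    for start in range(0, len(ref) - window + 1, hop):
        seg = ref[start:start + window]
        # Only search a neighbourhood of +/- max_lag samples around the window.
        lo = max(0, start - max_lag)
        hi = min(len(other), start + window + max_lag)
        search = other[lo:hi]
        if len(search) < window:
            break  # `other` is too short to cover this window
        # Remove the mean so a constant offset does not bias the peak.
        seg = seg - seg.mean()
        search = search - search.mean()
        # corr[k] is the dot product of seg with search[k:k + window].
        corr = np.correlate(search, seg, mode="valid")
        best = int(np.argmax(corr))
        results.append((start, lo + best - start))
    return results
```

If the lags come out roughly constant, a single shift is enough. If they drift from window to window, that drift is what a later resampling step would have to correct, region by region.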
I should read a book about Digital Signal Processing to understand it better first, so I'll revisit this issue in the future.
Edit: this is going to take a long time because I have to learn new concepts before programming them. Some methods and ideas to consider in order to fix this:
Other resources: