mvcisback / SSLVC

Sound Source Localization using Visual Cues

Handling out-of-sync Video + Audio #12

Open mvcisback opened 9 years ago

mvcisback commented 9 years ago

We have previously discussed using time warping to justify the in-sync assumption.

That said, based on @ramili's comment in #1, there seems to be a nice way to handle the sync directly if we ever switch to PLCA (see #9).

Comments copied here:

P.S. I think, for time warping, you assume there is an audio track in sync with the video, a reference, and then you replace it with a better recording by stretching and compressing the time waveform with respect to that reference; it won't be useful for conference calling. There is actually a nice PLCA way of doing that, I think called Hashing(?), for syncing sensor information and fusion.

Found the paper! http://web.engr.illinois.edu/~paris/pubs/bryan-icassp2012.pdf
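
For concreteness, here's a hedged sketch of the reference-warping idea in that quote: align a higher-quality recording against the reference track that is already in sync with the video, then warp it along the recovered time mapping. It assumes librosa is available (the repo doesn't pin an audio library) and the filenames are hypothetical; the actual time-scale modification (e.g. a phase vocoder) is left out:

```python
# A sketch of reference-based alignment, assuming librosa is available.
# `reference.wav` (audio already in sync with the video) and
# `better.wav` (the higher-quality recording) are hypothetical names.
import librosa
import numpy as np

ref, sr = librosa.load("reference.wav")
new, _ = librosa.load("better.wav", sr=sr)

# Align on MFCC features; wp holds matched (ref_frame, new_frame) pairs.
X = librosa.feature.mfcc(y=ref, sr=sr)
Y = librosa.feature.mfcc(y=new, sr=sr)
D, wp = librosa.sequence.dtw(X=X, Y=Y)

# Convert the path to a time mapping; a time-scale modification step
# (e.g. a phase vocoder) would then stretch `new` along this mapping.
pairs = np.asarray(wp)[::-1]  # librosa returns the path end-to-start
times = librosa.frames_to_time(pairs.T, sr=sr)
print(times[:, :5])  # ref-time / new-time correspondences
```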

ghost commented 9 years ago

DTW cannot sync audio data to video data, not that I'm aware of.


ghost commented 9 years ago

With DTW you can sync two or more time series of the same kind of content; it's really just cross-correlation between them, but more optimal and faster. I think what you're saying is to make sure the audio is in sync with the video, right? Given the current tech, we can just assume they already are; if they're not, then I'm not sure how DTW can sync the two.
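
For reference, here's a minimal numpy-only sketch of the classic DTW recurrence (nothing project-specific assumed). Unlike cross-correlation, which recovers a single global lag, the O(NM) dynamic program below recovers a full, non-uniform alignment path between the two series:

```python
# Minimal dynamic time warping between two 1-D sequences; a sketch,
# not tuned for long signals (the DP is O(N*M) in time and memory).
import numpy as np

def dtw(x, y):
    """Return the DTW cost and the warping path aligning x to y."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)  # cumulative-cost matrix
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            # Extend the cheapest of: match, step in x, step in y.
            D[i, j] = cost + min(D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
    # Backtrack from the far corner to recover the alignment.
    path, i, j = [], n, m
    while (i, j) != (1, 1):
        path.append((i - 1, j - 1))
        step = int(np.argmin([D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]]))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    path.append((0, 0))
    return D[n, m], path[::-1]

# Toy check: the second series is a stretched copy of the first.
a = np.sin(np.linspace(0, 4 * np.pi, 50))
b = np.sin(np.linspace(0, 4 * np.pi, 80))
cost, path = dtw(a, b)
print(cost, path[:5])
```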

mvcisback commented 9 years ago

I agree it's not as directly applicable, but if we have a binary wave that is active during movement and off when the movement is below a certain threshold, then I think those two waves can be synced so that the onsets occur at the same time.
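
A minimal sketch of that onset idea, assuming we already have per-frame 0/1 activity signals (the names `audio_active` and `video_active` are hypothetical: say, audio energy and frame-to-frame motion above a threshold, resampled to a common frame rate). The cross-correlation peak gives the offset to apply before treating the streams as in sync:

```python
# A sketch of onset-based sync via cross-correlation of two
# hypothetical per-frame 0/1 activity signals at the same frame rate.
import numpy as np

def estimate_lag(audio_active, video_active):
    """Frames by which the video activity trails the audio activity."""
    a = audio_active - audio_active.mean()  # center so the peak is sharp
    v = video_active - video_active.mean()
    corr = np.correlate(v, a, mode="full")
    # In "full" mode, output index len(a) - 1 corresponds to zero lag.
    return int(np.argmax(corr)) - (len(a) - 1)

# Toy check: the same on/off pattern, with the video delayed 3 frames.
audio = np.array([0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0], dtype=float)
video = np.roll(audio, 3)
print(estimate_lag(audio, video))  # 3
```

This only recovers a single global offset; if the drift varies over the recording, something like the DTW path in the sketch above would be needed instead.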

That said, I much prefer the idea of using the techniques described in the paper.

ghost commented 9 years ago

I like your thinking outside the box! You know, we might be able to use our algorithm to sync video and audio as well; if it works, it could be a paper by itself.

mvcisback commented 9 years ago

hehe, is there any reason this got closed? (I suspect you accidentally hit comment and close)

ghost commented 9 years ago

oops!

mvcisback commented 9 years ago

@ffaghri1 This is the relevant issue.