protyposis / AudioAlign

Audio Synchronization and Analysis Tool
GNU Affero General Public License v3.0

Consider audio arrival times for video sync #10

Open MarcoRavich opened 7 months ago

MarcoRavich commented 7 months ago

Cornucopia user comment on VH 3ad:

I praise the effort you have put in (especially assuming you have some substantive pattern matching going on here), but as a (mostly) former industry professional, I have to point out one flaw with these kinds of approaches*: natural desync between cams/mics/recorders at different distances, due to the difference between the speed of sound and the speed of light. Close-up recordings will have a smaller difference in arrival time between picture and sound (as well as less ambient sound and echo), whereas long-distance shots will have a noticeable difference between the two (as well as more ambient sound and echo). I'm not sure if you have taken this into account. If so, BRAVO! If not, it is possible your "synced" output will not be truly synced with picture, or with other perspective tracks. Perhaps you can compensate for this based on the aforementioned ambience/echoes.

hoping that helps,

Scott

*(one big reason pros will still use genlock, timecode, slate clapper, click tracks, etc).
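
To put rough numbers on the arrival-time difference described in the comment above, here is a minimal sketch. It assumes a speed of sound of about 343 m/s (dry air at roughly 20 °C) and treats light as arriving instantly, which is a fair approximation at these distances. The function names, the sample distances, and the 25 fps frame rate are illustrative assumptions and are not taken from AudioAlign.

```python
# Assumed value: speed of sound in dry air at about 20 °C.
SPEED_OF_SOUND_M_S = 343.0


def acoustic_delay_ms(distance_m: float) -> float:
    """Time sound needs to travel distance_m, in milliseconds.

    Light travel time is ignored because it is negligible at these distances.
    """
    return distance_m / SPEED_OF_SOUND_M_S * 1000.0


def delay_in_frames(distance_m: float, fps: float = 25.0) -> float:
    """The same delay expressed in video frames at the given frame rate."""
    return acoustic_delay_ms(distance_m) / 1000.0 * fps


if __name__ == "__main__":
    # Hypothetical source-to-microphone distances in meters.
    for distance in (1.0, 10.0, 34.3, 100.0):
        print(
            f"{distance:6.1f} m: "
            f"{acoustic_delay_ms(distance):7.1f} ms, "
            f"{delay_in_frames(distance):5.2f} frames at 25 fps"
        )
```

Under these assumptions a microphone about 34 m from the source hears the sound roughly 100 ms, or 2.5 frames at 25 fps, after the camera sees the event, while a close-up microphone at 1 m is only about 3 ms behind.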

protyposis commented 7 months ago

I'm aware (wrote my dissertation on the topic) and that is indeed an unsolved flaw. There are multiple ideas for how it could theoretically be detected and fixed (ambient characteristics being one of them), but they are rather specific and likely unreliable as a generic solution. Afaik there's no proper research on possible solutions to date.

While genlock/wordclock is definitely the most reliable and accurate way to achieve synchronization, I'd like to point out that it doesn't fix the speed of sound vs. light discrepancy on a technical level. It minimizes delays between video tracks (due to the high speed of light and the fast propagation of the sync signal), but audio tracks still desync by distance. It's the combination with the way professional sound setups in (multi-cam) video productions are done that makes it unnoticeable/irrelevant. What genlock prevents is time drift, which the other suggested "pro" solutions are all affected by.
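
The difference between the two effects can be sketched numerically: the distance-related offset is constant for a fixed setup, whereas drift between free-running recorder clocks grows with recording time. This is only an illustration under stated assumptions. The 343 m/s speed of sound, the 50 ppm clock error, and all function names are hypothetical values chosen for the example and are not measurements or AudioAlign code.

```python
# Assumed value: speed of sound in dry air at about 20 °C.
SPEED_OF_SOUND_M_S = 343.0


def distance_offset_ms(distance_m: float) -> float:
    """Constant offset caused by sound travel time; it does not grow over time."""
    return distance_m / SPEED_OF_SOUND_M_S * 1000.0


def drift_ms(elapsed_s: float, clock_error_ppm: float) -> float:
    """Offset accumulated by a free-running clock that is off by clock_error_ppm.

    Grows linearly with elapsed time; a shared reference clock (genlock or
    wordclock) is what keeps this term at zero.
    """
    return elapsed_s * clock_error_ppm * 1e-6 * 1000.0


if __name__ == "__main__":
    distance_m = 10.0        # hypothetical microphone distance
    clock_error_ppm = 50.0   # hypothetical relative clock error
    for minutes in (1, 10, 60):
        elapsed_s = minutes * 60.0
        print(
            f"after {minutes:3d} min: "
            f"distance offset {distance_offset_ms(distance_m):6.1f} ms, "
            f"drift {drift_ms(elapsed_s, clock_error_ppm):6.1f} ms"
        )
```

With these made-up numbers the distance offset stays near 29 ms throughout, while the drift reaches about 180 ms after an hour. A shared clock removes the second term but leaves the first one untouched.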

So thanks a lot for the input, that's an interesting topic and there's still a lot of work to do to make it perfect.