Mismatch in audio frames returned by pyav and video reader

pytorch / vision

Datasets, Transforms and Models specific to Computer Vision

https://pytorch.org/vision

BSD 3-Clause "New" or "Revised" License


Mismatch in audio frames returned by pyav and video reader #3986

Open prabhat00155 opened 3 years ago

prabhat00155 commented 3 years ago

The audio-video sync test was failing in https://github.com/pytorch/vision/pull/3934. It turns out that the audio frames returned by our video reader don't match the pyav results (both the output shapes and the values differ). Here is the notebook with details.

Tasks:

[ ] Investigate the mismatch
[ ] Fix the issue
[ ] Add the audio-video sync test (disabled in https://github.com/pytorch/vision/pull/4050).

cc @bjuncek
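For the first task, a minimal sketch of how the two backends could be compared, assuming a torchvision build with FFmpeg support so that the `video_reader` backend is available. The helper names `load_audio` and `compare_audio` and the file name `clip.mp4` are hypothetical and not part of torchvision or of the linked notebook; only `torchvision.set_video_backend` and `torchvision.io.read_video` are real calls.

```python
def load_audio(path, backend):
    """Read the audio track of `path` with the given torchvision video backend.

    Returns the samples as nested lists shaped (channels, samples).
    """
    # Imported lazily so compare_audio below can be used without torchvision.
    import torchvision
    from torchvision.io import read_video

    # "video_reader" only works when torchvision was built with FFmpeg.
    torchvision.set_video_backend(backend)
    _, aframes, _ = read_video(path, pts_unit="sec")
    return aframes.tolist()


def compare_audio(ref, other, atol=1e-5):
    """Report shape and value differences between two (channels, samples) lists."""
    ref_shape = (len(ref), len(ref[0]) if ref else 0)
    other_shape = (len(other), len(other[0]) if other else 0)
    report = {
        "ref_shape": ref_shape,
        "other_shape": other_shape,
        "same_shape": ref_shape == other_shape,
        "max_abs_diff": None,  # only meaningful when the shapes agree
    }
    if report["same_shape"]:
        diffs = [abs(a - b) for r, o in zip(ref, other) for a, b in zip(r, o)]
        report["max_abs_diff"] = max(diffs, default=0.0)
    report["match"] = report["same_shape"] and report["max_abs_diff"] <= atol
    return report


# Hypothetical usage on a local file:
# report = compare_audio(load_audio("clip.mp4", "pyav"),
#                        load_audio("clip.mp4", "video_reader"))
```

Checking the shapes before the values keeps the two failure modes reported here (different shapes, different values) separate in the output.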

mthrok commented 3 years ago

[Off-topic, yet remotely related] Does the video reader handle stereo sound as well? When testing something with audio, I recommend using stereo audio just to be sure.

prabhat00155 commented 3 years ago

> [Off-topic, yet remotely related] Does the video reader handle stereo sound as well? When testing something with audio, I recommend using stereo audio just to be sure.

read_video and write_video support audio frames with multiple channels, so I would think yes. @bjuncek can confirm this.

bjuncek commented 3 years ago

Yes, we do handle it in normal circumstances. I've learned over the course of debugging this one that there might be more than one way of encoding stereo audio, so I have to double check if we're good for all the edge cases.
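One possible reading of "more than one way of encoding stereo audio" is the difference between packed (interleaved) and planar sample layouts, which FFmpeg exposes as separate sample formats (for example `flt` versus `fltp`). That is an assumption; the thread does not say which edge cases are meant. The sketch below uses made-up sample values and hypothetical helper names to show how misreading one layout as the other keeps the shape but changes the values.

```python
def deinterleave(samples, channels):
    """Split a packed buffer [L0, R0, L1, R1, ...] into one list per channel."""
    if len(samples) % channels:
        raise ValueError("sample count is not a multiple of the channel count")
    return [samples[c::channels] for c in range(channels)]


def interleave(planes):
    """Merge per-channel lists back into a packed buffer."""
    return [s for frame in zip(*planes) for s in frame]


# Made-up stereo samples in packed order: L, R, L, R, L, R.
packed = [0.1, -0.1, 0.2, -0.2, 0.3, -0.3]

# Correct interpretation: every second sample belongs to the same channel.
planar = deinterleave(packed, channels=2)

# Misreading the packed buffer as already planar splits it in half instead,
# so each "channel" ends up mixing left and right samples.
misread = [packed[:3], packed[3:]]
```

A stereo test clip whose left and right channels differ would expose this kind of mix-up, whereas a mono clip or identical channels would not.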