pytorch / torchcodec

PyTorch video decoding
BSD 3-Clause "New" or "Revised" License

Refac: Straightforward output shape permutation #317

Closed: NicolasHug closed this 3 weeks ago

NicolasHug commented 3 weeks ago

This PR is about where and when we call MaybePermuteHWC2CHW(). It's not about tensor allocation (that will come later).


At a high level, this PR changes all conditional call patterns like:

if (cond) {
  output.frames = MaybePermuteHWC2CHW(output.frames);
}

to a plain, unconditional call:

output.frames = MaybePermuteHWC2CHW(output.frames);

This makes our output shape permutation a lot simpler to reason about. On main, cond is typically input-dependent (really, caller-dependent), which leads to a state that's hard to reason about.
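To illustrate the unconditional pattern, here is a minimal standalone sketch. It is not the torchcodec code: it models only tensor shapes, and `StreamOptions`, `dimensionOrder`, `maybePermuteShapeHWC2CHW()` and `decodeFrame()` are hypothetical names made up for this example. It also assumes that the "maybe" is decided inside the helper from a user-requested layout, so the call site needs no condition of its own.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for a frame tensor: only the shape is modelled.
using Shape = std::array<int64_t, 3>;

// Hypothetical per-stream options. The field name and its values are
// assumptions made for this sketch.
struct StreamOptions {
  std::string dimensionOrder = "NCHW"; // or "NHWC"
};

// The "maybe" lives inside the helper: it returns the shape untouched when
// the caller asked for channels-last, and permutes HWC -> CHW otherwise.
Shape maybePermuteShapeHWC2CHW(const StreamOptions& options, const Shape& hwc) {
  if (options.dimensionOrder == "NHWC") {
    return hwc;
  }
  assert(hwc[2] == 3 && "expected an HWC shape with 3 channels");
  return {hwc[2], hwc[0], hwc[1]};
}

struct DecodedOutput {
  Shape frameShape{};
};

// Hypothetical high-level entry point. The low-level step always yields HWC,
// and the permutation call is unconditional at the call site.
DecodedOutput decodeFrame(const StreamOptions& options) {
  DecodedOutput output;
  output.frameShape = {270, 480, 3}; // pretend the decoder produced HWC
  output.frameShape = maybePermuteShapeHWC2CHW(options, output.frameShape);
  return output;
}
```

With this shape, every path through `decodeFrame()` ends with exactly one permutation call, and the result depends only on the options, not on which caller reached the function.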

Another benefit of this PR is that now all low-level decoding routines (like convertAVFrameToDecodedOutputOnCPU()) have a simpler interface: they only ever take and return HWC tensors.
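As a rough illustration of why an HWC-only low-level interface is cheap to adapt afterwards, the sketch below fills a contiguous HWC buffer and then exposes it as CHW by reordering sizes and strides, without copying. This mirrors how `permute()` on a torch tensor returns a view rather than a copy. `FrameView`, `decodeIntoHWC()`, `viewAsHWC()` and `permuteHWC2CHW()` are hypothetical names for this example only and do not appear in the PR.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical minimal strided view over a frame buffer (not a torch tensor).
struct FrameView {
  const uint8_t* data;
  std::array<int64_t, 3> sizes;
  std::array<int64_t, 3> strides; // in elements, not bytes

  uint8_t at(int64_t i, int64_t j, int64_t k) const {
    return data[i * strides[0] + j * strides[1] + k * strides[2]];
  }
};

// Stand-in for a low-level decoding routine: it only knows about HWC and
// writes a contiguous height x width x channels buffer of dummy pixel values.
std::vector<uint8_t> decodeIntoHWC(int64_t height, int64_t width, int64_t channels) {
  std::vector<uint8_t> buffer(static_cast<std::size_t>(height * width * channels));
  for (std::size_t i = 0; i < buffer.size(); ++i) {
    buffer[i] = static_cast<uint8_t>(i % 256);
  }
  return buffer;
}

// Wrap a contiguous HWC buffer in a view with the matching strides.
FrameView viewAsHWC(const std::vector<uint8_t>& buffer, int64_t h, int64_t w, int64_t c) {
  assert(buffer.size() == static_cast<std::size_t>(h * w * c));
  return {buffer.data(), {h, w, c}, {w * c, c, 1}};
}

// HWC -> CHW as a pure view change: same storage, reordered sizes and strides.
FrameView permuteHWC2CHW(const FrameView& hwc) {
  return {hwc.data,
          {hwc.sizes[2], hwc.sizes[0], hwc.sizes[1]},
          {hwc.strides[2], hwc.strides[0], hwc.strides[1]}};
}
```

In this model the low-level routine never needs to know which layout the user asked for; only the caller that owns the final output applies the view change.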


At a lower level, the following changes were made:


A follow-up to this PR will unify the tensor allocation. I think it'd make sense for tensors to always be pre-allocated by the high-level decoding entry points, which would let us keep the allocation logic in a single place.

scotts commented 3 weeks ago

Looks good to me. @ahmadsharif1 should also review.