In GPU programming, "transposes" and "permutes" of data are usually materialized only when data is read from or written to memory. A transpose or permutation of data that already sits in GPU registers is usually free (a no-op): each thread still owns the same values, and only the logical indices attached to those values change.
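To make the "same data, symbolically different" point concrete, here is a minimal NumPy sketch that models registers on the host. It is not GPU code. The 4x4 tile, the "thread i owns row i" layout, and the helper names (`row_layout`, `transposed_layout`, `read_tile`, `owner_of`) are all assumptions made for illustration; real register layouts depend on the instruction and library in use.

```python
import numpy as np

N = 4  # hypothetical: a 4x4 tile held by 4 threads, 4 registers each

tile = np.arange(N * N).reshape(N, N)

# regs[t, r] models register r of thread t.
# Assumed starting layout: thread i owns row i of the tile.
regs = tile.copy()

def row_layout(i, j):
    """Logical element (i, j) lives in (thread i, register j)."""
    return i, j

def transposed_layout(i, j):
    """Element (i, j) of the transpose lives in (thread j, register i)."""
    return j, i

def read_tile(regs, layout):
    """Rebuild a logical tile by reading registers through a layout."""
    out = np.empty((N, N), dtype=regs.dtype)
    for i in range(N):
        for j in range(N):
            t, r = layout(i, j)
            out[i, j] = regs[t, r]
    return out

# Symbolic transpose: the registers are untouched, only the mapping changes.
symbolic = read_tile(regs, transposed_layout)

# Materialized transpose: keep row_layout, so values must change owners.
materialized_regs = tile.T.copy()

def owner_of(regs, value):
    """Thread that holds `value` (values are unique in this example)."""
    t, _ = np.argwhere(regs == value)[0]
    return int(t)

# Count the values that end up in a different thread after materializing.
moved = sum(
    owner_of(regs, v) != owner_of(materialized_regs, v) for v in tile.ravel()
)
```

Under these assumptions, `symbolic` equals `tile.T` even though `regs` was never modified, while the materialized version places every off-diagonal value in a different thread. On real hardware that movement between threads would require an exchange through shared memory or warp shuffles, which is why it is usually folded into a read or write that has to happen anyway.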