pytorch / nestedtensor

[Prototype] Tools for the concurrent manipulation of variably sized Tensors.
BSD 3-Clause "New" or "Revised" License

Faster transpose_copy for conv2d_1x1 #416

Closed cpuhrsch closed 3 years ago

cpuhrsch commented 3 years ago

Follow-up work:

Even with that follow-up work outstanding, this change already decreases the runtime by about 10%.

model_name:   resnext101_32x4d, bsz: 64, mean±std shapes[2]: 323.17±145.32, mean±std shapes[3]: 363.86±141.50, loop: 1.50s, nt: 0.43s, speedup: 3.50x
model_name:   resnext101_32x4d, bsz: 128, mean±std shapes[2]: 319.36±143.50, mean±std shapes[3]: 361.93±141.42, loop: 2.97s, nt: 0.88s, speedup: 3.36x
model_name:   resnext101_32x4d, bsz: 256, mean±std shapes[2]: 334.34±147.87, mean±std shapes[3]: 365.12±139.76, loop: 5.95s, nt: 1.88s, speedup: 3.17x
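
The speedup column above appears to be the ratio of the loop time to the nt time. The following is a minimal sketch for checking that reading against the printed numbers. The names `LOG`, `PATTERN`, and `parse_benchmark` are hypothetical and are not part of the nestedtensor benchmark script. Because the printed times are rounded to two decimals, the recomputed ratios are only expected to match the reported ones approximately.

```python
import re

# Benchmark lines copied verbatim from the comment above.
LOG = """\
model_name:   resnext101_32x4d, bsz: 64, mean±std shapes[2]: 323.17±145.32, mean±std shapes[3]: 363.86±141.50, loop: 1.50s, nt: 0.43s, speedup: 3.50x
model_name:   resnext101_32x4d, bsz: 128, mean±std shapes[2]: 319.36±143.50, mean±std shapes[3]: 361.93±141.42, loop: 2.97s, nt: 0.88s, speedup: 3.36x
model_name:   resnext101_32x4d, bsz: 256, mean±std shapes[2]: 334.34±147.87, mean±std shapes[3]: 365.12±139.76, loop: 5.95s, nt: 1.88s, speedup: 3.17x
"""

# Hypothetical pattern: captures batch size, loop time, nt time, reported speedup.
PATTERN = re.compile(
    r"bsz:\s*(\d+),.*?loop:\s*([\d.]+)s,\s*nt:\s*([\d.]+)s,\s*speedup:\s*([\d.]+)x"
)


def parse_benchmark(log):
    """Return one dict per benchmark line, with the speedup recomputed as loop / nt."""
    rows = []
    for line in log.splitlines():
        match = PATTERN.search(line)
        if match is None:
            continue  # skip lines that are not benchmark results
        loop_s = float(match.group(2))
        nt_s = float(match.group(3))
        rows.append(
            {
                "bsz": int(match.group(1)),
                "loop_s": loop_s,
                "nt_s": nt_s,
                "reported": float(match.group(4)),
                "recomputed": loop_s / nt_s,  # assumes speedup = loop / nt
            }
        )
    return rows


if __name__ == "__main__":
    for row in parse_benchmark(LOG):
        print(
            f"bsz={row['bsz']:>3}  loop={row['loop_s']:.2f}s  nt={row['nt_s']:.2f}s  "
            f"reported={row['reported']:.2f}x  recomputed={row['recomputed']:.2f}x"
        )
```

With these inputs, the recomputed ratios stay within a few hundredths of the reported speedups, which is consistent with rounding in the printed times.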