hidet-org / hidet

An open-source efficient deep learning framework/compiler, written in python.
https://hidet.org
Apache License 2.0

transpose 2d v1 #434

Closed zhiwei-fang closed 4 months ago

zhiwei-fang commented 4 months ago

This PR adds a specialized 2D version of the transpose operator. The current transpose operator handles a general N-dimensional transpose, while this PR implements a dedicated 2D version to speed up 2D transposes. It uses thread coarsening and (static) shared memory. Benchmark result:

Running command: python /home/zhiwei/hidet/.github/scripts/bench/bench_op.py transpose2d --params 3000x4000 --dtype float16
type        id  name         runfile        param_id  param_name      dtype_id  dtype_name    hardware_config      latency
--------  ----  -----------  -----------  ----------  ------------  ----------  ------------  -----------------  ---------
operator     3  transpose2d  bench_op.py           7  3000x4000              1  float16                           0.181748
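
For readers unfamiliar with the two techniques named in the description, here is a minimal pure-Python emulation of a tiled 2D transpose. It is only a sketch of the general approach, not the kernel in this PR: the function name `transpose2d_tiled`, the tile size `TILE = 32`, the `BLOCK_ROWS = 8` coarsening factor, and the padded tile are all assumptions chosen for illustration. The loops over `ty`/`tx` stand in for GPU threads, the local `tile` list stands in for static shared memory, and the inner loop over `k` is the thread-coarsening step, where each thread handles several elements.

```python
# Pure-Python emulation of a tiled 2D transpose. Illustrative only; the names
# and sizes below are assumptions, not taken from the hidet implementation.
TILE = 32        # tile edge length (assumed)
BLOCK_ROWS = 8   # thread rows per block; each thread moves TILE // BLOCK_ROWS elements


def transpose2d_tiled(src, rows, cols):
    """Transpose a row-major rows x cols flat list into a cols x rows flat list."""
    dst = [0] * (rows * cols)
    for by in range(0, rows, TILE):          # each (by, bx) pair plays one thread block
        for bx in range(0, cols, TILE):
            # Stand-in for static shared memory. The extra column mimics the
            # padding commonly used on GPUs to avoid bank conflicts.
            tile = [[0] * (TILE + 1) for _ in range(TILE)]

            # Load phase: each "thread" (ty, tx) reads several rows of the tile.
            for ty in range(BLOCK_ROWS):
                for tx in range(TILE):
                    for k in range(0, TILE, BLOCK_ROWS):   # thread coarsening
                        r, c = by + ty + k, bx + tx
                        if r < rows and c < cols:
                            tile[ty + k][tx] = src[r * cols + c]

            # On a GPU, a block-wide barrier would separate the two phases.

            # Store phase: write the tile back with swapped block coordinates,
            # reading the tile transposed so that output rows stay contiguous.
            for ty in range(BLOCK_ROWS):
                for tx in range(TILE):
                    for k in range(0, TILE, BLOCK_ROWS):
                        r, c = bx + ty + k, by + tx        # coordinates in the output
                        if r < cols and c < rows:
                            dst[r * rows + c] = tile[tx][ty + k]
    return dst
```

Staging through the tile lets both the reads from the input and the writes to the output walk along contiguous rows, which is the usual motivation for using shared memory in a transpose; the bounds checks cover shapes such as 3000x4000 that are not multiples of the tile size.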