microsoft / nnfusion

A flexible and efficient deep neural network (DNN) compiler that generates a high-performance executable from a DNN model description.
MIT License
952 stars 158 forks

General TopK Kernel for HLSL devices #429

Closed wenxcs closed 2 years ago

wenxcs commented 2 years ago

This branch still has some problems with cross-thread-group synchronization. The currently enabled test crashes, though very rarely.

wenxcs commented 2 years ago

Slow version that uses only one block per request to reduce the top-k sequence.
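
For readers unfamiliar with the single-block approach, here is a rough CPU-side sketch in Python of what reducing a top-k sequence inside one block could look like. It is not the kernel from this branch. The names `single_block_topk` and `merge_topk`, the strided partitioning, and the tree-style merge are all assumptions made for illustration. The point is that when one block handles the whole request, every merge step stays inside that block, so only an in-group barrier would be needed and no cross-thread-group synchronization.

```python
import heapq


def merge_topk(a, b, k):
    """Merge two candidate lists of (value, index) pairs and keep the k largest."""
    return heapq.nlargest(k, a + b)


def single_block_topk(values, k, num_threads=8):
    """Simulate one block (thread group) reducing a sequence to its top-k.

    Hypothetical sketch: num_threads must be a power of two, and the
    'threads' are just loop iterations standing in for GPU lanes.
    """
    # Phase 1: each simulated thread scans a strided slice of the input
    # and keeps its own local top-k candidates (stand-in for shared memory).
    shared = [
        heapq.nlargest(k, ((values[i], i) for i in range(t, len(values), num_threads)))
        for t in range(num_threads)
    ]

    # Phase 2: tree reduction. On a GPU, a group-level barrier would sit
    # between iterations of this while loop; no other block is involved.
    stride = num_threads // 2
    while stride > 0:
        for t in range(stride):
            shared[t] = merge_topk(shared[t], shared[t + stride], k)
        stride //= 2

    # Slot 0 now holds the k largest (value, index) pairs, in descending order.
    return shared[0]


if __name__ == "__main__":
    data = [3, 9, 1, 7, 5, 8, 2, 6, 4, 0]
    print(single_block_topk(data, k=3))
```

The trade-off matches the "slow" label above: a single block cannot spread one request across the whole device, but it avoids the cross-group coordination that the earlier comment reports as problematic.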