xadupre closed 2 weeks ago
I chose cvt for convert; I thought it was a common abbreviation. I replaced it with cast. It supports float32 and float16, and we can extend it to bfloat16 if needed. I chose not to include it, to reduce compilation time, but we can definitely add it. What do you mean by how to run the unit tests? There is one file associated with all the kernels implemented in that folder: test/cuda/test_cudaops.py. I just extended it with one or two models per kernel.
Please ping me when you want me to review again. Thanks.