ucbrise / actnn

ActNN: Reducing Training Memory Footprint via 2-Bit Activation Compressed Training
MIT License

There are some errors when I install actnn #9

Closed · XiongUp closed this issue 3 years ago

XiongUp commented 3 years ago

Hello! Thanks for your excellent work! I think it will be useful for me, but I hit some errors when installing actnn.

D:/actnn/actnn/actnn/cpp_extension/minimax_cuda_kernel.cu(19): error: more than one instance of overloaded function "__shfl_sync" matches the argument list:
                function "__shfl_sync(unsigned int, __half, int, int)"
                function "__shfl_sync(unsigned int, c10::Half, unsigned int, int)"
                argument types are: (const unsigned int, __half, const unsigned int, const int)

D:/actnn/actnn/actnn/cpp_extension/minimax_cuda_kernel.cu(52): error: more than one instance of overloaded function "__shfl_sync" matches the argument list:
            function "__shfl_sync(unsigned int, int, int, int)"
            function "__shfl_sync(unsigned int, unsigned int, int, int)"
            function "__shfl_sync(unsigned int, float, int, int)"
            function "__shfl_sync(unsigned int, long long, int, int)"
            function "__shfl_sync(unsigned int, unsigned long long, int, int)"
            function "__shfl_sync(unsigned int, double, int, int)"
            function "__shfl_sync(unsigned int, long, int, int)"
            function "__shfl_sync(unsigned int, unsigned long, int, int)"
            function "__shfl_sync(unsigned int, __half, int, int)"
            function "__shfl_sync(unsigned int, c10::Half, unsigned int, int)"
            argument types are: (unsigned int, c10::Half, int, int)
          detected during instantiation of "void minimax_cuda_kernel(const scalar_t *, scalar_t *, scalar_t *, int64_t, int64_t) [with scalar_t=c10::Half]"
(82): here

D:/actnn/actnn/actnn/cpp_extension/minimax_cuda_kernel.cu(65): error: more than one instance of overloaded function "__shfl_sync" matches the argument list:
            function "__shfl_sync(unsigned int, int, int, int)"
            function "__shfl_sync(unsigned int, unsigned int, int, int)"
            function "__shfl_sync(unsigned int, float, int, int)"
            function "__shfl_sync(unsigned int, long long, int, int)"
            function "__shfl_sync(unsigned int, unsigned long long, int, int)"
            function "__shfl_sync(unsigned int, double, int, int)"
            function "__shfl_sync(unsigned int, long, int, int)"
            function "__shfl_sync(unsigned int, unsigned long, int, int)"
            function "__shfl_sync(unsigned int, __half, int, int)"
            function "__shfl_sync(unsigned int, c10::Half, unsigned int, int)"
            argument types are: (unsigned int, c10::Half, int, int)
          detected during instantiation of "void minimax_cuda_kernel(const scalar_t *, scalar_t *, scalar_t *, int64_t, int64_t) [with scalar_t=c10::Half]"
(82): here

3 errors detected in the compilation of "C:/Users/xJun/AppData/Local/Temp/tmpxft_00004d44_00000000-7_minimax_cuda_kernel.cpp1.ii".
D:/actnn/actnn/actnn/cpp_extension/quantization_cuda_kernel.cu(126): error: no instance of overloaded function "std::min" matches the argument list
            argument types are: (long long, long)

D:/actnn/actnn/actnn/cpp_extension/quantization_cuda_kernel.cu(190): error: no instance of overloaded function "std::min" matches the argument list
            argument types are: (long long, long)

D:/actnn/actnn/actnn/cpp_extension/quantization_cuda_kernel.cu(302): error: no instance of overloaded function "std::min" matches the argument list
            argument types are: (long long, long)

D:/actnn/actnn/actnn/cpp_extension/quantization_cuda_kernel.cu(379): error: no instance of overloaded function "std::min" matches the argument list
            argument types are: (long long, long)

4 errors detected in the compilation of "C:/Users/xJun/AppData/Local/Temp/tmpxft_000024d8_00000000-7_quantization_cuda_kernel.cpp1.ii".
D:/actnn/actnn/actnn/cpp_extension/quantization_cuda_kernel.cu(64): error: calling a __host__ function("fmax<double, float, (int)0> ") from a __global__ function("pack_mixed_precision_kernel<double> ") is not allowed

D:/actnn/actnn/actnn/cpp_extension/quantization_cuda_kernel.cu(64): error: identifier "fmax<double, float, (int)0> " is undefined in device code

D:/actnn/actnn/actnn/cpp_extension/quantization_cuda_kernel.cu(64): error: calling a __host__ function("fmax<double, float, (int)0> ") from a __global__ function("pack_mixed_precision_kernel<float> ") is not allowed

D:/actnn/actnn/actnn/cpp_extension/quantization_cuda_kernel.cu(64): error: identifier "fmax<double, float, (int)0> " is undefined in device code

D:/actnn/actnn/actnn/cpp_extension/quantization_cuda_kernel.cu(64): error: calling a __host__ function("fmax<double, float, (int)0> ") from a __global__ function("pack_mixed_precision_kernel< ::c10::Half> ") is not allowed

D:/actnn/actnn/actnn/cpp_extension/quantization_cuda_kernel.cu(64): error: identifier "fmax<double, float, (int)0> " is undefined in device code

D:/actnn/actnn/actnn/cpp_extension/quantization_cuda_kernel.cu(252): error: calling a __host__ function("fmax<double, float, (int)0> ") from a __global__ function("pack_single_precision_kernel<double, (bool)0> ") is not allowed

D:/actnn/actnn/actnn/cpp_extension/quantization_cuda_kernel.cu(252): error: identifier "fmax<double, float, (int)0> " is undefined in device code

D:/actnn/actnn/actnn/cpp_extension/quantization_cuda_kernel.cu(252): error: calling a __host__ function("fmax<double, float, (int)0> ") from a __global__ function("pack_single_precision_kernel<float, (bool)0> ") is not allowed

D:/actnn/actnn/actnn/cpp_extension/quantization_cuda_kernel.cu(252): error: identifier "fmax<double, float, (int)0> " is undefined in device code

D:/actnn/actnn/actnn/cpp_extension/quantization_cuda_kernel.cu(252): error: calling a __host__ function("fmax<double, float, (int)0> ") from a __global__ function("pack_single_precision_kernel< ::c10::Half, (bool)0> ") is not allowed

D:/actnn/actnn/actnn/cpp_extension/quantization_cuda_kernel.cu(252): error: identifier "fmax<double, float, (int)0> " is undefined in device code

D:/actnn/actnn/actnn/cpp_extension/quantization_cuda_kernel.cu(252): error: calling a __host__ function("fmax<double, float, (int)0> ") from a __global__ function("pack_single_precision_kernel<double, (bool)1> ") is not allowed

D:/actnn/actnn/actnn/cpp_extension/quantization_cuda_kernel.cu(252): error: identifier "fmax<double, float, (int)0> " is undefined in device code

D:/actnn/actnn/actnn/cpp_extension/quantization_cuda_kernel.cu(252): error: calling a __host__ function("fmax<double, float, (int)0> ") from a __global__ function("pack_single_precision_kernel<float, (bool)1> ") is not allowed

D:/actnn/actnn/actnn/cpp_extension/quantization_cuda_kernel.cu(252): error: identifier "fmax<double, float, (int)0> " is undefined in device code

D:/actnn/actnn/actnn/cpp_extension/quantization_cuda_kernel.cu(252): error: calling a __host__ function("fmax<double, float, (int)0> ") from a __global__ function("pack_single_precision_kernel< ::c10::Half, (bool)1> ") is not allowed

D:/actnn/actnn/actnn/cpp_extension/quantization_cuda_kernel.cu(252): error: identifier "fmax<double, float, (int)0> " is undefined in device code

18 errors detected in the compilation of "C:/Users/xJun/AppData/Local/Temp/tmpxft_00004cdc_00000000-7_quantization_cuda_kernel.cpp1.ii".
XiongUp commented 3 years ago

I have fixed the above problems by modifying the source code, but I recommend updating the code upstream.

merrymercy commented 3 years ago

Thanks for reporting the issues. These errors only occur on Windows. I have fixed them in #5