Closed by zhongkaifu 4 years ago
Hmm, this sounds very strange. Given that it works or fails depending on the card used, I doubt that the issue is actually in the wrapped managedCuda part. I would try to replicate it with a simple C++ sample that uses the CUDA Driver API and calls the same functions that managedCuda calls. (If you compile managedCuda in Debug mode, it prints every API call to the console.) If you can replicate the problem, raise a bug with NVIDIA. If it works fine in C++, we'd have to investigate the differences compared to managedCuda...
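A rough sketch of what such a C++ repro could look like, in case it helps. It assumes the wrapper's PTX loading ends up in `cuModuleLoadDataEx` followed by `cuModuleGetFunction`; the Debug-mode call log mentioned above would confirm the exact sequence. The helper names (`read_ptx`, `elapsed_ms`, `time_ptx_load`) are made up for this example, and it uses the device's primary context for simplicity, which may differ from how managedCuda creates its context.

```cpp
// Build (assuming the CUDA toolkit headers and driver library are installed):
//   g++ -std=c++17 repro.cpp -lcuda
#include <cassert>
#include <chrono>
#include <cstdio>
#include <fstream>
#include <iterator>
#include <string>

#if __has_include(<cuda.h>)
#include <cuda.h>
#define HAVE_CUDA_DRIVER 1
#else
#define HAVE_CUDA_DRIVER 0  // lets the file compile on a machine without the toolkit
#endif

// Read the whole PTX file into a string. c_str() then yields the
// NUL-terminated text image that cuModuleLoadDataEx accepts for PTX.
std::string read_ptx(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// Run f() and return the elapsed wall-clock time in milliseconds.
template <typename F>
double elapsed_ms(F&& f) {
    const auto t0 = std::chrono::steady_clock::now();
    f();
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

// Load the PTX module and look up one kernel, timing both steps separately.
// Returns 0 on success, the failing CUresult value otherwise, or -1 if the
// file could not be read or the program was built without cuda.h.
int time_ptx_load(const std::string& ptx_path, const char* kernel_name,
                  int device_ordinal = 0) {
    assert(kernel_name != nullptr);
    const std::string ptx = read_ptx(ptx_path);
    if (ptx.empty()) {
        std::fprintf(stderr, "could not read %s\n", ptx_path.c_str());
        return -1;
    }
#if HAVE_CUDA_DRIVER
    CUdevice dev = 0;
    CUcontext ctx = nullptr;
    CUresult rc = cuInit(0);
    if (rc == CUDA_SUCCESS) rc = cuDeviceGet(&dev, device_ordinal);
    if (rc == CUDA_SUCCESS) rc = cuDevicePrimaryCtxRetain(&ctx, dev);
    if (rc != CUDA_SUCCESS) return static_cast<int>(rc);
    rc = cuCtxSetCurrent(ctx);

    CUmodule mod = nullptr;
    CUfunction fn = nullptr;
    if (rc == CUDA_SUCCESS) {
        // The driver JIT-compiles the PTX for the current device in this call.
        const double ms = elapsed_ms([&] {
            rc = cuModuleLoadDataEx(&mod, ptx.c_str(), 0, nullptr, nullptr);
        });
        std::printf("cuModuleLoadDataEx: %.1f ms (rc=%d)\n", ms, static_cast<int>(rc));
    }
    if (rc == CUDA_SUCCESS) {
        const double ms = elapsed_ms([&] {
            rc = cuModuleGetFunction(&fn, mod, kernel_name);
        });
        std::printf("cuModuleGetFunction: %.1f ms (rc=%d)\n", ms, static_cast<int>(rc));
    }
    if (mod != nullptr) cuModuleUnload(mod);
    cuDevicePrimaryCtxRelease(dev);
    return static_cast<int>(rc);
#else
    (void)device_ordinal;
    std::fprintf(stderr, "built without cuda.h; nothing measured\n");
    return -1;
#endif
}
```

Calling `time_ptx_load("kernels.ptx", "myKernel")` from `main` (both names are placeholders) on the affected card and on a working one should show which call takes the time. If the C++ version also hangs inside the module load, that would point at the driver's PTX JIT step rather than at managedCuda.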
Thanks @kunzmi.
This problem was resolved after I split a single large kernel PTX file into three smaller PTX files. But I still don't know why it happened...
Thanks, Zhongkai Fu
Hi @kunzmi,
Thanks for your great work on ManagedCUDA; my project Seq2SeqSharp uses it.
I currently have a problem when calling the LoadKernelPTX function to load kernels: it gets stuck forever. The problem only happens on a P100-PCIE card; other cards, such as the GTX 1060, GTX 1070, P40, K40m and so on, are fine.
I found that it takes a really long time to load a single function from the given PTX file, and a single CPU core is fully utilized while it does so.
Do you have any idea what could cause this? Thanks in advance.