lix19937 / tensorrt-insight

deep insight tensorrt

Cuda failure: operation failed due to a previous error during capture #22

Open lix19937 opened 1 week ago

lix19937 commented 1 week ago

TensorRT reports this error when CUDA graphs are used.

lix19937 commented 1 week ago

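Does CUDA graph capture impose any requirements on the nthreads / nblocks launch parameters?

What if nthreads and nblocks come from the output of the previous kernel (via a D2H copy)?

Also, the execution order of the kernel relative to host-side statements needs to be checked: https://github.com/lix19937/cuda-samples-cn/blob/master/Samples/0_Introduction/asyncAPI/asyncAPI.cu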

lix19937 commented 1 week ago

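Workaround: if the previous kernel's output is passed to the next kernel as an input (as a pointer argument), and the kernel's thread/block launch configuration uses constant expressions, capture works.

In contrast, the approach in the second comment copies the previous kernel's output to a host-side CPU variable via D2H and then sets the kernel's thread count from that variable; this reports the error.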
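A minimal sketch of the workaround, not the actual TensorRT plugin code. The kernel names `produce_count` and `consume`, the helper `run_captured_pipeline`, and the sizes are hypothetical. A likely reason the D2H variant fails is that no work actually executes during stream capture, so the host variable is never filled in, and synchronizing with a capturing stream (which a blocking D2H copy needs) invalidates the capture. Here the count stays in device memory and the launch configuration is a compile-time constant:

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

#define CUDA_CHECK(call)                                              \
  do {                                                                \
    cudaError_t err_ = (call);                                        \
    if (err_ != cudaSuccess) {                                        \
      std::fprintf(stderr, "CUDA error: %s (%s:%d)\n",                \
                   cudaGetErrorString(err_), __FILE__, __LINE__);     \
      std::exit(EXIT_FAILURE);                                        \
    }                                                                 \
  } while (0)

// Hypothetical producer: stands in for the "previous kernel" whose output
// decides how much work the next kernel has to do.
__global__ void produce_count(int* d_count, int n, int capacity) {
  if (blockIdx.x == 0 && threadIdx.x == 0) *d_count = n < capacity ? n : capacity;
}

// Hypothetical consumer: launched with a fixed grid. It reads the element
// count through a device pointer and covers it with a grid-stride loop.
__global__ void consume(const int* d_count, float* d_out) {
  const int count = *d_count;
  for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < count;
       i += gridDim.x * blockDim.x) {
    d_out[i] = 1.0f;
  }
}

// Captures both kernels into a graph, launches it once, and returns how many
// elements the consumer wrote.
int run_captured_pipeline(int n) {
  constexpr int kCapacity = 1 << 20;  // assumed upper bound on the count
  constexpr int kThreads = 256;       // constant launch configuration
  constexpr int kBlocks = 64;

  int* d_count = nullptr;
  float* d_out = nullptr;
  CUDA_CHECK(cudaMalloc(&d_count, sizeof(int)));
  CUDA_CHECK(cudaMalloc(&d_out, kCapacity * sizeof(float)));
  CUDA_CHECK(cudaMemset(d_out, 0, kCapacity * sizeof(float)));
  CUDA_CHECK(cudaDeviceSynchronize());  // finish setup before capture starts

  cudaStream_t stream;
  CUDA_CHECK(cudaStreamCreate(&stream));

  cudaGraph_t graph;
  cudaGraphExec_t exec;
  CUDA_CHECK(cudaStreamBeginCapture(stream, cudaStreamCaptureModeGlobal));
  // Nothing runs here: the launches are only recorded into the graph.
  produce_count<<<1, 1, 0, stream>>>(d_count, n, kCapacity);
  consume<<<kBlocks, kThreads, 0, stream>>>(d_count, d_out);
  CUDA_CHECK(cudaStreamEndCapture(stream, &graph));

  // cudaGraphInstantiateWithFlags is assumed available (CUDA 11.4 or newer).
  CUDA_CHECK(cudaGraphInstantiateWithFlags(&exec, graph, 0));
  CUDA_CHECK(cudaGraphLaunch(exec, stream));
  CUDA_CHECK(cudaStreamSynchronize(stream));

  // Outside capture, a blocking D2H copy is fine.
  std::vector<float> h_out(kCapacity);
  CUDA_CHECK(cudaMemcpy(h_out.data(), d_out, kCapacity * sizeof(float),
                        cudaMemcpyDeviceToHost));
  int written = 0;
  for (float v : h_out) written += (v == 1.0f) ? 1 : 0;

  CUDA_CHECK(cudaGraphExecDestroy(exec));
  CUDA_CHECK(cudaGraphDestroy(graph));
  CUDA_CHECK(cudaStreamDestroy(stream));
  CUDA_CHECK(cudaFree(d_out));
  CUDA_CHECK(cudaFree(d_count));
  return written;
}

int main() {
  std::printf("elements written: %d\n", run_captured_pipeline(1000));
  return 0;
}
```

The trade-off is that the grid is sized for the worst case rather than the actual count, so some threads may exit without doing any work.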

lix19937 commented 1 week ago

More on CUDA graphs:
https://github.com/lix19937/history/blob/main/cuda/cudagraph.md