Open henry123-boy opened 1 year ago
Update: I have solved this problem by replacing
asm volatile (
"{.reg .f32 r0;"
".reg .pred p;"
"shfl.up.b32 r0|p, %1, %2, 0;"
"@p add.f32 r0, r0, %1;"
"mov.f32 %0, r0;}"
: "=f"(result) : "f"(x), "r"(offset));
with
asm volatile (
"{.reg .f32 r0;"
".reg .pred p;"
"shfl.sync.up.b32 r0|p, %1, %2, 0, -1;"
"@p add.f32 r0, r0, %1;"
"mov.f32 %0, r0;}"
: "=f"(result) : "f"(x), "r"(offset));
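The reason is that the shfl instruction without .sync is not supported on sm_70 and higher targets from PTX ISA version 6.4 onward, so shfl.sync with an explicit member mask (here -1, i.e. all lanes) has to be used instead.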
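For anyone who would rather avoid inline PTX, the same step can probably be written with the `__shfl_up_sync` intrinsic. The sketch below is not from the repository: `scan_step`, `warp_inclusive_scan`, and `run_warp_scan_check` are hypothetical names, and it assumes a single full warp of 32 threads with all lanes active, which matches the -1 mask used above.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: one step of a warp-level inclusive scan.
// Adds the value held by the lane `offset` positions below, if that lane exists.
__device__ float scan_step(float x, int offset) {
    const unsigned full_mask = 0xffffffffu;          // all 32 lanes participate
    float y = __shfl_up_sync(full_mask, x, offset);  // value from lane (lane - offset)
    int lane = threadIdx.x & 31;
    return (lane >= offset) ? x + y : x;             // same role as the predicate p
}

// Hypothetical kernel: inclusive prefix sum over one warp.
__global__ void warp_inclusive_scan(const float* in, float* out) {
    float v = in[threadIdx.x];
    for (int offset = 1; offset < 32; offset <<= 1) {
        v = scan_step(v, offset);
    }
    out[threadIdx.x] = v;
}

// Host-side check: scanning 32 ones should give 1, 2, ..., 32.
bool run_warp_scan_check() {
    float h_in[32], h_out[32];
    for (int i = 0; i < 32; ++i) h_in[i] = 1.0f;

    float *d_in = nullptr, *d_out = nullptr;
    if (cudaMalloc(&d_in, sizeof(h_in)) != cudaSuccess) return false;
    if (cudaMalloc(&d_out, sizeof(h_out)) != cudaSuccess) { cudaFree(d_in); return false; }

    cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice);
    warp_inclusive_scan<<<1, 32>>>(d_in, d_out);
    cudaError_t err = cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    if (err != cudaSuccess) return false;

    for (int i = 0; i < 32; ++i) {
        if (h_out[i] != static_cast<float>(i + 1)) return false;
    }
    return true;
}

int main() {
    std::printf("%s\n", run_warp_scan_check() ? "scan ok" : "scan mismatch");
    return 0;
}
```

If the inline assembly has to stay, the only change needed is the one shown above: `shfl.up.b32` becomes `shfl.sync.up.b32` with the member mask appended as the last operand.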
Hi, I am now running into the same problem. Could you tell me where to modify the inline assembly code?
Hi! Thank you for your excellent work and code. Recently I ran into a compile error on my server, whose environment is cuda11.3 and arch_sm=86. The error is reported as below:
"ptxas /tmp/tmpxft_0006c5ba_00000000-6_block6x6_pcg_weber.ptx, line 4136; error : Instruction 'shfl' without '.sync' is not supported on .target sm_70 and higher from PTX ISA version 6.4"
I hope to get a reply.