Open tbenthompson opened 3 years ago
To test this manually:
import os
import numpy as np
import cutde.gpu as gpu
gpu_config = dict(float_type='float', verbose=True)
gpu.load_gpu(
"aca.cu", tmpl_args=gpu_config, tmpl_dir=os.path.join(os.getcwd(), "cutde")
)
Most of the OpenCL kernels seem to take about 1-3 seconds to compile with `pocl`. But `aca.cu` takes almost three minutes. This is true on both my machine and the GitHub Actions CI servers, so I suspect it will replicate elsewhere too.

Ideas:
- `-cl-opt-disable`. I would actually be okay leaving this flag on all the time. But that caused some segfault errors! I'm not sure what the problem is there.
- `aca.cu` compared to the other kernels like `free.cu` or `block.cu`. So, what is causing the compiler to burp here?
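
One way to put numbers on the comparison between `aca.cu` and the other kernels is to time each compile separately. The following is a minimal sketch, not part of cutde: `time_call` and `compile_times` are hypothetical helper names, and `load` stands for any callable that compiles one kernel file given its name.

```python
import time


def time_call(fn, *args, **kwargs):
    """Run fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def compile_times(load, kernel_names):
    """Map each kernel file name to the wall-clock seconds `load` takes on it.

    `load` is assumed to be a one-argument callable that compiles the named
    kernel; the return value of `load` is discarded here.
    """
    return {name: time_call(load, name)[1] for name in kernel_names}
```

With the manual test above, `load` could be something like `lambda name: gpu.load_gpu(name, tmpl_args=gpu_config, tmpl_dir=os.path.join(os.getcwd(), "cutde"))`, called with `["free.cu", "block.cu", "aca.cu"]`. If compiled kernels are cached anywhere, the cache would need to be cleared first, otherwise the timings measure a cache hit rather than the compile.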