tbenthompson / cutde

Python CPU and GPU accelerated TDEs, over 100 million TDEs per second!
MIT License

The pocl compilation time for the ACA kernel is bizarrely long. #10

Open tbenthompson opened 3 years ago

tbenthompson commented 3 years ago

Most of the OpenCL kernels take about 1-3 seconds to compile with pocl, but aca.cu takes almost three minutes. This is true on both my machine and the GitHub Actions CI servers, so I suspect it will replicate elsewhere too.

Ideas:

tbenthompson commented 3 years ago

To test this manually:

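import os

import cutde.gpu as gpu

# Template arguments for the kernel: single-precision floats, verbose mode on.
gpu_config = dict(float_type='float', verbose=True)

# Run this from the repository root so that ./cutde points at the kernel sources.
gpu.load_gpu(
    "aca.cu", tmpl_args=gpu_config, tmpl_dir=os.path.join(os.getcwd(), "cutde")
)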
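To put a number on the compile time rather than eyeballing it, the same call can be wrapped in a timer. This is only a rough sketch: `time_call` and `time_aca_compile` are hypothetical helper names, not part of cutde, and the two environment variables are an assumption that PyOpenCL's and pocl's binary caches are what would otherwise hide the compile cost on a second run.

```python
import os
import time


def time_call(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def time_aca_compile(tmpl_dir=None):
    """Time one load of aca.cu using the same call as the manual test.

    Hypothetical helper: needs cutde and an OpenCL backend installed.
    """
    # Assumption: these switches turn off the PyOpenCL and pocl caches so
    # that a repeated run compiles again instead of reusing a cached binary.
    os.environ.setdefault("PYOPENCL_NO_CACHE", "1")
    os.environ.setdefault("POCL_KERNEL_CACHE", "0")

    # Imported here so the variables above are set before any OpenCL setup.
    import cutde.gpu as gpu

    if tmpl_dir is None:
        tmpl_dir = os.path.join(os.getcwd(), "cutde")

    gpu_config = dict(float_type="float", verbose=True)
    _, elapsed = time_call(
        gpu.load_gpu, "aca.cu", tmpl_args=gpu_config, tmpl_dir=tmpl_dir
    )
    return elapsed
```

Calling `time_aca_compile()` from the repository root should return the wall-clock seconds for the aca.cu build. Passing a different kernel file name to `gpu.load_gpu` through `time_call` gives a baseline for comparison.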