If I understand correctly, the per-tile computation could be rewritten as an OpenCL kernel so that it can run on any OpenCL-capable compute device on the system (CPU, GPU, APU, etc.). Once this is done, the OpenCL runtime would compile that kernel for each available device, although how well it performs on each one likely still depends on how the kernel is written.
This needs further research to confirm that all of this is correct.
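To make the idea a bit more concrete, here is a rough sketch of the property an OpenCL port would rely on: each tile is computed from its own coordinates only, so tiles can be handed out to workers in any order. This is plain Python with a thread pool, not OpenCL, and it is not the project's actual code. `compute_tile`, `compute_all_tiles`, `TILE_SIZE`, and the placeholder arithmetic are all hypothetical names and values made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

TILE_SIZE = 4  # hypothetical tile edge length, in pixels


def compute_tile(tile_x, tile_y):
    """Hypothetical stand-in for the real per-tile computation.

    It depends only on its own tile coordinates, much like an OpenCL
    work-item depends only on its global ID.
    """
    x0, y0 = tile_x * TILE_SIZE, tile_y * TILE_SIZE
    # Placeholder math: each cell holds the sum of its absolute x and y.
    return [[(x0 + dx) + (y0 + dy) for dx in range(TILE_SIZE)]
            for dy in range(TILE_SIZE)]


def compute_all_tiles(tiles_x, tiles_y, workers=4):
    """Compute every tile independently and return {(tile_x, tile_y): tile}."""
    coords = list(product(range(tiles_x), range(tiles_y)))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so results line up with coords.
        results = pool.map(lambda c: compute_tile(*c), coords)
        return dict(zip(coords, results))
```

If the real tile computation has this shape (no shared mutable state between tiles), the body of `compute_tile` is roughly what would become the kernel, and the tile grid would become the global work size. Whether that holds for the actual implementation is part of what needs checking.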
With the implementation of issue #3, the program is plenty fast right now. OpenCL does not seem like it is worth the trouble at the moment. Closing for now.