nlesc-sherlock / cluster-analysis

Create a faster GPU kernel for the NCC implementation #16

Open benvanwerkhoven opened 8 years ago

benvanwerkhoven commented 8 years ago

The block-tiled implementation enables data reuse in GPU device memory, but data can also be reused on-chip, for example in shared memory or registers. The next step is to write a kernel that exploits this on-chip reuse.
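To make the idea concrete, here is a rough CPU-side sketch in Python of the access pattern such a kernel might follow. It is not the repository's implementation and not a GPU kernel. The NCC definition used (dot product divided by the product of the vector norms), the function names, and the `tile` parameter are all assumptions made for illustration. The "staged" lists stand in for on-chip storage, and the load counter only models fetches from device memory.

```python
import math


def ncc(x, y):
    """Normalized cross-correlation of two equal-length vectors.

    Assumed definition: dot(x, y) / sqrt(sum(x^2) * sum(y^2)).
    """
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return dot / norm if norm else 0.0


def ncc_matrix_naive(patterns):
    """All-pairs NCC where every pair fetches both patterns again."""
    n = len(patterns)
    out = [[0.0] * n for _ in range(n)]
    loads = 0  # modeled pattern fetches from device memory
    for i in range(n):
        for j in range(n):
            loads += 2
            out[i][j] = ncc(patterns[i], patterns[j])
    return out, loads


def ncc_matrix_tiled(patterns, tile=2):
    """All-pairs NCC where each tile of patterns is staged once and reused.

    The staged lists play the role of shared memory or registers
    (hypothetical stand-in; a real kernel would also tile along the
    pattern length, since whole patterns rarely fit on-chip).
    """
    n = len(patterns)
    out = [[0.0] * n for _ in range(n)]
    loads = 0
    for i0 in range(0, n, tile):
        for j0 in range(0, n, tile):
            # Stage one tile of row patterns and one tile of column patterns.
            rows = patterns[i0:i0 + tile]
            cols = patterns[j0:j0 + tile]
            loads += len(rows) + len(cols)
            # Every staged pattern is reused for a whole row or column of the tile.
            for di, x in enumerate(rows):
                for dj, y in enumerate(cols):
                    out[i0 + di][j0 + dj] = ncc(x, y)
    return out, loads
```

In this model, with 4 patterns and `tile=2`, the naive version counts 32 pattern fetches and the tiled version 16, while both produce the same matrix. A CUDA kernel would aim for the same effect by having each thread block stage its tile on-chip before computing the partial sums.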