ratt-ru / bullseye

A GPU-based facet imager
GNU General Public License v2.0

Coplanar w-faceting #41

Closed bennahugo closed 9 years ago

bennahugo commented 9 years ago

Once the templating is in place this will be added

bennahugo commented 9 years ago

I've been playing around with vectorization intrinsics and I've managed to get a decent speedup of 70% on my Xeon processor, provided that the filters are reasonably large (a full support size of 30+ pixels). This is for real-valued filters and a single correlation term only for now. The next step is to coalesce the filter. I need to pad the filters somewhat, but this should make a difference in cache performance, as well as open up the possibility of further vectorization improvements. Since complex multiplications are somewhat more expensive, this should translate into even greater improvements for complex-valued filters.
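As a rough illustration of the padding idea mentioned above, here is a minimal portable C++ sketch, not the code from the repository. It assumes a vector width of 8 floats (`kLanes`), and `pad_filter` and `accumulate_padded` are hypothetical helper names. The filter is zero-padded to a whole number of vector widths so the inner loop has a fixed trip count and no tail case, which is the shape that intrinsics or an auto-vectorizing compiler can map onto SIMD registers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed SIMD width for this sketch: 8 floats (one 256-bit register).
constexpr std::size_t kLanes = 8;

// Hypothetical helper: copy the filter taps and zero-pad them up to the
// next multiple of kLanes. The zero taps contribute nothing to the result.
std::vector<float> pad_filter(const std::vector<float>& taps)
{
    const std::size_t padded = (taps.size() + kLanes - 1) / kLanes * kLanes;
    std::vector<float> out(padded, 0.0f);
    for (std::size_t i = 0; i < taps.size(); ++i) {
        out[i] = taps[i];
    }
    return out;
}

// Hypothetical helper: accumulate one real-valued sample onto a grid row.
// grid_row must have room for padded_taps.size() floats.
void accumulate_padded(float* grid_row,
                       const std::vector<float>& padded_taps,
                       float sample)
{
    assert(padded_taps.size() % kLanes == 0);
    for (std::size_t base = 0; base < padded_taps.size(); base += kLanes) {
        // Fixed-length inner loop: one vector's worth of multiply-adds.
        for (std::size_t lane = 0; lane < kLanes; ++lane) {
            grid_row[base + lane] += sample * padded_taps[base + lane];
        }
    }
}
```

With a 31-tap filter the padded length is 32, so the cost of padding is a single extra multiply-add per filter row.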

cyriltasse commented 9 years ago

Excellent, sounds great!

bennahugo commented 9 years ago

@jgain Separable filters added and vectorized in commit cc5ee5336. All that remains now is refactoring and updating the GPU gridder with the latest changes. I'm going to try to use the read-only data cache to store the filters ... this may give us some more breathing room compared to shared memory.
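For readers unfamiliar with separable filters, the sketch below shows the general technique in plain C++. It is an illustration under stated assumptions, not the code from commit cc5ee5336: `grid_separable` and its parameters are hypothetical names, and the sample is real-valued for brevity. The 2D weight at (row, col) is the product of two 1D taps, so a filter with a full support of N pixels needs 2N stored taps instead of N×N.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: accumulate one real-valued sample onto a row-major
// grid using a separable filter, where the 2D weight at (row, col) is
// taps_v[row] * taps_u[col]. (top, left) is the corner of the footprint.
void grid_separable(std::vector<float>& grid, std::size_t grid_width,
                    std::size_t top, std::size_t left,
                    const std::vector<float>& taps_u,
                    const std::vector<float>& taps_v,
                    float sample)
{
    assert(grid_width > 0 && grid.size() % grid_width == 0);
    assert(left + taps_u.size() <= grid_width);
    assert(top + taps_v.size() <= grid.size() / grid_width);

    for (std::size_t r = 0; r < taps_v.size(); ++r) {
        // The row weight is computed once per row, outside the inner loop.
        const float row_weight = taps_v[r] * sample;
        float* out = grid.data() + (top + r) * grid_width + left;
        for (std::size_t c = 0; c < taps_u.size(); ++c) {
            out[c] += row_weight * taps_u[c];
        }
    }
}
```

The smaller per-filter footprint is what makes it plausible to keep the taps in a small cached memory space rather than in shared memory.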

bennahugo commented 9 years ago

@jgain @cyriltasse Our idea to use 1D separable filters panned out well. Pushing the filter lookups through texture memory (through a surface load call) gives a 20% improvement when running on a 1.5 GiB EVLA dataset with the convolution functions at a full support of 31 pixels and 128 w planes. I decided to keep supporting compute capability >= 2.0 devices (using surface memory instead of the read-only data cache on compute 3.5 cards), since I want to compare the K40m performance to the GT770 I have in my lab machine. I also added an option to compile without the caching, in case the user ever needs filters that exceed the device limitations. See commits ab3cbf949 and 57a3351e for details.