timothygrant80 / cisTEM


Add fp16 functionality #453

Closed bHimes closed 1 year ago

bHimes commented 1 year ago

Description

This PR adds three primary functionalities to cisTEM:

1) Enable use of fp16
2) Enable isolation of GPU-enabled code, even in core library functions
3) Expand the samples functional testing and add unit testing via Catch2

The PR is unfortunately large, touching many files, but I decided it was safer to pull in many changes at once (which have been tested for ~ a year in my downstream) than to try to split all of this up, which would almost certainly produce errors and would certainly take much longer.

fp16 particle stacks

cisTEM can now read and write MRC mode 12, the half-precision (16-bit) floating-point format.
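As a rough illustration of what "mode 12" means at the file level, the sketch below reads the mode word from a raw 1024-byte MRC header and maps it to a pixel size. It assumes the MRC2014 layout (mode stored as a 32-bit integer at byte offset 12) and a file whose byte order matches the host. The helper names `ReadMrcMode` and `BytesPerPixel` are hypothetical and are not cisTEM API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Mode values from the MRC2014 format description.
constexpr std::int32_t kMrcModeFloat32 = 2;   // 32-bit float
constexpr std::int32_t kMrcModeFloat16 = 12;  // 16-bit (half-precision) float

// Hypothetical helper: read the mode word (4th 32-bit word, bytes 12..15)
// from a raw header. Assumes the file byte order matches the host.
std::int32_t ReadMrcMode(const std::vector<unsigned char>& header) {
    assert(header.size() >= 1024);  // an MRC header is 1024 bytes
    std::int32_t mode = 0;
    std::memcpy(&mode, header.data() + 12, sizeof(mode));
    return mode;
}

// Hypothetical helper: bytes per pixel for the two modes covered here;
// returns 0 for any mode this sketch does not handle.
std::size_t BytesPerPixel(std::int32_t mode) {
    switch (mode) {
        case kMrcModeFloat16: return 2;
        case kMrcModeFloat32: return 4;
        default: return 0;
    }
}
```

A mode 12 stack therefore needs half the disk space of the equivalent mode 2 stack, which is the main practical benefit for particle stacks.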

fp16 Image methods

cistem::Image can now allocate an fp16 buffer and has a method to calculate/apply a CTF in half-precision.

cistem::GPUImage has a pile of fp16 methods added (mostly templated.)

Enable Isolation of gpu code

Previously, when building with ENABLEGPU, this define was used cisTEM-wide, which meant that any GPU-related code had to reside only in an individual program. Now, two new preprocessor defines are created.


I have rebased my feature branch to be current with the master branch using to minimize conflicts and headaches

Which compilers were tested

These changes are isolated to the

How has the functionality been tested?


bHimes commented 1 year ago

@jojoelfe I'm getting a "no space left on disk" error, which is killing the icpc container/CI. Any ideas?

jojoelfe commented 1 year ago

I think GitHub recently started to be more strict about the disk space runners get. It's about 14GB, and I think the icpc image is already 6GB; due to the static compiling, the binaries are large, too.

I'll look into whether we can get more space somehow, but maybe we just have to trim the image.

bHimes commented 1 year ago

You mentioned that projection during template matching is now done on the GPU. Can you maybe point to the change that enables this?

Ah, it isn't in match template yet, the functionality to do GPU projection however is now in place.

@jojoelfe a reference implementation can be seen in the functional test:

src/programs/samples/1_cpu_gpu_comparison/projection_comparison.cpp

This required new image methods and metadata to swap momentum (Fourier) space quadrants. This method

Image::SwapFourierSpaceQuadrants

also does an additional shift by one pixel so the x=-1 components are included. This is needed to efficiently use texture memory for gpu interpolation.
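For readers unfamiliar with quadrant swapping, here is a generic sketch of the idea on a plain row-major array with even dimensions. It is not the cisTEM implementation: it does not reproduce the extra one-pixel shift described above, it ignores cisTEM's Fourier-space storage layout, and the function name `SwapQuadrants2D` is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Generic quadrant swap (often called "fftshift") on a row-major nx-by-ny
// array. Each element at (x, y) trades places with the element at
// ((x + nx/2) % nx, (y + ny/2) % ny). Assumes both dimensions are even,
// in which case the mapping is its own inverse.
void SwapQuadrants2D(std::vector<float>& data, std::size_t nx, std::size_t ny) {
    assert(data.size() == nx * ny);
    assert(nx % 2 == 0 && ny % 2 == 0);
    const std::size_t half_x = nx / 2;
    const std::size_t half_y = ny / 2;
    // Visiting only the first half of the rows touches every pair exactly once.
    for (std::size_t y = 0; y < half_y; ++y) {
        for (std::size_t x = 0; x < nx; ++x) {
            const std::size_t x_swapped = (x + half_x) % nx;
            const std::size_t y_swapped = y + half_y;
            std::swap(data[y * nx + x], data[y_swapped * nx + x_swapped]);
        }
    }
}
```

After the swap the origin sits at (nx/2, ny/2), so low frequencies are contiguous around the center of the array rather than split across the corners.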

Normally we think of a real-space shift by Fourier multiplication, but you can do the same thing by a complex multiplication in real space to shift the Fourier spectrum. This adds the complication that the input image is now complex, for which cisTEM has no out-of-the-box FFT routines, hence the tmp_real and tmp_imag images, which use the linearity of the FFT to make a complex FFT from two real FFTs.
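The two ideas in this paragraph can be checked numerically in 1D. The sketch below is an illustration under stated assumptions, not the code in the PR: it uses a naive O(N^2) DFT that returns the full-length spectrum in place of a real-to-complex FFT, and all function names are hypothetical. It multiplies a real signal by exp(2*pi*i*s*j/N), splits the result into real and imaginary parts (playing the role of tmp_real and tmp_imag), transforms each part separately, and recombines them as F(re) + i*F(im).

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;

// Naive O(N^2) DFT of a real signal, returning all N complex coefficients.
// Stands in for a real-input FFT in this sketch.
std::vector<cplx> RealDft(const std::vector<double>& x) {
    const std::size_t n = x.size();
    const double two_pi = 2.0 * std::acos(-1.0);
    std::vector<cplx> out(n);
    for (std::size_t k = 0; k < n; ++k) {
        cplx sum(0.0, 0.0);
        for (std::size_t j = 0; j < n; ++j) {
            const double angle = -two_pi * double((k * j) % n) / double(n);
            sum += x[j] * std::polar(1.0, angle);
        }
        out[k] = sum;
    }
    return out;
}

// Linearity of the DFT: F(re + i*im) = F(re) + i*F(im).
std::vector<cplx> ComplexDftFromTwoReal(const std::vector<double>& re,
                                        const std::vector<double>& im) {
    assert(re.size() == im.size());
    const std::vector<cplx> a = RealDft(re);
    const std::vector<cplx> b = RealDft(im);
    const cplx i(0.0, 1.0);
    std::vector<cplx> out(re.size());
    for (std::size_t k = 0; k < out.size(); ++k) out[k] = a[k] + i * b[k];
    return out;
}

// Multiply a real signal by exp(+2*pi*i*shift_bins*j/N); the spectrum of the
// product is the original spectrum moved by shift_bins: Y[k] = X[k - shift_bins].
void ModulateForSpectrumShift(const std::vector<double>& x, int shift_bins,
                              std::vector<double>& re, std::vector<double>& im) {
    const std::size_t n = x.size();
    const double two_pi = 2.0 * std::acos(-1.0);
    re.resize(n);
    im.resize(n);
    for (std::size_t j = 0; j < n; ++j) {
        const double angle = two_pi * shift_bins * double(j) / double(n);
        re[j] = x[j] * std::cos(angle);
        im[j] = x[j] * std::sin(angle);
    }
}
```

With a true real-to-complex FFT only the non-redundant half of each spectrum is stored, so the recombination step has to rebuild the other half from Hermitian symmetry before adding; the full-length DFT here sidesteps that bookkeeping.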

The corresponding method in the GpuImage class is still ExtractSlice and shares most of the same syntax.

This also required a debug assert on Image::BackwardFFT, as there is no good way to "undo" this shift and work around the missing complex FFT.

Ultimately, we need complex -> complex FFT routines, but it will be easier to use them through my FastFFT library (working on finishing up with Tim right now) than to modify the Image class directly.