bHimes commented 1 year ago

Description

This PR adds three primary functionalities to cisTEM:

1) Enable use of fp16
2) Enable isolation of GPU-enabled code, even in core library functions
3) Expand the samples functional testing and add unit testing via catch2

The PR is unfortunately large, touching many files, but I decided it was safer to pull in many changes at once (which have been tested for ~ a year in my downstream) than to try to split all of this up, which would almost certainly produce errors and would certainly take much longer.

fp16 particle stacks

cisTEM can now read and write MRC mode 12, which is the half-precision floating point format.

Reading is enabled by default and is driven by the image header, so no extra setup is needed. Values are still read into an fp32 array as always; this will be changed in the future.
Writing is currently only enabled for particle stacks and is under the configurable flag --enable-fp16-particlestacks
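
To illustrate what "read into an fp32 array" involves, here is a minimal sketch that widens raw IEEE 754 half-precision bit patterns to float. The function names HalfBitsToFloat and WidenHalfBuffer are hypothetical and are not taken from the cisTEM source; the actual reader may well use a compiler or CUDA half type instead.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical helper (not a cisTEM function): convert one IEEE 754
// binary16 bit pattern to a 32-bit float.
inline float HalfBitsToFloat(std::uint16_t h) {
    const std::uint32_t sign     = (h >> 15) & 0x1u;
    const std::uint32_t exponent = (h >> 10) & 0x1Fu;
    const std::uint32_t mantissa = h & 0x3FFu;

    float magnitude;
    if (exponent == 0) {
        // Zero or subnormal: mantissa * 2^-24
        magnitude = std::ldexp(static_cast<float>(mantissa), -24);
    } else if (exponent == 31) {
        // All-ones exponent encodes infinity (mantissa 0) or NaN
        magnitude = (mantissa == 0) ? std::numeric_limits<float>::infinity()
                                    : std::numeric_limits<float>::quiet_NaN();
    } else {
        // Normal: (1 + mantissa/1024) * 2^(exponent - 15)
        magnitude = std::ldexp(static_cast<float>(mantissa + 1024u),
                               static_cast<int>(exponent) - 25);
    }
    return sign ? -magnitude : magnitude;
}

// Hypothetical helper: widen a buffer of raw fp16 values, as they might be
// read from disk, into an fp32 array.
inline std::vector<float> WidenHalfBuffer(const std::vector<std::uint16_t>& raw) {
    std::vector<float> out;
    out.reserve(raw.size());
    for (std::uint16_t bits : raw) {
        out.push_back(HalfBitsToFloat(bits));
    }
    return out;
}
```

For instance, HalfBitsToFloat(0x3C00) gives 1.0f and HalfBitsToFloat(0x7BFF) gives 65504.0f, the largest finite half-precision value, which is worth keeping in mind for the dynamic range of anything written as fp16.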

fp16 Image methods

cistem::Image can now allocate an fp16 buffer and has a method to calculate/apply a CTF in half-precision

cistem::GPUImage has a pile of fp16 methods added (mostly templated.)

Enable Isolation of gpu code

Previously, when building with ENABLEGPU, this define was used cisTEM-wide, which meant that any GPU-related code had to reside within an individual program. Now, two new preprocessor defines are created.

WANT_CISTEM_GPU_AM - sets -DENABLEGPU program by program inside Makefile.am and links in libgpucore.a
SHOW_CISTEM_GPU_OPTIONS - used to show GPU tick boxes etc. in the GUI, but does not reveal any GPU code itself.
This also means we can now compile both a CPU version and a GPU version at the same time, simplifying the design process. E.g. match_template and match_template_gpu are both created.
This also means we can extend core methods: something like core/StopWatch.cpp has GPU-related extensions to synchronize the GPU in core/gpu/core_extensions/StopWatch.cu. Although not yet pulled in, this is also how I extended EulerSearch.cpp, which runs the brute-force search, to work on CPU or GPU by templating the function on the input image type and specializing for GpuImage inside src/gpu/core_extensions/euler_search.cu
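
The template-plus-specialization pattern described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the Image and GpuImage structs are empty stand-ins, RunSearch is a hypothetical function name, and everything is collapsed into a single file, whereas in the layout described the specialization would sit in a .cu file that is only built for GPU-enabled programs.

```cpp
#include <cassert>
#include <string>

// Empty stand-ins for illustration only; the real cisTEM Image and
// GpuImage classes are much richer.
struct Image {};
struct GpuImage {};

// Core-library template (hypothetical name): the generic version is the
// CPU path and needs no GPU headers.
template <typename ImageType>
std::string RunSearch(const ImageType&) {
    return "cpu";
}

#ifdef ENABLEGPU
// Only seen by programs compiled with -DENABLEGPU. In a multi-file build
// this specialization must be declared before its first use.
template <>
std::string RunSearch<GpuImage>(const GpuImage&) {
    return "gpu";
}
#endif
```

Compiled without -DENABLEGPU, the core code never sees the GPU specialization, so CPU-only programs need no GPU toolchain; compiled with it, calls on a GpuImage pick up the specialized path.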

Fixes # (issue)

I have rebased my feature branch to be current with the master branch to minimize conflicts and headaches

[X] yes
[ ] no

Which compilers were tested

[ ] g++
[X] icpc
[ ] clang
[ ] other (please specify)

These changes are isolated to the

[X] gui
[X] core library
[X] gpu core library
[X] program it modifies

How has the functionality been tested?

Please describe the tests that you ran to verify your changes. Please also note any relevant details for your test configuration.

[X] Tested manually from GUI
[X] Tested manually from CLI
[X] Passed console tests
[X] Passed samples functional testing
[X] other (also passes the new unit_tests)

Checklist:

[ ] I have not changed anything that did not need to be changed
[X] I have performed a self-review of my own code
[X] I have commented my code, (w.r.t. why), particularly in hard-to-understand areas
[ ] I have made corresponding changes to the documentation {Ok to pass for now}
[X] My changes generate no new warnings (My changes remove several warnings)
[X] Any dependent changes have been merged and published in downstream modules

bHimes commented 1 year ago

@jojoelfe I'm getting a "no space left on disk" error, which is killing the icpc container/CI. Any ideas?

jojoelfe commented 1 year ago

I think GitHub recently started to be more strict about the disk space runners get. It's about 14GB, and I think the icpc image is already 6GB; due to the static compiling, the binaries are large, too.

I'll look into whether we can get more space somehow, but maybe we just have to trim the image.

bHimes commented 1 year ago

You mentioned that projection during template matching is now done on the GPU. Can you maybe point to the change that enables this?

Ah, it isn't in match template yet, the functionality to do GPU projection however is now in place.

@jojoelfe a reference implementation can be seen in the functional test:

src/programs/samples/1_cpu_gpu_comparison/projection_comparison.cpp

This required new image methods and metadata to swap momentum (Fourier) space quadrants. This method

Image::SwapFourierSpaceQuadrants

also does an additional shift by one pixel so the x=-1 components are included. This is needed to efficiently use texture memory for gpu interpolation.

Normally we think of a real space shift as a multiplication in Fourier space, but you can do the same thing in reverse: a complex multiplication in real space shifts the Fourier spectrum. This adds the complication that the input image is now complex, for which cisTEM has no out-of-the-box FFT routines, hence the tmp_real and tmp_imag images, which use the linearity of the FFT to make a complex FFT from two real FFTs.
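
A minimal 1D sketch of that idea, not the cisTEM implementation: a naive O(N^2) DFT stands in for the real-input FFT, and the function names are hypothetical. Only tmp_real and tmp_imag echo names from the description above. Multiplying the real signal by exp(+2*pi*i*shift*j/N) gives a complex signal whose spectrum is the original one circularly shifted by shift bins, and that spectrum is assembled as DFT(tmp_real) + i*DFT(tmp_imag).

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using Complex = std::complex<double>;

// Naive O(N^2) DFT of a real signal; stands in for a real-input FFT routine.
std::vector<Complex> RealDft(const std::vector<double>& x) {
    const std::size_t n = x.size();
    const double two_pi = 2.0 * std::acos(-1.0);
    std::vector<Complex> out(n);
    for (std::size_t k = 0; k < n; ++k) {
        Complex sum(0.0, 0.0);
        for (std::size_t j = 0; j < n; ++j) {
            const double angle = -two_pi * static_cast<double>(k * j) / static_cast<double>(n);
            sum += x[j] * Complex(std::cos(angle), std::sin(angle));
        }
        out[k] = sum;
    }
    return out;
}

// DFT of the complex signal (re + i*im), built from two real transforms
// using linearity: DFT(re + i*im) = DFT(re) + i*DFT(im).
std::vector<Complex> ComplexDftFromTwoReal(const std::vector<double>& re,
                                           const std::vector<double>& im) {
    assert(re.size() == im.size());
    const std::vector<Complex> f_re = RealDft(re);
    const std::vector<Complex> f_im = RealDft(im);
    const Complex i_unit(0.0, 1.0);
    std::vector<Complex> out(re.size());
    for (std::size_t k = 0; k < out.size(); ++k) {
        out[k] = f_re[k] + i_unit * f_im[k];
    }
    return out;
}

// Apply a phase ramp in real space, then transform. The resulting spectrum
// satisfies Y[k] = X[(k - shift) mod N].
std::vector<Complex> ShiftedSpectrum(const std::vector<double>& x, int shift) {
    const std::size_t n = x.size();
    const double two_pi = 2.0 * std::acos(-1.0);
    std::vector<double> tmp_real(n), tmp_imag(n);
    for (std::size_t j = 0; j < n; ++j) {
        const double phase = two_pi * shift * static_cast<double>(j) / static_cast<double>(n);
        tmp_real[j] = x[j] * std::cos(phase);
        tmp_imag[j] = x[j] * std::sin(phase);
    }
    return ComplexDftFromTwoReal(tmp_real, tmp_imag);
}
```

The cost is two real transforms instead of one complex transform, which is the trade-off that dedicated complex -> complex routines would remove.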

The corresponding method in the GpuImage class is still ExtractSlice and shares most of the same syntax.

This also required a debug assert on Image::BackwardFFT, as there is no good way to "undo" this shift and the complex FFT workaround.

Ultimately, we need complex -> complex FFT routines, but it will be easier to use them through my FastFFT library (working on finishing up with Tim right now) than to modify the Image class directly.

timothygrant80 / cisTEM

Add fp16 functionality #453
