Open hakkelt opened 3 years ago
Thanks, I will fix this. The problem is that the higher-level function uses the optimization framework incorrectly (as NIHT can not be parallelized easily in this automatic way).
Thank you Martin, I will look into this too from my end and get back to you tomorrow. There is also some GPU support I want to finalise and submit within the next couple of weeks, if it is of interest to Hakkel, I will check the 3D case and get back to you.
I think you can't use optimized_nop for this. You need to reorder dimensions so that all jointly threshold dimensions are innermost and all others (which could be processed in parallel) on the outside by copying in a temporary array (this can be skipped if they are already in this order). You can then reshape the dimensions that the innermost block is collapsed to one dimension. Finally, you use md_parallel_nary to apply the kernel while processing all blocks in parallel but each block must be processed by one kernel application.
Thank you both for your help. :) Sofia: Yes, it would be really nice to have this algorithm be implemented on GPU even for the 3D case!
I'm trying to run the following set of commands:
The last command fails with the following assertion error:
bart: /home/hakta/.bin/bart/src/num/vecops.c:547: klargest_complex_partsort: Assertion 'k <= N' failed.
Similarly,bart pics -R H:$(bart bitmask 0 1 2):0:10 y smap recon
also fails with the same error message.After some debugging, I found the source of the error in
/src/num/optimize.c
(lines 705-735):As
num_auto_parallelize = true
, value ofskip
is decreased from 3 to 2, and thesize
field ofdata
calculated bymd_calc_size(skip, tdims)
is incorrect because the 3D NIHT cannot be parallelized along the third dimension (or, at least, this feature is not implemented)... Settingnum_auto_parallelize = false
at line 574 in/src/num/optimize.c
solved the problem, but I also experienced a slight decrease in performance in the FISTA-based reconstructions.Is it be possible to switch off auto parallelization only for NIHT reconstructions, or is there an easy way to perform NIHT parallel?