neurolabusc / fsl_sub

parallel FSL processing without requiring SGE

problem to use your code #3

Closed ZHANGneuro closed 6 years ago

ZHANGneuro commented 6 years ago

Dear:

I use the FSL-provided CentOS system on a server with 12 cores. I followed the instructions and copied fsl_sub over the original fsl_sub file in the bin folder of $FSLDIR. When I run feat, it gives the error shown in the screenshot.

Any suggestions? Thanks!

[Screenshot: snip20180131_8]

neurolabusc commented 6 years ago

Is this specific to my version of fsl_sub? In other words, if you press the "Save" button in the FEAT graphical interface, put the original version of fsl_sub back in your path, and rerun your FEAT project, does everything complete normally? One risk of running 12 cores locally is that you may exhaust your RAM. For example, you could type FSLPARALLEL=4; export FSLPARALLEL to limit fsl_sub to 4 simultaneous threads, which would conserve your RAM.
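A minimal sketch of that setting in a bash session. The value 4 is only the example figure from above, the feat call is commented out, and design.fsf is a hypothetical file name.

```shell
# Limit this fsl_sub to 4 simultaneous threads for anything started from this shell.
FSLPARALLEL=4
export FSLPARALLEL

# Child processes inherit the exported value.
bash -c 'echo "FSLPARALLEL is $FSLPARALLEL"'

# feat design.fsf   # hypothetical design file
```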

ZHANGneuro commented 6 years ago

Thanks for your reply!

If I use the original version of fsl_sub everything goes well, but when I use your version of fsl_sub it gives the error. I do not think it is a memory problem: I typed FSLPARALLEL=4, and the error still appeared immediately (less than 1 s) after I ran the feat command.

  1. My system environment: Linux (CentOS) installed in a VMware Workstation virtual machine, as suggested by the FSL instructions.

  2. Running FEAT projects: I run FEAT projects from a bash script in a for loop, as shown below. I thought it would let me run 4 FEAT analyses simultaneously, but it is very slow.

    fsf_list=( ... )   # an array that includes all fsf files
    for i in "${fsf_list[@]}"; do
        ( feat "$i" ) &
        # once 4 jobs are in the background, wait for all of them to finish
        if (( $(wc -w <<<$(jobs -p)) % 4 == 0 )); then wait; fi
    done

Is there any problem with the above? Thanks a lot!

Bo

neurolabusc commented 6 years ago

I see. The issue is that you are running each individual in parallel (the "&" runs it in the background). You are therefore accelerating FSL by running the individuals in a group simultaneously, while my script attempts to accelerate FSL by running the individuals in a group serially but running stages of each analysis in parallel. You really should not use both methods at once. Your approach is a good one, as it works even with stages that are not designed to run in parallel. However, I do not know how well FSL is designed to do this locally. I would also expect your method to carry a severe penalty if your fsf_list has more items than the number of physical CPU cores or if it exceeds your RAM.

Both approaches work by using multiple local CPUs, but you will not want to use both simultaneously.

Your issue is really beyond the scope of my fsl_sub, which is specifically designed to leverage the parallelization built into FSL. However, you could steal a few lines of my code to limit the number of individuals run simultaneously. The crucial loop is while [ $nPID -ge $numCores ]; do, which keeps up to numCores threads running until all the jobs are completed. I should point out that this assumes the number of cores is the rate-limiting factor; if your bottleneck is RAM, you will want to adjust this accordingly.
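A rough sketch of that kind of throttling loop, written for bash. This is not the code from fsl_sub: run_throttled is a hypothetical helper name, and the one-second polling interval is an arbitrary choice. Unlike a bare wait after every fourth job, polling starts the next individual as soon as any slot frees up.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of fsl_sub): run "cmd item" once per item,
# keeping at most max_jobs of them running in the background at a time.
run_throttled() {
    local max_jobs=$1 cmd=$2
    shift 2
    local item nPID
    for item in "$@"; do
        # Count the running background jobs and block while at the limit.
        nPID=$(jobs -pr | wc -l)
        while [ "$nPID" -ge "$max_jobs" ]; do
            sleep 1
            nPID=$(jobs -pr | wc -l)
        done
        "$cmd" "$item" &
    done
    wait   # let the last jobs finish before returning
}

# Example use with the array from the earlier script:
# run_throttled 4 feat "${fsf_list[@]}"
```

If RAM rather than core count is the bottleneck, the first argument would be chosen from the memory each analysis needs instead of the number of cores.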

ZHANGneuro commented 6 years ago

Thanks for your help! Now the problem is clear. By the way, since you did not mention that your fsl_sub is not meant to accelerate FSL by running individuals in parallel, you could add this to the introduction :)

I will definitely try your method as well. On my side, one full analysis including preprocessing and statistics takes FEAT 2 hours on 1 core; if I use 4 cores, for example, it may take 1.5 hours, so the speed-up is not linear. By comparison, the same analysis takes only 40 minutes in SPM, and SPM supports parallelization well through Matlab's parfor. I am wondering why FSL takes so long. Could you share your experience? How long does FSL take for a full analysis on 1 core?

Thanks!

bo

ZHANGneuro commented 6 years ago

Excuse me, I tried to run 1 individual using the FEAT command but it still gives the error. Could you give me more hints on how I should use it?

Thanks bo

[Screenshot: snip20180201_9]

neurolabusc commented 6 years ago

In general, most stages of FSL's FEAT fMRI analysis are not parallelized and therefore my fsl_sub code does not have a major impact. Therefore, processing multiple people simultaneously is probably a more efficient way to accelerate FEAT. On the other hand, my method works well for several of the stages of DTI analyses. I have added comments to the readme to reflect this.

I agree that SPM is very fast. SPM leverages the built-in vector operations of Matlab, and has been carefully tuned so that operations that are slow in Matlab are executed by low-level C mex files. It is worth noting that FSL and SPM do a couple of things differently, in particular with respect to statistics (e.g. FLAME) and dealing with auto-correlation. I think comparing the run time of these two very different tools is comparing apples to oranges.

I notice the problem you are having is with melodic, which is not part of a generic FEAT pipeline. The notes provide the caveat that parallel melodic must be set up all together in one GUI setup, so I would carefully examine your script and see if you can set up melodic appropriately. If you are unable to run melodic in parallel with your scripts, but are able to run FLAME in parallel, you can adapt my script to exclude melodic from parallel processing. My script already excludes GPU code from running in parallel, so you could easily add a conditional to exclude melodic in the same way that I exclude GPUs:

    if [ "${line#*$key}" != "$line" ] ; then
        numCores=1
        echo "Only running single thread: command includes $key" >&2
    fi
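
One way the same substring test could be extended to several keywords is sketched below. The function name cores_for_line and the keyword list (gpu, melodic) are assumptions made for illustration; the actual key used in fsl_sub may differ.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the number of cores to use for one command line.
# Falls back to 1 if the line mentions any keyword in the (assumed) exclusion list.
cores_for_line() {
    local line=$1 numCores=$2 key
    for key in gpu melodic; do
        # ${line#*"$key"} strips everything up to and including the first match
        # of $key, so the result differs from $line only if $key occurs in it.
        if [ "${line#*"$key"}" != "$line" ]; then
            numCores=1
            echo "Only running single thread: command includes $key" >&2
        fi
    done
    echo "$numCores"
}

# Example: cores_for_line "melodic -i input_list.txt" 4
```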

neurolabusc commented 6 years ago

Regarding your comment "using 1 core, FEAT takes me 2 hours, If I use 4 cores for example it may takes me 1.5 hours": be aware that with my script some stages run in parallel and others run serially, so you tend to run into Amdahl's law.
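
As a rough illustration only, take the approximate timings quoted above at face value (2 hours on 1 core, 1.5 hours on 4 cores) and assume the simple Amdahl model, with p the parallelizable fraction of the run time and n the number of cores:

```latex
\begin{align*}
S(n) &= \frac{1}{(1-p) + p/n} \\
S(4) &= \frac{2\,\mathrm{h}}{1.5\,\mathrm{h}} = \frac{4}{3}
  \;\Rightarrow\; (1-p) + \frac{p}{4} = \frac{3}{4}
  \;\Rightarrow\; p = \frac{1}{3} \\
\lim_{n\to\infty} S(n) &= \frac{1}{1-p} = \frac{3}{2}
\end{align*}
```

Under these assumptions only about a third of the run is parallelized, and no number of cores would bring the run time below roughly 80 minutes.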

Alternatively, running independent individuals in parallel should in theory allow all stages to run in parallel (with different individuals), but again you should not expect completely linear performance increases. For example, most modern CPUs take advantage of the fact that one core uses less power than four, so when only a single thread is running the core is allowed to turbo to a higher speed than if four threads were being processed. In addition, issues like disk contention, cache and memory access, etc. will impact performance. You may want to look at my comments here; that page also describes how you can use AWS to process your data in the cloud, which lets you rent a large cluster by the hour and might be a good solution to your problem.

ZHANGneuro commented 6 years ago

Thank you for such detailed explanations. If I get more time I may dig deeper into the FSL code. You are right, buying a cluster is an easier solution, as FSL is designed for it.