neurolabusc / fsl_sub

parallel FSL processing without requiring SGE

flameo issue in FSL v6.0 #7

Closed deschsimon closed 5 years ago

deschsimon commented 5 years ago

Hi!

I have successfully used your script on my MacBook Pro and like it!

I have now tried to use it on a Linux workstation (Linux Mint 19). On this workstation I have two versions of FSL installed: /usr/share/fsl/5.0 (FSL version 5.0.11) and /usr/share/fsl/6.0 (FSL version 6.0). In both I've replaced fsl_sub with your script. Running the example you provide works fine and shows a substantial decrease in duration, as expected.

However, if I run a FEAT analysis (which uses FLAME and thus calls fsl_sub) using FSL 6.0, feat gets stuck when performing flameo. htop shows many flameo processes running in parallel. These processes keep running even when I kill feat. All CPUs run at 100%. The only way I can stop this is to kill all flameo processes of the respective user. This does not happen using FSL 5.0.11.

Any ideas on this are highly appreciated! Thank you!

System information:

$ cat $FSLDIR/etc/fslversion
6.0.0

$ cat /etc/upstream-release/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=18.04
DISTRIB_CODENAME=bionic
DISTRIB_DESCRIPTION="Ubuntu 18.04 LTS"

$ lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              32
On-line CPU(s) list: 0-31
Thread(s) per core:  2
Core(s) per socket:  8
Socket(s):           2
NUMA node(s):        2
Vendor ID:           GenuineIntel
CPU family:          6
Model:               63
Model name:          Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz
Stepping:            2
CPU MHz:             1201.049
CPU max MHz:         3200,0000
CPU min MHz:         1200,0000
BogoMIPS:            4799.64
Virtualization:      VT-x
L1d cache:           32K
L1i cache:           32K
L2 cache:            256K
L3 cache:            20480K
NUMA node0 CPU(s):   0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30
NUMA node1 CPU(s):   1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm abm cpuid_fault epb invpcid_single pti tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm xsaveopt cqm_llc cqm_occup_llc dtherm ida arat pln pts
neurolabusc commented 5 years ago

Sorry, no expertise. If you want to troubleshoot this yourself, you could try excluding flameo from running in parallel. If you look at the version of fsl_sub distributed here, it excludes any command that includes _gpu. You could add a similar exclusion for flameo. It is possible that FSL 6.0 includes an updated fsl_sub that has other features not in the version distributed here. Alternatively, it is possible that the new version of flameo includes modifications (e.g. OpenMP) that would make you want to exclude it from parallel computations.

Regardless, for a high-end system like the one you are using, you may want to consider investing the time in a conventional approach to using FSL in parallel, e.g. SGE or SLURM. I use the fsl_sub I distribute here on my laptop to test FSL scripts, but use SLURM for the heavy lifting on the campus supercomputers.

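if [ "$numCores" -gt 1 ] ; then # disable parallel processing for flameo
    # read the first command in the task file
    line=$(sed -n '1p' "$taskfile")
    key="flameo"
    # stripping everything up to $key changes the line only if $key is present
    if [ "${line#*$key}" != "$line" ] ; then
        numCores=1
        echo "Only running single thread: command includes $key" >&2
    fi
fi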
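
The substring test in the snippet above relies on standard shell parameter expansion rather than an external tool such as grep. As a rough, self-contained sketch of how that check behaves, here it is wrapped in a function; `contains_key` is a hypothetical helper name and the sample command line is made up for illustration, neither comes from fsl_sub itself.

```shell
#!/bin/sh
# contains_key LINE KEY: succeed if KEY occurs anywhere in LINE.
# Hypothetical helper for illustration; not part of fsl_sub.
contains_key() {
    # ${1#*"$2"} removes the shortest prefix of LINE that ends in KEY.
    # If KEY is absent nothing is removed, so the result equals LINE.
    [ "${1#*"$2"}" != "$1" ]
}

# Made-up first line of a task file.
line="/usr/share/fsl/6.0/bin/flameo --cope=filtered_func_data"

if contains_key "$line" "flameo"; then
    echo "would force numCores=1"
fi
```

Note that, like the original snippet, this only inspects a single line, so a task file whose first command is not flameo would still run in parallel.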
deschsimon commented 5 years ago

Thanks for the quick reply!

You're right, in the long run we'll definitely go for a parallel engine! We've just upgraded the hardware and set up the system. So I thought that, until I find the time to set up the engine, I could provide an easy interim solution.

neurolabusc commented 5 years ago

@deschsimon - I agree that SGE/SLURM would be ideal for your new system. However, for other users it would be great if you could troubleshoot this and tell us whether this is an incompatibility between my fsl_sub and FSL 6.0 or simply an issue of running too many copies of flameo on your computer (e.g. exhausting RAM). You could test this by including the line FSLPARALLEL=4 in your shell startup script (or your fsl.sh/fsl.csh). This setting will limit my fsl_sub to 4 threads, rather than all the ones available on your computer. If it works on your computer, it suggests that my fsl_sub is compatible with FSL 6.0, but you need to make sure you do not run too many jobs concurrently.
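
A minimal sketch of that test for a Bourne-style startup file such as fsl.sh. It assumes fsl_sub picks up FSLPARALLEL from the environment, which is why the variable is exported here; `count_flameo` is a hypothetical helper for checking how many flameo processes are alive while feat runs, and is not part of FSL or fsl_sub.

```shell
#!/bin/sh
# Cap the number of parallel jobs (assumption: fsl_sub reads this
# variable from the environment, so it must be exported).
FSLPARALLEL=4
export FSLPARALLEL

# Hypothetical helper: print the number of running processes whose
# name is exactly "flameo". Uses pgrep -x (exact name match).
count_flameo() {
    pgrep -x flameo | wc -l
}
```

With the cap in place, calling `count_flameo` from a second terminal during the analysis should report no more than the configured limit if fsl_sub honors the setting; a much larger number would point at the script rather than at memory pressure. In fsl.csh the equivalent assignment would be `setenv FSLPARALLEL 4`.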