pnlbwh / luigi-pnlpipe

NIFTI pipeline glued together using Luigi

How to run HCP pipeline #47

Closed · tashrifbillah closed this issue 3 years ago

tashrifbillah commented 3 years ago

Manual approach

At PNL, data must be unringed before running the HCP pipeline. Unringing and organizing according to BIDS are conveniently done by the Luigi pipeline. The HCP pipeline, on the other hand, has a lot of parameters, which makes defining a node for it in the Luigi pipeline cumbersome. A compromise is to unring via the Luigi pipeline and then feed all results to the HCP pipeline:

unringing of PA

- dir-99

```bash
workflows/ExecuteTask.py --task GibbsUn --bids-data-dir $HOME/HCP/rawdata -c 2001 -s 1 \
--dwi-template sub-*/ses-*/dwi/*acq-PA_dir-99_dwi.nii.gz
```

- dir-107

```bash
workflows/ExecuteTask.py --task GibbsUn --bids-data-dir $HOME/HCP/rawdata -c 2001 -s 1 \
--dwi-template sub-*/ses-*/dwi/*acq-PA_dir-107_dwi.nii.gz
```

unringing of AP

- dir-99

```bash
luigi-pnlpipe/workflows/ExecuteTask.py --task GibbsUn --bids-data-dir $HOME/HCP/rawdata -c 2001 -s 1 \
--dwi-template sub-*/ses-*/dwi/*acq-AP_dir-99_dwi.nii.gz
```

- dir-107

```bash
luigi-pnlpipe/workflows/ExecuteTask.py --task GibbsUn --bids-data-dir $HOME/HCP/rawdata -c 2001 -s 1 \
--dwi-template sub-*/ses-*/dwi/*acq-AP_dir-107_dwi.nii.gz
```

Equivalently, the four calls above can be consolidated into a loop:

```bash
for i in sub-*/ses-*/dwi/*acq-PA_dir-99_dwi.nii.gz sub-*/ses-*/dwi/*acq-PA_dir-107_dwi.nii.gz \
         sub-*/ses-*/dwi/*acq-AP_dir-99_dwi.nii.gz sub-*/ses-*/dwi/*acq-AP_dir-107_dwi.nii.gz
do
  luigi-pnlpipe/workflows/ExecuteTask.py --task GibbsUn --bids-data-dir $HOME/HCP/rawdata -c 2001 -s 1 \
  --dwi-template $i
done
```

HCP pipeline

```bash
HCPpipelines/DiffusionPreprocessing/DiffPreprocPipeline.sh \
--path=$HOME/HCP/derivatives/pnlpipe/sub-2001/ses-1/dwi/ --subject=hcppipe \
--cuda-version=9.1 \
--posData=sub-2001_ses-1_acq-PA_dir-107_desc-XcUn_dwi.nii.gz@sub-2001_ses-1_acq-PA_dir-99_desc-XcUn_dwi.nii.gz \
--negData=sub-2001_ses-1_acq-AP_dir-107_desc-XcUn_dwi.nii.gz@sub-2001_ses-1_acq-AP_dir-99_desc-XcUn_dwi.nii.gz \
--echospacing=0.689998 --PEdir=2 --gdcoeffs=NONE \
--extra-eddy-arg=--repol --extra-eddy-arg=--data_is_shelled --extra-eddy-arg=--verbose
```
Help message:

```
Perform the steps of the HCP Diffusion Preprocessing Pipeline

Usage: DiffPreprocPipeline.sh PARAMETER...

PARAMETERs are: [ ] = optional; < > = user supplied value

  [--help]      show usage information and exit with a non-zero return code
  [--version]   show version information and exit with 0 as return code
  --path=       path to subject's data folder
  --subject=    subject ID
  --PEdir=      phase encoding direction specifier: 1=LR/RL, 2=AP/PA
  --posData=    @ symbol separated list of data with 'positive' phase encoding
                direction; e.g., data_RL1@data_RL2@...data_RLn, or
                data_PA1@data_PA2@...data_PAn
  --negData=    @ symbol separated list of data with 'negative' phase encoding
                direction; e.g., data_LR1@data_LR2@...data_LRn, or
                data_AP1@data_AP2@...data_APn
  --echospacing=
                Echo spacing in msecs
  --gdcoeffs=   path to file containing coefficients that describe spatial
                variations of the scanner gradients. Applied *after* 'eddy'.
                Use --gdcoeffs=NONE if not available.
  [--dwiname=]  name to give DWI output directories. Defaults to Diffusion
  [--dof=]      Degrees of Freedom for post eddy registration to structural
                images. Defaults to 6
  [--b0maxbval=]
                Volumes with a bvalue smaller than this value will be
                considered as b0s. Defaults to 50
  [--printcom=] Use the specified <print-command> to echo or otherwise output
                the commands that would be executed instead of actually
                running them. --printcom=echo is intended to be used for
                testing purposes
  [--select-best-b0]
                If set selects the best b0 for each phase encoding direction
                to pass on to topup rather than the default behaviour of using
                equally spaced b0's throughout the scan. The best b0 is
                identified as the least distorted (i.e., most similar to the
                average b0 after registration).
  [--extra-eddy-arg=]
                Generic single token (no whitespace) argument to pass to the
                DiffPreprocPipeline_Eddy.sh script and subsequently to the
                run_eddy.sh script and finally to the command that actually
                invokes the eddy binary.
                The following will work: --extra-eddy-arg=--val=1
                because "--val=1" is a single token containing no whitespace.
                The following will not work: --extra-eddy-arg="--val1=1 --val2=2"
                because "--val1=1 --val2=2" is NOT a single token.
                To build a multi-token series of arguments, you can specify
                this --extra-eddy-arg= parameter several times, e.g.,
                --extra-eddy-arg=--val1=1 --extra-eddy-arg=--val2=2
                To get an argument like "-flag value" (where there is no '='
                between the flag and the value) passed to the eddy binary, the
                following sequence will work:
                --extra-eddy-arg=-flag --extra-eddy-arg=value
  [--no-gpu]    If specified, use the non-GPU-enabled version of eddy.
                Defaults to using the GPU-enabled version of eddy.
  [--cuda-version=X.Y]
                If using the GPU-enabled version of eddy, then this option can
                be used to specify which eddy_cuda binary version to use. If
                specified, FSLDIR/bin/eddy_cudaX.Y will be used.
  [--combine-data-flag=]
                Specified value is passed as the CombineDataFlag value for the
                eddy_postproc.sh script. If JAC resampling has been used in
                eddy, this value determines what to do with the output file.
                2 - include in the output all volumes uncombined
                    (i.e. output file of eddy)
                1 - include in the output and combine only volumes where both
                    LR/RL (or AP/PA) pairs have been acquired
                0 - As 1, but also include uncombined single volumes
                Defaults to 1

Return Status Value:
  0         All parameters were properly formed and processing succeeded, or
            help requested.
  Non-zero  Otherwise - malformed parameters or a processing failure was
            detected

Required Environment Variables:
  HCPPIPEDIR       The home directory for the version of the HCP Pipeline
                   Scripts being used.
  FSLDIR           The home directory for FSL
  FREESURFER_HOME  The home directory for FreeSurfer
  PATH             Standard PATH environment variable must be set to find
                   HCP-customized version of gradient_unwarp.py
```
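Note that --echospacing expects milliseconds. A minimal sketch of deriving it from the BIDS sidecar, assuming the value HCP expects corresponds to the sidecar's EffectiveEchoSpacing (which BIDS stores in seconds; the sidecar path below is hypothetical):

```bash
# Hypothetical sidecar path; BIDS stores EffectiveEchoSpacing in seconds,
# hence the factor of 1000 to get the milliseconds --echospacing= expects
json=$HOME/HCP/rawdata/sub-2001/ses-1/dwi/sub-2001_ses-1_acq-PA_dir-107_dwi.json
echospacing=$(python -c "import json; print(1000*json.load(open('$json'))['EffectiveEchoSpacing'])")
```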
tashrifbillah commented 3 years ago

Shell script approach

Can we instead write a master shell script that executes:

```bash
# for loop defined in issue description
# =====================
for i in sub-*/ses-*/dwi/*acq-PA_dir-99_dwi.nii.gz sub-*/ses-*/dwi/*acq-PA_dir-107_dwi.nii.gz \
         sub-*/ses-*/dwi/*acq-AP_dir-99_dwi.nii.gz sub-*/ses-*/dwi/*acq-AP_dir-107_dwi.nii.gz
do
  luigi-pnlpipe/workflows/ExecuteTask.py --task GibbsUn --bids-data-dir $HOME/HCP/rawdata -c 2001 -s 1 \
  --dwi-template $i
done

# HCP pipeline
# =====================
HCPpipelines/DiffusionPreprocessing/DiffPreprocPipeline.sh \
--path=$HOME/HCP/derivatives/pnlpipe/sub-2001/ses-1/dwi/ --subject=hcppipe \
--cuda-version=9.1 \
--posData=sub-2001_ses-1_acq-PA_dir-107_desc-XcUn_dwi.nii.gz@sub-2001_ses-1_acq-PA_dir-99_desc-XcUn_dwi.nii.gz \
--negData=sub-2001_ses-1_acq-AP_dir-107_desc-XcUn_dwi.nii.gz@sub-2001_ses-1_acq-AP_dir-99_desc-XcUn_dwi.nii.gz \
--echospacing=0.689998 --PEdir=2 --gdcoeffs=NONE \
--extra-eddy-arg=--repol --extra-eddy-arg=--data_is_shelled --extra-eddy-arg=--verbose

# Move output back to `dwi/` directory or create symlinks
# =====================
# Observe the following output files in ${StudyFolder}/${Subject}:
#
#   dwi:   Diffusion/eddy/eddy_unwarped_images.nii.gz
#   bvals: Diffusion/eddy/Pos_Neg.bvals
#   bvecs: Diffusion/eddy/eddy_unwarped_images.eddy_rotated_bvecs
#   mask:  Diffusion/eddy/nodif_brain_mask.nii.gz
```

The shell script must accept input for however many series there are (e.g. 2 for AP, 2 for PA) plus any additional arguments to be passed to the HCP pipeline; see the sketch below.
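A minimal sketch of such a wrapper, assuming a hypothetical hcp_wrapper.sh that takes comma-separated positive and negative series and forwards any remaining arguments to DiffPreprocPipeline.sh:

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: hcp_wrapper.sh pos1,pos2,... neg1,neg2,... [extra HCP args...]
# e.g. hcp_wrapper.sh 'pa-template1,pa-template2' 'ap-template1,ap-template2' \
#          --echospacing=0.689998 --PEdir=2 --gdcoeffs=NONE
# (quote the templates so the shell does not glob-expand them prematurely)

pos=$1; neg=$2; shift 2      # the remaining "$@" is forwarded verbatim to DiffPreprocPipeline.sh

# unring every series, positive (PA) and negative (AP) alike
for i in ${pos//,/ } ${neg//,/ }
do
    luigi-pnlpipe/workflows/ExecuteTask.py --task GibbsUn --bids-data-dir $HOME/HCP/rawdata -c 2001 -s 1 \
    --dwi-template $i
done

# join the series with @ as DiffPreprocPipeline.sh expects; for brevity this sketch
# passes the series names straight through, whereas in practice the unrung
# (desc-XcUn) filenames would be substituted here
HCPpipelines/DiffusionPreprocessing/DiffPreprocPipeline.sh \
    --posData=${pos//,/@} --negData=${neg//,/@} "$@"
```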

tashrifbillah commented 3 years ago

Luigi approach

For the HCP pipeline to be defined as a node in the Luigi pipeline, the following parameters of the former should be exposed to the latter:

mandatory (6)

```bash
--PEdir=               phase encoding direction specifier: 1=LR/RL, 2=AP/PA
--echospacing=         Echo spacing in msecs
[--no-gpu]             If specified, use the non-GPU-enabled version of eddy.
                       Defaults to using the GPU-enabled version of eddy.
[--cuda-version=X.Y]   If using the GPU-enabled version of eddy, then this
                       option can be used to specify which eddy_cuda binary
                       version to use. If specified, FSLDIR/bin/eddy_cudaX.Y
                       will be used.
[--combine-data-flag=] Specified value is passed as the CombineDataFlag value
                       for the eddy_postproc.sh script. If JAC resampling has
                       been used in eddy, this value determines what to do
                       with the output file.
                       2 - include in the output all volumes uncombined
                           (i.e. output file of eddy)
                       1 - include in the output and combine only volumes
                           where both LR/RL (or AP/PA) pairs have been acquired
                       0 - As 1, but also include uncombined single volumes
                       Defaults to 1
[--extra-eddy-arg=]    Generic single token (no whitespace) argument to pass
                       to the DiffPreprocPipeline_Eddy.sh script and
                       subsequently to the run_eddy.sh script and finally to
                       the command that actually invokes the eddy binary.
```
optional (2)

```
[--b0maxbval=]     Volumes with a bvalue smaller than this value will be
                   considered as b0s. Defaults to 50
[--select-best-b0] If set selects the best b0 for each phase encoding
                   direction to pass on to topup rather than the default
                   behaviour of using equally spaced b0's throughout the scan.
                   The best b0 is identified as the least distorted (i.e.,
                   most similar to the average b0 after registration).
```

Whichever of the mandatory and optional parameters are exposed, they must be defined both in the task definition and in the configuration file.
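For illustration, a minimal sketch of the corresponding configuration-file section (Luigi reads INI-style config, with one section per task), assuming the task is named HcpPipe and its Luigi parameter names mirror the flags above — all names and values here are hypothetical:

```
[HcpPipe]
PEdir = 2
echospacing = 0.689998
no_gpu = False
cuda_version = 9.1
combine_data_flag = 1
extra_eddy_args = --repol,--data_is_shelled,--verbose
b0maxbval = 50
select_best_b0 = False
```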

In addition, the variables HCPPIPEDIR, HCPPIPEDIR_Config, and HCPPIPEDIR_Global must be defined in the shell.

Task definition should look like:

```python
from shutil import rmtree
from subprocess import Popen

from luigi import Task, Parameter
from luigi.util import inherits

# GibbsUn and write_provenance come from luigi-pnlpipe's own modules


@inherits(GibbsUn)
class HcpPipe(Task):
    posData_template = Parameter(default='')
    negData_template = Parameter(default='')
    TopupOutDir = Parameter(default='fsl_eddy')
    # all other mandatory parameters

    def requires(self):
        # acq-PA
        pos_1_template, pos_2_template = self.posData_template.split(',')
        self.dwi_template = pos_1_template
        pos_1 = self.clone(GibbsUn)
        self.dwi_template = pos_2_template
        pos_2 = self.clone(GibbsUn)

        # acq-AP
        neg_1_template, neg_2_template = self.negData_template.split(',')
        self.dwi_template = neg_1_template
        neg_1 = self.clone(GibbsUn)
        self.dwi_template = neg_2_template
        neg_2 = self.clone(GibbsUn)

        return (pos_1, pos_2, neg_1, neg_2)

    def run(self):
        outDir = self.output()['dwi'].dirname.join(self.TopupOutDir)

        # rerun the HCP pipeline only if any expected output is missing
        for name in ['dwi', 'bval', 'bvec']:
            if not self.output()[name].exists():
                if outDir.exists():
                    rmtree(outDir)

                cmd = ' '.join(['DiffPreprocPipeline.sh', '...'])
                p = Popen(cmd, shell=True)
                p.wait()
                break

        write_provenance(self, self.output()['dwi'])
```
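A hypothetical invocation, assuming HcpPipe gets wired into ExecuteTask.py and comma-separated templates map to posData_template/negData_template (the two template flags are assumptions; only --task, --bids-data-dir, -c, and -s exist today):

```bash
# Hypothetical --posData-template/--negData-template flags
workflows/ExecuteTask.py --task HcpPipe --bids-data-dir $HOME/HCP/rawdata -c 2001 -s 1 \
--posData-template 'sub-*/ses-*/dwi/*acq-PA_dir-107_dwi.nii.gz,sub-*/ses-*/dwi/*acq-PA_dir-99_dwi.nii.gz' \
--negData-template 'sub-*/ses-*/dwi/*acq-AP_dir-107_dwi.nii.gz,sub-*/ses-*/dwi/*acq-AP_dir-99_dwi.nii.gz'
```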

Remember that the HCP pipeline launches Python from within its shell scripts. So if we write a new Luigi node, the call chain will be:

Python --> Shell --> Python

subprocesses! That is probably okay, but not tidy.

tashrifbillah commented 3 years ago
The LSF script we recently wrote for its execution:

```bash
#!/usr/bin/bash

#BSUB -J hcp-topup[18]
#BSUB -q gpu
#BSUB -m ml005
#BSUB -R rusage[mem=12000]
#BSUB -o data_processing/output/hcp-topup-%J-%I.out
#BSUB -e data_processing/output/hcp-topup-%J-%I.err
#BSUB -n 4

export HCPPIPEDIR=pnlpipe3/HCPpipelines
export HCPPIPEDIR_Config=pnlpipe3/HCPpipelines/global/config
export HCPPIPEDIR_Global=pnlpipe3/HCPpipelines/global/scripts

source pnlpipe3/bashrc3

caselist=caselist.txt

echo ${LSB_JOBINDEX}
i=`head -${LSB_JOBINDEX} ${caselist} | tail -1`
export CUDA_VISIBLE_DEVICES=$(( ${LSB_JOBINDEX}%4 ))

datadir=$HOME/HCP/derivatives/pnlpipe/$i/ses-1/dwi/

# skip cases that already completed
[ -f $datadir/hcppipe/Diffusion/eddy/eddy_unwarped_images.nii.gz ] && exit || :

$HCPPIPEDIR/DiffusionPreprocessing/DiffPreprocPipeline.sh --path=$datadir --subject=hcppipe --cuda-version=9.1 \
--posData=$datadir/${i}_ses-1_acq-PA_dir-107_desc-XcUn_dwi.nii.gz@$datadir/${i}_ses-1_acq-PA_dir-99_desc-XcUn_dwi.nii.gz \
--negData=$datadir/${i}_ses-1_acq-AP_dir-107_desc-XcUn_dwi.nii.gz@$datadir/${i}_ses-1_acq-AP_dir-99_desc-XcUn_dwi.nii.gz \
--echospacing=0.689998 --PEdir=2 --gdcoeffs=NONE \
--extra-eddy-arg=--data_is_shelled --extra-eddy-arg=--repol --extra-eddy-arg=--verbose
```
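An array script like this is submitted by redirecting it into bsub so the embedded #BSUB directives are honored; each array element then picks its own case from caselist.txt via LSB_JOBINDEX:

```bash
bsub < hcp_pnl_topup.lsf
```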

--path and --subject are used only to construct the output directory:

https://github.com/pnlbwh/HCPpipelines/blob/cdf2f3aa8a1916fc9aee3d47360cd417724a87be/DiffusionPreprocessing/DiffPreprocPipeline.sh#L547-L553
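Concretely, with --path=$datadir and --subject=hcppipe as in the script above, everything is written under $datadir/hcppipe/, e.g.:

```bash
# the final eddy output whose existence the script checks before re-running
ls $datadir/hcppipe/Diffusion/eddy/eddy_unwarped_images.nii.gz
```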


The step script I wrote for RAs:

U01_HCP_Psychosis/data_processing/scripts/hcp_pnl_topup_8_25_2021.lsf

tashrifbillah commented 3 years ago

PR #51

tashrifbillah commented 3 years ago

Can the soft links be defined in the HCP pipeline task, and a flag file (e.g. .hcpipe_complete) be written, so that downstream tasks can look for that flag file?

Edit: Not a good idea, because it would require removing _acq-AP_ and computing _dir-{}_ in a shell script, i.e. reimplementing all of:

https://github.com/pnlbwh/luigi-pnlpipe/blob/cc012785a53c70bbc711c7ae06248a009766358c/workflows/dwi_pipe.py#L551-L583

tashrifbillah commented 3 years ago

Final shell script: https://github.com/pnlbwh/luigi-pnlpipe/blob/be4d0b4f64399a1da36f4de7652abf45cc172c9d/workflows/hcp_pnl_topup.lsf