nipreps / niworkflows

Common workflows for MRI (anatomical, functional, diffusion, etc)
https://www.nipreps.org/niworkflows
Apache License 2.0

[BUG] Initializing a "nested" workflow results in reuse of a pre-initialized one. #700

Closed GalKepler closed 2 years ago

GalKepler commented 2 years ago

Hi, I'm trying to create a custom DWI preprocessing workflow using the template provided in nipreps/dmriprep. I've left the anatomical preprocessing procedure untouched and basically created a diffusion preprocessing workflow that is based on MRtrix and is compatible with my diffusion protocol (a sophisticated PEPOLAR scheme).

So, for each of the participant's available DWI scans, I initialize a workflow using an "init_dwi_preproc_wf" function, just as is done in nipreps' dmriprep (although the workflow itself is different):

dwi_preproc_list = []
for dwi_file in subject_data["dwi"]:
    dwi_preproc_wf = init_dwi_preproc_wf(dwi_file)

The problem is that it seems to be reusing the subject's pre-initialized DWI workflow (my subjects have more than one session). The working directory's structure looks like this:

work/
    single_subject_<participant_label>_wf/
        about/
        anat_preproc_wf/
        ...
        dwi_preproc_ses_<***first_session_label***>_wf/
            brainextraction_wf/
            ...
        dwi_preproc_ses_<***second_session_label***>_wf/
            ***single_subject_<participant_label>_wf***
                dwi_preproc_ses_<***first_session_label***>_wf/
                    ...

It seems to rebuild the entire subject's workflow inside the pre-initialized DWI one.

Would love your help on this one - I feel like I got a little lost trying to resolve it.

Thanks!

effigies commented 2 years ago

I think the only way to figure out what's going on is to look at init_dwi_preproc_wf. Is this something you can share?

GalKepler commented 2 years ago

@effigies Thank you for the prompt reply! Of course, all of the code is available in my fork of dmriprep, and the specific init_dwi_preproc_wf is located here

Thank you!

effigies commented 2 years ago

So looking at your branch, I see some things that might be leading to this:

https://github.com/GalBenZvi/dmriprep/blob/8ce447223e076236cf94ca0179a65cb8cc85ba20/dmriprep/workflows/base.py#L362-L396

    dwi_preproc_list = []
    for dwi_file in subject_data["dwi"]:
        dwi_preproc_wf = init_dwi_preproc_wf(
            dwi_file,
        )
        workflow.base_dir = f"{config.execution.work_dir}/dmriprep_wf"
        workflow.config["execution"]["crashdump_dir"] = str(
            config.execution.output_dir
            / "dmriprep"
            / f"sub-{subject_id}"
            / "log"
            / config.execution.run_uuid
        )
        # fmt: off
        workflow.connect([
            (anat_preproc_wf, dwi_preproc_wf, [
                ("outputnode.t1w_preproc", "inputnode.t1w_preproc"),
                ("outputnode.t1w_mask", "inputnode.t1w_mask"),
                ("outputnode.t1w_dseg", "inputnode.t1w_dseg"),
                ("outputnode.t1w_aseg", "inputnode.t1w_aseg"),
                ("outputnode.t1w_aparc", "inputnode.t1w_aparc"),
                ("outputnode.t1w_tpms", "inputnode.t1w_tpms"),
                ("outputnode.template", "inputnode.template"),
                ("outputnode.anat2std_xfm", "inputnode.anat2std_xfm"),
                ("outputnode.std2anat_xfm", "inputnode.std2anat_xfm"),
                # Undefined if --fs-no-reconall, but this is safe
                ("outputnode.subjects_dir", "inputnode.subjects_dir"),
                ("outputnode.t1w2fsnative_xfm", "inputnode.t1w2fsnative_xfm"),
                ("outputnode.fsnative2t1w_xfm", "inputnode.fsnative2t1w_xfm"),
            ]),
            (bids_info, dwi_preproc_wf, [("subject", "inputnode.subject_id")]),
        ])
        # fmt: on
        # Keep a handle to each workflow
        workflow.run()

We should not be running each workflow during construction. You want to create the sub-workflows, add them to the base workflow, and then run the base workflow once as a single overarching workflow. I suspect this might have been done for debugging purposes, but I think a better approach for that would be:

debug_wf = init_single_subject_wf(subject_id)
debug_wf.base_dir = "/path/to/work/dir"
debug_wf.run()

Rather than inserting the run into the loop here. So my guess as to what's happening is that you have multiple workflows all targeting the same working directory, run without clearing the directory in between, so you're getting a mixture of working directories sitting on top of each other and looking confusing.
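For reference, the "build everything, run once" pattern can be sketched with a minimal plain-Python stand-in (the `Workflow` class and `init_dwi_preproc_wf` names below are illustrative mocks, not nipype's real API): each sub-workflow is constructed and attached to the parent inside the loop, and `run()` is called exactly once on the parent, outside the loop.

```python
# Minimal mock (not real nipype) of the pattern described above.
class Workflow:
    def __init__(self, name):
        self.name = name
        self._children = []
        self.run_count = 0

    def add(self, child):
        # Attach a sub-workflow to the parent graph.
        self._children.append(child)

    def run(self):
        # Running the parent executes every attached child in one pass.
        self.run_count += 1
        for child in self._children:
            child.run_count += 1


def init_dwi_preproc_wf(dwi_file):
    # Fresh sub-workflow per input file.
    return Workflow(f"dwi_preproc_{dwi_file}_wf")


parent = Workflow("single_subject_wf")
dwi_preproc_list = []
for dwi_file in ["ses-01", "ses-02"]:
    wf = init_dwi_preproc_wf(dwi_file)
    parent.add(wf)               # connect into the parent graph...
    dwi_preproc_list.append(wf)  # ...and keep a handle

parent.run()  # single run() call, outside the loop
```

With this shape, every sub-workflow shares the parent's working directory tree without the parent graph being rebuilt per iteration.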

Does that help at all?

GalKepler commented 2 years ago

I've actually tried that, and it resulted in the behavior I described above. Running the sub-workflows is just a temporary solution so I can collect some preprocessed data, but I'm looking for a more robust solution... I've tried specifying the working directory of both the main (subject-level) workflow and the sub-workflows (session/scan level), and it ended up the same. I think it has something to do with initializing a workflow rather than cloning an existing one, but I'm not sure...

GalKepler commented 2 years ago

Hi, revisiting this, as I didn't manage to sort it out... I followed @effigies' instructions and stopped running single "sub-workflows" separately; instead I'm creating these sub-workflows and keeping a handle to them in a list. The sessions still appear to be nested somehow, resulting in a weird-looking directory structure:

work/
    single_subject_<participant_label>_wf/
        about/
        anat_preproc_wf/
        ...
        dwi_preproc_ses_<***first_session_label***>_wf/
            single_subject_<participant_label>_wf/
                dwi_preproc_ses_<***second_session_label***>_wf/
                    single_subject_<participant_label>_wf/
                        ...

The most troubling thing is that the workflow's graph seems to make sense, but the execution does not (see the attached workflow graph).

Would very much appreciate your help with this (:

GalKepler commented 2 years ago

Fixed this one. It turns out that node objects are cached between runs, so reusing an existing object also carries over its edges and properties. The solution was quite simple: re-initialize the nodes for each consecutive run.
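For future readers, the pitfall can be sketched with a plain-Python stand-in (the `Node` class and names below are illustrative, not nipype's API): a node object created once at module level accumulates edges across workflow builds, while a factory that constructs a fresh node on every call starts each build clean.

```python
class Node:
    """Stand-in for a workflow node that remembers its connections."""
    def __init__(self, name):
        self.name = name
        self.edges = []

# Module-level node, created once and shared across builds (the buggy pattern).
_shared_node = Node("brainextraction")

def init_wf_shared(session):
    # Reuses the same object, so edges from earlier builds come along.
    _shared_node.edges.append(session)
    return _shared_node

def init_wf_fresh(session):
    # Re-initializes the node on every call, so each build starts clean.
    node = Node("brainextraction")
    node.edges.append(session)
    return node

a = init_wf_shared("ses-01")
b = init_wf_shared("ses-02")
# a is b: the second build drags in the first session's edge

c = init_wf_fresh("ses-01")
d = init_wf_fresh("ses-02")
# c is not d: each build has exactly one edge
```

This mirrors the fix above: constructing fresh node objects inside each call to the workflow factory keeps one session's graph from leaking into the next.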