nipreps / mriqc

Automated Quality Control and visual reports for Quality Assessment of structural (T1w, T2w) and functional MRI of the brain
http://mriqc.readthedocs.io
Apache License 2.0

concurrent.futures error #1275

Closed · madeleinebarter closed this 1 month ago

madeleinebarter commented 5 months ago

What happened?

I ran mriqc using the Docker version on one participant and received the error: exception calling callback for <Future at 0x2aaac0288e50 state=finished raised BrokenProcessPool>

I am running the latest Docker version on a Mac Studio with an M2 chip and macOS Sonoma.

What command did you use?

docker run --platform linux/amd64 -it --rm -v /Volumes/clippdata/IDP/bids2:/data:ro -v /Volumes/clippdata/IDP/mriqc:/out nipreps/mriqc:latest /data /out participant --participant_label 001

What version of the software are you running?

24.1.0

How are you running this software?

Docker

Is your data BIDS valid?

Yes

Are you reusing any previously computed results?

No

Please copy and paste any relevant log output.

No response

Additional information / screenshots

[Screenshot attached: 2024-04-15 at 1:53:03 PM]
oesteban commented 5 months ago

Hi @madeleinebarter, this looks like a memory error. Although I'm actively working on making this more reliable, when processing many subjects you may want to experiment with the --omp-nthreads and --nprocs options.

Can you elaborate on the characteristics of your images (number of images, modalities, etc.) and of your machine (number of CPUs, memory available, etc.)?

It would also be helpful to run mriqc with more verbose logging (e.g., pass -vv or -vvv).

Finally, copying and pasting the last config file generated at <work-directory>/config-<timestamp>-<unique-id>.toml would also be of help.
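For instance, a sketch reusing the command from the original report (the thread counts here are illustrative and should be tuned to your machine):

docker run --platform linux/amd64 -it --rm -v /Volumes/clippdata/IDP/bids2:/data:ro -v /Volumes/clippdata/IDP/mriqc:/out nipreps/mriqc:latest /data /out participant --participant_label 001 --nprocs 4 --omp-nthreads 2 -vv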

Armos05 commented 4 months ago

Hey @oesteban, I have also been hitting a similar error for some time now: "ERROR | concurrent.futures | exception calling callback for <Future at 0x7f622a65b8d0 state=finished raised BrokenProcessPool>"

I have also tried a few workarounds for this problem.

System information: MRIQC version 24.1.0 on both a Linux machine (8 GB RAM) and a macOS machine (16 GB RAM), each running the latest Docker version; I got the same error on both platforms.

I have attached the log files (Linux and macOS) for your reference:

crash-20240507-073505-root-UploadMetrics.a0-4ce7c5f2-2f4c-4dc8-a8d3-b684affa8fd9.txt
mriqc-20240507-083540_d7ea3439-f37f-449c-9158-9ae617df4d5f.log
MACOS_mriqc-20240507-081646_441bbab0-acb9-43c4-9304-c6a9e99cff5a.log

Thanks in advance

oesteban commented 4 months ago

@Armos05 - by looking at your crashfile, it seems our server was down when you ran MRIQC.

Can you try again with the same arguments? If it still crashes, can you run MRIQC while adding the argument --no-sub?
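For example, assuming the same positional arguments as in your earlier run, the flag is simply appended to the command line:

mriqc /data /out participant --no-sub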

Armos05 commented 4 months ago

@oesteban, Thanks for looking into it. MRIQC works now!!

However, I was unable to complete the group analysis: MRIQC computes all the metrics into the JSON files, but the final group HTML report contains no visual renders (graphs) of those values.

Here is the config file:

[environment]
cpu_count = 8
exec_env = "singularity"
free_mem = 13.2
freesurfer_home = "/opt/freesurfer"
overcommit_policy = "heuristic"
overcommit_limit = "50%"
nipype_version = "1.8.6"
synthstrip_path = "PosixPath('/opt/freesurfer/models/synthstrip.1.pt')"
templateflow_version = "24.2.0"
total_memory = 15.365371704101562
version = "24.1.0.dev0+g3fe90466.d20240417"

[execution]
ants_float = false
bids_dir = "/data"
bids_dir_datalad = false
bids_database_dir = "/out/.bids_db"
bids_database_wipe = false
cwd = "/tmp"
datalad_get = true
debug = false
dry_run = false
dsname = ""
float32 = true
layout = "BIDS Layout: /data"
log_dir = "/out/logs"
log_level = 25
modalities = [ "T1w", "T2w", "bold", "dwi",]
no_sub = true
notrack = false
output_dir = "/out"
participant_label = [ "01", "32", "33",]
pdb = false
reports_only = false
resource_monitor = false
run_uuid = "20240521-083912_05591ae9-5501-48d5-9296-462c064a37a7"
templateflow_home = "/templateflow"
upload_strict = false
verbose_reports = false
webapi_url = "https://mriqc.nimh.nih.gov:443/api/v1"
work_dir = "/tmp/work"
write_graph = false

[workflow]
analysis_level = [ "group",]
biggest_file_gb = 1.1229652797928313e-10
deoblique = false
despike = false
fd_thres = 0.2
fd_radius = 50
fft_spikes_detector = false
min_len_dwi = 7
min_len_bold = 5
species = "human"
template_id = "MNI152NLin2009cAsym"

[nipype]
crashfile_format = "txt"
get_linked_libs = false
local_hash_check = true
nprocs = 2
omp_nthreads = 2
plugin = "MultiProc"
remove_node_directories = false
resource_monitor = false
stop_on_first_crash = true

[settings]
file_path = "/out/logs/config-20240521-083912_05591ae9-5501-48d5-9296-462c064a37a7.toml"
start_time = 1716280752.2356057

[execution.bids_filters]

[workflow.inputs]
t1w = [ "/data/sub-01/anat/sub-01_T1w.nii.gz", "/data/sub-32/anat/sub-32_T1w.nii.gz", "/data/sub-33/anat/sub-33_T1w.nii.gz",]
bold = [ "/data/sub-01/func/sub-01_task-rest_bold.nii.gz", "/data/sub-32/func/sub-32_task-rest_bold.nii.gz", "/data/sub-33/func/sub-33_task-rest_bold.nii.gz",]

[nipype.plugin_args]
maxtasksperchild = 1
raise_insufficient = false

suxpert commented 2 months ago

> Can you try again with the same arguments? If it still crashes, can you run MRIQC while adding the argument --no-sub?

I get the exact same error even when I run with --no-sub. Is mriqc trying to fetch something from a server in a way that cannot be turned off, e.g., TemplateFlow?

BTW, at first I ran mriqc from Docker without --mem; mriqc ate all free memory and swap and froze my computer. My environment is openSUSE Tumbleweed with nipreps/mriqc:24.0.0, which reports 24.1.0.dev0 in its output.

suxpert commented 2 months ago

For my latest try, I limited the memory with --mem 80 (the workstation has 128 GB), but mriqc still used all available memory during some processing step: there was no output after the 'Generating visual report' message and a warning that the 'background was too small', before the crash, just like in the OP's screenshot. Fortunately, the workstation survived after mriqc crashed this time.

It seems that version 24.1.0.dev0 does not obey the memory limit, or have I misunderstood this option?
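In case it helps others (an assumption on my side, not something confirmed in this thread): if --mem is only a scheduling hint to the workflow engine rather than a hard cap, a hard limit can additionally be enforced by the container runtime itself via Docker's --memory flag (paths here are illustrative):

docker run --memory=80g --memory-swap=80g -it --rm -v /path/to/bids:/data:ro -v /path/to/out:/out nipreps/mriqc:latest /data /out participant --mem 80

Setting --memory-swap equal to --memory disables swap for the container, so a runaway process gets OOM-killed inside the container instead of freezing the host.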

hugofluhr commented 2 months ago

I get the same problem running MRIQC 24.0 via Singularity on an HPC cluster; v23.1 works fine with the exact same command and options.

oesteban commented 1 month ago

@suxpert @hugofluhr this bug report is too general and has branched into several different issues, so it is really difficult to identify what the target is.

May I ask you to file fresh issues for the problems you are experiencing? It would be great to get access to your logs and crashfiles.

Closing this one for the time being.