sct-pipeline / spine-park

Pipeline for multicontrast analysis in PD patients
MIT License

First pass and correct T2w segmentations #15

Open jcohenadad opened 5 months ago

jcohenadad commented 5 months ago

@Kaonashi22 While waiting for the vertebral labeling #9 I suggest to:

Kaonashi22 commented 5 months ago

Thanks, I will do that!



jcohenadad commented 5 months ago

@Kaonashi22 let me know when you have corrected 2-3 segmentations. At that point, you can send me the derivatives folder, and I can modify the processing script so that it accounts for these manual corrections. That way we can work in parallel (i.e., you correcting the segmentations, and me updating the analysis script).

Kaonashi22 commented 5 months ago

Thanks, I will send you the derivatives folder soon.



Kaonashi22 commented 5 months ago

Overall, the segmentations of the T2w images are accurate. Sometimes two or three voxels are missing on some slices; how precise should the segmentation be? Is it worth manually correcting these masks? Also, the lowest slice of the T2 volume usually has poor signal; would you recommend discarding the segmentation at this level?

Kaonashi22 commented 5 months ago

I attached the derivatives folder with the masks manually corrected.

After running the pipeline on all subjects, the script exited with errors at different steps after the segmentation of T2w images; here are some log files:

err.batch_processing_sub-CG176.log
err.batch_processing_sub-DEV148Sujet01.log
err.batch_processing_sub-DEV203Sujet08.log
err.batch_processing_sub-RC194.log

derivatives.zip

jcohenadad commented 5 months ago

Overall, the segmentations of the T2w images are accurate. Sometimes two or three voxels are missing on some slices; how precise should the segmentation be? Is it worth manually correcting these masks?

It depends on how precise you want the results to be. For example, if a slice is missing 3 pixels, and you average the CSA across, say, 20 slices, then assuming a CSA of 80 mm2 at an in-plane resolution of 0.8 mm (0.64 mm2 per pixel), 3 pixels correspond to 1.92 mm2, which represents 2.4% of the CSA computed on a single slice, or 0.12% of the CSA averaged over 20 slices (so quite negligible, I would say).
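The arithmetic can be checked in a few lines (a back-of-the-envelope sketch using the example values above: 0.8 mm in-plane resolution, 80 mm2 CSA, 3 missing pixels, 20 slices):

```python
# Error introduced by a few missing pixels in an averaged CSA measurement.
pixel_size = 0.8                 # in-plane resolution, mm
pixel_area = pixel_size ** 2     # 0.64 mm^2 per pixel
missing_area = 3 * pixel_area    # 1.92 mm^2 missing on one slice
csa = 80.0                       # cross-sectional area per slice, mm^2
n_slices = 20

# Relative error on the slice itself, vs. on the mean over 20 slices
error_single_slice = missing_area / csa * 100            # in percent
error_mean = missing_area / (csa * n_slices) * 100       # in percent

print(round(error_single_slice, 2))  # 2.4
print(round(error_mean, 2))          # 0.12
```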

Also, the lowest slice of the T2 volume usually has poor signal; would you recommend discarding the segmentation at this level?

It depends on how important the lowest slice is for your analysis (e.g., are you considering it? If not, it is not important).

jcohenadad commented 5 months ago

About the issues:

  • err.batch_processing_sub-DEV203Sujet08.log: Error generated during QC report (portalocker.exceptions.LockException: [Errno 5] Input/output error). I am wondering if this is related to the parallel writing (due to multiple jobs) on a file on a mounted disk. Could you send me this image as well so I can try to reproduce? Tagging @joshuacwnewton so he is aware.

Full traceback for context:

```
Traceback (most recent call last):
  File "/export02/data/lydiac/spinalcordtoolbox/spinalcordtoolbox/scripts/sct_deepseg_sc.py", line 211, in <module>
    main(sys.argv[1:])
  File "/export02/data/lydiac/spinalcordtoolbox/spinalcordtoolbox/scripts/sct_deepseg_sc.py", line 204, in main
    generate_qc(fname_image, fname_seg=fname_seg, args=argv, path_qc=os.path.abspath(path_qc),
  File "/export02/data/lydiac/spinalcordtoolbox/spinalcordtoolbox/reports/qc.py", line 715, in generate_qc
    QcImage(
  File "/export02/data/lydiac/spinalcordtoolbox/spinalcordtoolbox/reports/qc.py", line 242, in layout
    self._make_QC_image_for_3d_volumes(img, mask, plane=self.qc_report.plane)
  File "/export02/data/lydiac/spinalcordtoolbox/spinalcordtoolbox/reports/qc.py", line 290, in _make_QC_image_for_3d_volumes
    self.qc_report.update_description_file()
  File "/export02/data/lydiac/spinalcordtoolbox/spinalcordtoolbox/reports/qc.py", line 529, in update_description_file
    portalocker.lock(dest_file, portalocker.LOCK_EX)
  File "/export02/data/lydiac/spinalcordtoolbox/python/envs/venv_sct/lib/python3.9/site-packages/portalocker/portalocker.py", line 111, in lock
    raise exceptions.LockException(exc_value, fh=file_) from exc_value
portalocker.exceptions.LockException: [Errno 5] Input/output error
```

  • err.batch_processing_sub-RC194.log: We get an OSError: [Errno 9] Bad file descriptor. I'm not sure how to interpret this. Could you also send me this subject?

Full traceback for context:

```
Traceback (most recent call last):
  File "/export02/data/lydiac/spinalcordtoolbox/spinalcordtoolbox/scripts/sct_deepseg_sc.py", line 211, in <module>
    main(sys.argv[1:])
  File "/export02/data/lydiac/spinalcordtoolbox/spinalcordtoolbox/scripts/sct_deepseg_sc.py", line 204, in main
    generate_qc(fname_image, fname_seg=fname_seg, args=argv, path_qc=os.path.abspath(path_qc),
  File "/export02/data/lydiac/spinalcordtoolbox/spinalcordtoolbox/reports/qc.py", line 715, in generate_qc
    QcImage(
  File "/export02/data/lydiac/spinalcordtoolbox/spinalcordtoolbox/reports/qc.py", line 242, in layout
    self._make_QC_image_for_3d_volumes(img, mask, plane=self.qc_report.plane)
  File "/export02/data/lydiac/spinalcordtoolbox/spinalcordtoolbox/reports/qc.py", line 290, in _make_QC_image_for_3d_volumes
    self.qc_report.update_description_file()
  File "/export02/data/lydiac/spinalcordtoolbox/spinalcordtoolbox/reports/qc.py", line 564, in update_description_file
    dest_file.truncate()
OSError: [Errno 9] Bad file descriptor
```

joshuacwnewton commented 5 months ago

Both tracebacks explicitly mention the dest_file file object (which we lock using portalocker, and which points to index.html of the QC report). Also, both errors ("[Errno 5]", "[Errno 9]") are OS system error codes. So I think it is safe to conclude that this is related to file locking + parallel processing + the mounted file system.

I believe it's errors such as this that lead us to recommend not performing processing on mounted drives.

See, for example:

That said, I could also see this being an intermittent error, so I think it would be worth trying to run the script again and see if the errors are consistently thrown.
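For context, the pattern at issue can be sketched with the standard library (a simplified stand-in for what qc.py does via portalocker; on POSIX, portalocker wraps OS-level locks such as fcntl, which are exactly the calls known to misbehave on NFS):

```python
import fcntl
import os
import tempfile

# Minimal sketch of the exclusive-lock / read-modify-write / truncate
# pattern applied to the QC report's index.html. On local filesystems this
# is reliable; on NFS mounts the locking syscalls can surface OS errors
# such as [Errno 5] Input/output error or [Errno 9] Bad file descriptor.
path = os.path.join(tempfile.mkdtemp(), "index.html")
open(path, "w").close()              # stand-in for the existing QC index.html

with open(path, "r+") as fh:
    fcntl.flock(fh, fcntl.LOCK_EX)   # exclusive lock; blocks until acquired
    content = fh.read()              # read current report under the lock
    fh.seek(0)
    fh.write(content + "<!-- qc entry -->\n")
    fh.truncate()                    # same truncate() step that raised Errno 9
    fcntl.flock(fh, fcntl.LOCK_UN)
```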

jcohenadad commented 5 months ago

I believe it's errors such as this that lead us to recommend not performing processing on mounted drives.

The problem is that, in many labs, data are located on a mounted drive and researchers are required to process their data that way. So I'm wondering if there is a workaround for this. I guess using -j 1 would be a workaround, but it would increase processing time by a lot.

joshuacwnewton commented 5 months ago

So I'm wondering if there is a workaround for this.

I think it should be possible to set the QC location somewhere off the mounted drive, while letting the rest of the processing happen within the mounted drive. (This is probably worth doing anyway, as a test to see if we can isolate the issue.)
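As a concrete sketch of that test (assuming SCT's -qc flag, which lets you choose where the QC report is written; the input filename below is a placeholder): point the QC path at a local temporary directory while the data stays on the mount.

```python
import os
import tempfile

# Hypothetical sketch: inputs/outputs remain on the NFS mount, but the QC
# report (whose index.html is the lock-protected file) goes to local disk.
path_qc_local = os.path.join(tempfile.gettempdir(), "qc_spine_park")

cmd = [
    "sct_deepseg_sc",
    "-i", "/dagher/dagher11/lydia11/sub-XX_T2w.nii.gz",  # stays on the mount
    "-c", "t2",
    "-qc", path_qc_local,                                # local, lockable disk
]
# The command is only assembled here, not executed.
```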

I think a similar conclusion was reached by @mguaypaq in this comment:

The problems are showing up because the output directory is a mounted shared drive from duke. But it's generally a bad idea to write results directly to duke, and the proper workflow is to simply use a local directory for the output. (The input can still be directly taken from duke, though.)

jcohenadad commented 5 months ago

I think it should be possible to set the QC location somewhere off the mounted drive, while letting the rest of processing happen within the mounted drive?

YES! Brilliant idea! The same way we do it with our temp folders for processing intermediate NIfTI files.

Although there is one downside: I often look at the QC while processing is ongoing (if I see something weird, I stop the processing), so having the QC in a temp folder might be more cumbersome to access.

joshuacwnewton commented 5 months ago

I've summarized our QC-related discussion in a new issue on the SCT repo (https://github.com/spinalcordtoolbox/spinalcordtoolbox/issues/4423), just so that it doesn't distract from the spine-park related discussion in this issue. :)

Kaonashi22 commented 5 months ago

Thanks @jcohenadad, I'll send you the images by email. For sub-DEV148Sujet01, I was indeed using an older version of the script (https://github.com/sct-pipeline/spine-park/commit/1b378e0d9ae743b8142716991956e1598dff5dd6).

Kaonashi22 commented 5 months ago

I'm running the script from a mounted drive (where SCT and the current script are saved), but the input and output folders are located on a server. Does that have any influence?



jcohenadad commented 5 months ago

@Kaonashi22 I ran the script at version 78129ca6e3fe90f39b20af565a5e1918d2f6754e on the data you sent me (including those that produced the errors: https://github.com/sct-pipeline/spine-park/issues/15#issuecomment-2038269172) and I did not observe any error.

Terminal output:

```console
julien-macbook:~/temp/Lydia/results_20240405_125118/log $ ls -1
batch_processing_sub-BB277.log
batch_processing_sub-BJ170.log
batch_processing_sub-CG176.log
batch_processing_sub-DEV148Sujet01.log
batch_processing_sub-DEV203Sujet08.log
batch_processing_sub-DEV206Sujet10.log
batch_processing_sub-LC164.log
batch_processing_sub-LM166.log
batch_processing_sub-RC194.log
```

Therefore, I suspect the issues you observed were caused by parallel writing on the locked QC file. One way to overcome this is to use the flag -jobs 1. Can you please try it and see if it solves the issue?

Kaonashi22 commented 5 months ago

Sounds good, thank you. I'm not sure I can find the script version 78129ca (https://github.com/sct-pipeline/spine-park/commit/78129ca6e3fe90f39b20af565a5e1918d2f6754e). Is it the file posted here: https://github.com/sct-pipeline/spine-park/pull/8/files?



jcohenadad commented 5 months ago

The number refers to a git SHA that points to a state of the repository. Here is the link to that version of the repository, that includes the file: https://github.com/sct-pipeline/spine-park/tree/78129ca6e3fe90f39b20af565a5e1918d2f6754e

But instead of manually downloading the file from this repository, you should go to your local repository and run:

git pull

Then you can make sure you are running the proper version by running:

git log --pretty=oneline

which indicates the version (top item in the list):

julien-macbook:~/code/spine-park $ git log --pretty=oneline
78129ca6e3fe90f39b20af565a5e1918d2f6754e (HEAD -> main, origin/main, origin/HEAD) Added .gitignore  <--- THIS ONE
9bfb14077c49237b83535a326357f81d7b9a57de Sort DWI chunks from top to bottom (#18)
8e541e9064c9be264a85f867ed93371df80c506b (jca/17-manual-corr) Create analysis script (#8)
ca8104e9f430e1f7e47605aabae21c1c86790308 Update README.md
2c078e6887ad5a3be9eff721e405269e45d728bb Added doc to convert to BIDS
183e7b90ae59f84e715ae159f26c001469adb709 Refactored zip_and_move_file()
eb9998ba7cc1ffb18f2cbfd0c3f65f74ee86657d Cleanup
5d2a8d06b356378de590b20a1d619950d86c0fc9 Convert to function with input arguments
7ab56392eed3d2789e9ab046b301d93381905ebc Added printouts
b395dd28d215490cb26f617cd25591687c16fd24 Cleanup, added docstrings
44936815a372514e5b587070614feaf07d6219b4 Create directory inside the zip_and_move_nifti() function
96bb57438fe0f1519e12c67ace7af74fbccd2aff Put back .gz extension on output file name
be206ec4d5dcbe11949d2414a734b3b34224bb64 Fixed duplicated 'sub-' prefix, removed .gz
a332bb5c40e905de181cd3b3ab256e434f2580c3 Added printout for ignored file
85202a6f64775c18845c8757033f1fc7090b1671 Added docstrings, added output path
ea22e2539ec86bcde6dee6ea18b4214b817861f5 Pushed first prototype that parses subject directory
0d348fc71c7fca496d8d0bd1756bf85bb3f6eee4 Initial commit

Kaonashi22 commented 5 months ago

Thank you! I'll let you know how it works



Kaonashi22 commented 4 months ago

I ran the pipeline on all the subjects. The processing time is quite a bit longer (~2h30 for 15 subjects). I split the subjects into groups of 15 to make it more manageable. I also had storage space issues, which didn't help...

Here are some comments:
  • When I re-run the analysis, all the segmentation files are overwritten. I'm not sure if the segment_if_does_not_exist "${file_t2}" "t2" call is working properly.
  • I got errors for some subjects; the log files are attached.
  • At the end of the processing, the error.log file indicates that "_T2_seg.nii.gz" and "mtr.nii.gz" don't exist. I guess the path to the "anat" folder is missing.
  • The diffusion maps (FA, MD, ...) are only generated for one DWI chunk.
  • Regarding vertebral labeling before registration to the template, should we call the SPINEPS function from the script?

error.log
err.batch_processing_sub-LD214.log
err.batch_processing_sub-ER240.log
err.batch_processing_sub-GE200.log

err.batch_processing_sub-GB300.log

joshuacwnewton commented 4 months ago

Oh dear! In that case, I can try to prioritize the SCT issue in order to avoid -jobs 1 and reduce the processing time.

jcohenadad commented 4 months ago

The processing time is quite a bit longer (~2h30 for 15 subjects). I split the subjects into groups of 15 to make it more manageable.

Sorry about that. We'll work on a fix which hopefully will solve the underlying issue

When I re-run the analysis, all the segmentation files are overwritten. I'm not sure if the "segment_if_does_not_exist "${file_t2}" "t2"" is working properly

It should not be the case, as per: https://github.com/sct-pipeline/spine-park/blob/8e541e9064c9be264a85f867ed93371df80c506b/batch_processing.sh#L78-L79

Can you please give an example of the full path where the segmentation being overwritten was originally located? I suspect that you did not place the manual segmentation in the right folder (it works on my end: https://github.com/sct-pipeline/spine-park/issues/17).
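For reference, the lookup pattern behind segment_if_does_not_exist can be sketched as follows (a hedged Python rendering of the bash function's logic; the file-naming convention and derivatives layout here are illustrative, not necessarily the pipeline's exact ones):

```python
import os
import shutil

def segment_if_does_not_exist(file_img, path_derivatives, run_segmentation):
    """Reuse a manually corrected mask from the derivatives folder when one
    exists; otherwise run the automatic segmentation.
    `run_segmentation(input, output)` stands in for e.g. sct_deepseg_sc."""
    base = os.path.basename(file_img).replace(".nii.gz", "")
    fname_manual = os.path.join(path_derivatives, base + "_seg-manual.nii.gz")
    fname_seg = os.path.join(os.path.dirname(file_img), base + "_seg.nii.gz")
    if os.path.isfile(fname_manual):
        shutil.copyfile(fname_manual, fname_seg)  # manual correction wins
        return "manual"
    run_segmentation(file_img, fname_seg)         # fall back to automatic
    return "automatic"
```

The key point is that the manual mask is only picked up if it sits at the exact path and name the function expects, which is why a misplaced derivatives folder silently triggers re-segmentation.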

At the end of the processing, the error.log file indicates that "_T2_seg.nii.gz" and "mtr.nii.gz" don't exist. I guess the path to the "anat" folder is missing.

Good catch! I've opened an issue: https://github.com/sct-pipeline/spine-park/issues/19

The diffusion maps (FA, MD,...) are only generated for one DWI chunk.

Ah! This is because the name for DTI metrics is generic (e.g., dti_FA.nii.gz), so it gets overwritten. My bad. I've opened an issue; it is easy to fix: https://github.com/sct-pipeline/spine-park/issues/20
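A possible shape for that fix (a hypothetical sketch, not the actual patch in the linked issue): rename each chunk's metric maps right after estimation so the next chunk cannot overwrite them.

```python
import os
import shutil

def rename_dti_metrics(path_chunk, chunk_id, metrics=("FA", "MD", "RD", "AD")):
    """Give each DWI chunk's DTI metric maps a chunk-specific name, so that
    successive chunks do not overwrite the generic dti_<metric>.nii.gz files.
    Returns the list of renamed file names."""
    renamed = []
    for metric in metrics:
        src = os.path.join(path_chunk, f"dti_{metric}.nii.gz")
        if os.path.isfile(src):
            dst = os.path.join(path_chunk, f"dti_{metric}_chunk-{chunk_id}.nii.gz")
            shutil.move(src, dst)
            renamed.append(os.path.basename(dst))
    return renamed
```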

Regarding vertebral labeling before registration to template, should we call the SPINEPS function from the script?

Yes, we should. Issue opened: https://github.com/sct-pipeline/spine-park/issues/21

err.batch_processing_sub-LD214.log err.batch_processing_sub-GB300.log

Issue opened on SCT: https://github.com/spinalcordtoolbox/spinalcordtoolbox/issues/4431

Can you please share these subjects with me?

err.batch_processing_sub-ER240.log

The log file says "could the file be damaged?". I'm wondering if this is a disk issue; you mentioned you had such issues.

err.batch_processing_sub-GE200.log

Possibly caused by https://github.com/sct-pipeline/spine-park/issues/22. I'll fix it.

Kaonashi22 commented 4 months ago

I answered the question about the segmentation files being overwritten in https://github.com/sct-pipeline/spine-park/issues/17.

Sending you the subject images by email. Thanks!

joshuacwnewton commented 4 months ago

err.batch_processing_sub-LD214.log err.batch_processing_sub-GB300.log

Issue opened on SCT: spinalcordtoolbox/spinalcordtoolbox#4431

Note: These two subjects that @jcohenadad has highlighted have different errors:

Given that the GB300 error occurred (a) inside isct_ComposeMultiTransform and (b) while trying to read the file (rather than writing files), I'm wondering if this is indeed another disk issue, as @jcohenadad suspected.

Kaonashi22 commented 4 months ago

Then I'll rerun the processing in a folder with more space. Though I had a "no storage space left" error for some other subjects, which was more straightforward.

joshuacwnewton commented 4 months ago

@Kaonashi22 Do you know if any of the filesystems you're working with (e.g. /export02/data/lydiac/, /dagher/dagher11/lydia11/) are "NFS mounts" specifically? You can check this by running:

mount | grep nfs

If so, this might be the key to the QC locking issues: https://github.com/spinalcordtoolbox/spinalcordtoolbox/issues/4423#issuecomment-2048258460

Kaonashi22 commented 4 months ago

The command returns this:

nfsd on /proc/fs/nfsd type nfsd (rw,relatime)
tubal:/home on /home/bic type nfs4 (rw,nodev,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.68.1.181,local_lock=none,addr=132.206.201.20)
nfs.isi.bic.mni.mcgill.ca:/dagher12 on /dagher/dagher12 type nfs4 (rw,nodev,relatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.68.1.181,local_lock=none,addr=132.216.133.23)
nfs.isi.bic.mni.mcgill.ca:/dagher11 on /dagher/dagher11 type nfs4 (rw,nodev,relatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.68.1.181,local_lock=none,addr=132.216.133.5)



joshuacwnewton commented 4 months ago

Aha! I think that explains it then! (/dagher/dagher11 -- where the data lives and where we perform locking for the QC index file -- is an NFS mount.)

I'll take a look at trying portalocker's proposed fix for handling NFS mounted drives. :)

Kaonashi22 commented 4 months ago

Sounds good, thanks!



jcohenadad commented 4 months ago

All 4 subjects ran without error, so I suspect the issue was related to disk space and/or the NFS portalocker issue.

Terminal output:

```console
Processing 4 subjects in parallel. (Worker processes used: 4).
Started at 11h44m21s: sub-ER240. See log file /Users/julien/temp/Lydia/results_20240411_114414/log/batch_processing_sub-ER240.log
Started at 11h44m21s: sub-GB300. See log file /Users/julien/temp/Lydia/results_20240411_114414/log/batch_processing_sub-GB300.log
Started at 11h44m21s: sub-GE200. See log file /Users/julien/temp/Lydia/results_20240411_114414/log/batch_processing_sub-GE200.log
Started at 11h44m21s: sub-LD214. See log file /Users/julien/temp/Lydia/results_20240411_114414/log/batch_processing_sub-LD214.log

Hooray! your batch completed successfully :-)

Started: 2024-04-11 11h44m21s | Ended: 12h57m38s | Duration: 01h13m16s
```

Kaonashi22 commented 4 months ago

Thanks for this trial. I'll make sure I have enough storage space for the next run.

