Set up pre-processing pipeline

Dataset

Notable info about the Karolinska dataset:

Composed of T2-sag, T2-ax and T2*-ax data per patient.
~54 patients
Already under git-annex on data.neuro.polymtl.ca
We can assume each patient has all three contrasts (i.e. no missing contrast).
- EDIT: I've noticed that in some subjects the T2-ax and T2*-ax overlap only minimally or not at all, so that effectively only 2 contrasts are available for MS lesion segmentation. So maybe we should consider 2 contrasts (not 3) per patient? --> TO DISCUSS
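
To help decide between 2 and 3 contrasts per patient, the overlap between the two axial acquisitions could be quantified along the superior-inferior axis. A minimal sketch, assuming the coverage of each volume is already known as a (min, max) interval in mm in a common physical space; the function name and the example values are hypothetical:

```python
def si_overlap_fraction(cov_a, cov_b):
    """Fraction of the shorter coverage that is shared by both volumes.

    cov_a, cov_b: (z_min, z_max) superior-inferior coverage in mm,
    expressed in the same physical coordinate system (assumption).
    """
    a_min, a_max = sorted(cov_a)
    b_min, b_max = sorted(cov_b)
    # Length of the interval covered by both volumes (0 if disjoint)
    shared = max(0.0, min(a_max, b_max) - max(a_min, b_min))
    shortest = min(a_max - a_min, b_max - b_min)
    return shared / shortest if shortest > 0 else 0.0


# Hypothetical coverages: T2-ax covers 0-100 mm, T2*-ax covers 50-150 mm
print(si_overlap_fraction((0, 100), (50, 150)))
```

A subject could then be flagged as "2 contrasts only" when this fraction falls below a threshold to be agreed on.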

Creating ground truth

See discussion https://github.com/ivadomed/pipeline-ms-lesion/issues/2

Preprocessing

Discussion

Contrasts need to be co-registered. To always work at a fixed resolution, we could consider straightening all contrasts and interpolating them to a 0.5mm isotropic resolution. This target resolution is a reasonable tradeoff between the resolution required for lesion segmentation and computation time. Also, the cord segmentation routine resamples the input image to 0.5mm anyway, so we would not introduce additional interpolation errors.
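
To get a feel for the computation-time side of this tradeoff, here is a rough sketch of how the grid size changes when resampling to 0.5mm iso. It assumes the field of view is preserved and rounds to the nearest voxel; the function name and the example dimensions are hypothetical:

```python
def resampled_shape(shape, spacing_mm, target_mm=0.5):
    """Grid size after resampling to an isotropic target resolution.

    shape: number of voxels along each axis
    spacing_mm: voxel size along each axis, in mm
    The field of view (n * spacing) is kept constant along each axis.
    """
    return tuple(round(n * s / target_mm) for n, s in zip(shape, spacing_mm))


# Hypothetical axial acquisition: 320 x 320 in-plane at 0.5 mm, 20 slices of 3 mm
print(resampled_shape((320, 320, 20), (0.5, 0.5, 3.0)))  # (320, 320, 120)
```

In this made-up case the number of voxels grows sixfold, all of it coming from interpolation along the thick-slice axis.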

We could also consider registering the data to the PAM50 template, but this would require additional information that we might not otherwise need (i.e. vertebral labeling). Moreover, registration to the template takes longer than simply straightening the data.

Cons: a non-linear transformation will be required, which might alter the integrity of the data (in addition to being slower than not doing it)

Proposal

Resample to 0.5mm iso
Segment cords
- We should probably deactivate the internal resampling to 0.5mm (since the images are already at 0.5mm)
Straightening of T2w-sag
- define a target resolution of 0.5mm iso
Straightening of T2*w-ax + T2w-ax
Registration of T2*w-ax + T2w-ax to the straightened T2w-sag
- using cord segmentation
- make sure to output QC
Concatenate all warping fields for each contrast
Apply warps to each contrast + manual labels (using trilinear interpolation)
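
The steps above could be sketched as a list of SCT command lines. This is only a sketch under assumptions: all file names are hypothetical, the flags should be checked against the installed SCT version, and the option to deactivate the internal resampling of the segmentation step is not shown. Nothing is executed here; the functions only build the argument lists.

```python
TARGET_MM = "0.5x0.5x0.5"  # 0.5mm iso target resolution from the proposal


def straighten_commands(name, contrast, qc_dir="qc"):
    """Resample, segment the cord and straighten one image.

    name: hypothetical file stem, e.g. "T2w_sag"
    contrast: value passed to `sct_deepseg_sc -c`, e.g. "t2" or "t2s"
    """
    img = f"{name}_r.nii.gz"
    seg = f"{name}_r_seg.nii.gz"
    return [
        ["sct_resample", "-i", f"{name}.nii.gz", "-mm", TARGET_MM, "-o", img],
        ["sct_deepseg_sc", "-i", img, "-c", contrast, "-o", seg, "-qc", qc_dir],
        # One output folder per image, so that the straightening outputs
        # of different contrasts do not overwrite each other.
        ["sct_straighten_spinalcord", "-i", img, "-s", seg,
         "-ofolder", f"straight_{name}"],
    ]


def register_command(moving, ref, moving_seg, ref_seg, out_warp, qc_dir="qc"):
    """Register a straightened axial image to the straightened T2w-sag,
    passing the cord segmentations and writing a QC report."""
    return ["sct_register_multimodal", "-i", moving, "-d", ref,
            "-iseg", moving_seg, "-dseg", ref_seg,
            "-owarp", out_warp, "-qc", qc_dir]


def apply_commands(name, label, warps, dest):
    """Concatenate the warping fields, then apply the result to the
    image and to its manual label with trilinear interpolation."""
    full_warp = f"warp_{name}_to_ref.nii.gz"
    cmds = [["sct_concat_transfo", "-w", *warps, "-d", dest, "-o", full_warp]]
    for src in (f"{name}_r.nii.gz", label):
        out = src.replace(".nii.gz", "_reg.nii.gz")
        cmds.append(["sct_apply_transfo", "-i", src, "-d", dest,
                     "-w", full_warp, "-x", "linear", "-o", out])
    return cmds


for cmd in straighten_commands("T2w_sag", "t2"):
    print(" ".join(cmd))
```

Each argument list could then be passed to `subprocess.run(cmd, check=True)` on a machine where SCT is installed.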

Data management

Discussion

Where should we store the preprocessed data?

Under sct-testing-data/derivatives
- Pros: centralized, easy to version-track
- Cons: Takes up space on our centralized sct-testing-data (used for many other projects)
Under a separate git-annex folder
- Pros: does not take up space on sct-testing-data, version-tracked
- Cons: another git-annex dataset to manage, not easily syncable with sct-testing-data
Not version-tracked, but could be created on-the-fly with a version-tracked preprocessing script located on this repo
- Pros: does not take up space on sct-testing-data
- Cons: need to generate it every time we need to use it, not version-tracked (although the original dataset and the script to generate it are)

Proposal

Go with a separate, new, version-tracked dataset that bundles the preprocessed data

Include a QC report

Training

SoftSeg 😊
Missing modality approach (HEMIS)
2D vs. 3D? --> strong incentive for 3D!
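
To make the missing modality idea concrete: HeMIS fuses per-modality feature maps by taking their mean and variance across whichever modalities are available, so the fused representation has the same size whether 2 or 3 contrasts are present. A minimal sketch on plain feature vectors, not the ivadomed implementation; the function name is hypothetical, and the choices of the unbiased sample variance and of a zero variance for a single modality are assumptions to check against the paper:

```python
from statistics import mean, variance


def hemis_fuse(features):
    """Fuse per-modality feature vectors into [means..., variances...].

    features: dict mapping a modality name to a list of floats,
    or to None when that modality is missing for the subject.
    """
    available = [f for f in features.values() if f is not None]
    if not available:
        raise ValueError("at least one modality is required")
    # Statistics are computed element-wise across the available modalities
    means = [mean(vals) for vals in zip(*available)]
    if len(available) > 1:
        variances = [variance(vals) for vals in zip(*available)]
    else:
        # Variance is undefined for a single modality; use zeros (assumption)
        variances = [0.0] * len(means)
    return means + variances


# Hypothetical subject where the T2*-ax contrast is unusable
fused = hemis_fuse({
    "T2w_sag": [1.0, 2.0],
    "T2w_ax": [3.0, 6.0],
    "T2star_ax": None,
})
print(fused)  # [2.0, 4.0, 2.0, 8.0]
```

This would fit the subjects where the two axial acquisitions do not overlap, since the fusion does not care which contrast is missing.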
