spine-generic / data-multi-subject

Multi-subject data for the Spine Generic project
Creative Commons Attribution 4.0 International

Deal with subjects scanned on different scanners #102

Open kousu opened 2 years ago

kousu commented 2 years ago

There are 15 sub-tokyo subjects; however, there are actually only 5 real subjects involved, each scanned on three different MRI scanners. For example:

u108545@joplin:~/data-multi-subject$ ls sub-tokyo*01
sub-tokyo750w01:
anat  dwi

sub-tokyoIngenia01:
anat  dwi

sub-tokyoSkyra01:
anat  dwi
u108545@joplin:~/data-multi-subject$ egrep 'sub-tokyo.*?01[[:space:]]*(F|M)' participants.tsv 
sub-tokyo750w01 M   25  -   -   2019-10-01  tokyo750w   the University of Tokyo GE  MR750w  -   24_LX_MR_Software_release:DV24.0_R01_1344.a "K. Kamiya, Y. Suzuki"
sub-tokyoIngenia01  M   25  -   -   2019-10-01  tokyo   Ingenia the University of Tokyo Philips Ingenia -   5.3.1_5.3.1.1"K. Kamiya, Y. Suzuki"
sub-tokyoSkyra01    M   25  -   -   2019-10-01  tokyoSkyra  the University of Tokyo Siemens Skyra   HeadNeck_20 syngo_MR_E11    "K. Kamiya, Y. Suzuki"

It would be safer, and probably more BIDS-compliant, to represent the "different scanner" field using an acq- entity (or possibly ses-) and to put all these scans under a single folder (sub-tokyo01). Then we would only need to record their tabular data once in participants.tsv, and repairs like #96 would not be so fraught to perform.

Discovered in https://github.com/spine-generic/data-multi-subject/pull/96#issuecomment-930497296

jcohenadad commented 2 years ago

This is a valid concern. However, merging these participants would break the analysis code, so there are pros and cons here.

kousu commented 2 years ago

I'll fix the analysis code.

kousu commented 2 years ago

Turns out, the hardware field already has a place to go in BIDS: it goes in the .json, not in the filename, and we have this data in the right place already:

```
u108545@joplin:~/data-multi-subject$ grep ManufacturersModelName sub-tokyo*01/anat/*.json
sub-tokyo750w01/anat/sub-tokyo750w01_acq-MToff_MTS.json: "ManufacturersModelName": "DISCOVERY_MR750w",
sub-tokyo750w01/anat/sub-tokyo750w01_acq-MTon_MTS.json: "ManufacturersModelName": "DISCOVERY_MR750w",
sub-tokyo750w01/anat/sub-tokyo750w01_acq-T1w_MTS.json: "ManufacturersModelName": "DISCOVERY_MR750w",
sub-tokyo750w01/anat/sub-tokyo750w01_T1w.json: "ManufacturersModelName": "DISCOVERY_MR750w",
sub-tokyo750w01/anat/sub-tokyo750w01_T2star.json: "ManufacturersModelName": "DISCOVERY_MR750w",
sub-tokyo750w01/anat/sub-tokyo750w01_T2w.json: "ManufacturersModelName": "DISCOVERY_MR750w",
sub-tokyoIngenia01/anat/sub-tokyoIngenia01_acq-MToff_MTS.json: "ManufacturersModelName": "Ingenia_CX",
sub-tokyoIngenia01/anat/sub-tokyoIngenia01_acq-MTon_MTS.json: "ManufacturersModelName": "Ingenia_CX",
sub-tokyoIngenia01/anat/sub-tokyoIngenia01_acq-T1w_MTS.json: "ManufacturersModelName": "Ingenia_CX",
sub-tokyoIngenia01/anat/sub-tokyoIngenia01_T1w.json: "ManufacturersModelName": "Ingenia_CX",
sub-tokyoIngenia01/anat/sub-tokyoIngenia01_T2star.json: "ManufacturersModelName": "Ingenia_CX",
sub-tokyoIngenia01/anat/sub-tokyoIngenia01_T2w.json: "ManufacturersModelName": "Ingenia_CX",
sub-tokyoSkyra01/anat/sub-tokyoSkyra01_acq-MToff_MTS.json: "ManufacturersModelName": "Skyra",
sub-tokyoSkyra01/anat/sub-tokyoSkyra01_acq-MTon_MTS.json: "ManufacturersModelName": "Skyra",
sub-tokyoSkyra01/anat/sub-tokyoSkyra01_acq-T1w_MTS.json: "ManufacturersModelName": "Skyra",
sub-tokyoSkyra01/anat/sub-tokyoSkyra01_T1w.json: "ManufacturersModelName": "Skyra",
sub-tokyoSkyra01/anat/sub-tokyoSkyra01_T2star.json: "ManufacturersModelName": "Skyra",
sub-tokyoSkyra01/anat/sub-tokyoSkyra01_T2w.json: "ManufacturersModelName": "Skyra",
```
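For illustration only, here is a minimal Python sketch of how analysis code might read the scanner model from the JSON sidecars instead of from the folder name. The helper name `scanner_models` is hypothetical and is not part of the existing analysis code; the demo builds a throwaway directory that mimics a single file from the listing above.

```python
import json
import tempfile
from pathlib import Path


def scanner_models(subject_dir):
    """Map each JSON sidecar under subject_dir to its ManufacturersModelName.

    Hypothetical helper, not taken from the spine-generic code base.
    """
    models = {}
    for sidecar in sorted(Path(subject_dir).rglob("*.json")):
        with open(sidecar, encoding="utf-8") as f:
            metadata = json.load(f)
        # Sidecars that lack the field are reported as None rather than skipped.
        models[sidecar.name] = metadata.get("ManufacturersModelName")
    return models


# Demo on a temporary directory that mimics one sidecar from the grep output.
with tempfile.TemporaryDirectory() as tmp:
    anat = Path(tmp) / "sub-tokyoSkyra01" / "anat"
    anat.mkdir(parents=True)
    (anat / "sub-tokyoSkyra01_T1w.json").write_text(
        json.dumps({"ManufacturersModelName": "Skyra"}), encoding="utf-8"
    )
    demo_models = scanner_models(Path(tmp) / "sub-tokyoSkyra01")
    print(demo_models)
```

Reading the field from the sidecar means the scanner information survives any renaming of the subject folders.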

And BIDS recommends encoding multiple visits/scans by nesting them a level deeper under ses-<label>/.

I propose

  1. Either dropping the scanner from the filename entirely and adding a numbered session instead, or encoding the scanner in the session label: sub-tokyo{scanner}{id} -> sub-tokyo{id}_ses-{scanner}

    So, either:

    u108545@joplin:~/data-multi-subject$ mkdir -p sub-tokyo05 && git mv sub-tokyoIngenia05/ sub-tokyo05/ses-01

    or

    u108545@joplin:~/data-multi-subject$ mkdir -p sub-tokyo05 && git mv sub-tokyoIngenia05/ sub-tokyo05/ses-Ingenia

    but repeated for every subject. For most subjects, which have only one session, BIDS still wants us to nest a ses-01/ folder:

    The extra session layer (at least one /ses-

  2. Merging the tokyo subjects:

    either

    git mv sub-tokyoSkyra{id} sub-tokyo{id}/ses-02
    git mv sub-tokyo750w{id} sub-tokyo{id}/ses-03

    or

    git mv sub-tokyoSkyra{id} sub-tokyo{id}/ses-Skyra
    git mv sub-tokyo750w{id} sub-tokyo{id}/ses-750w

  3. Moving the date, manufacturer, and manufacturers_model_name columns from participants.tsv to per-subject sub-tokyo{id}/sub-tokyo{id}_sessions.tsv files

  4. Changing the analysis code to parse out the information when it needs it from either the .json files or the _sessions.tsv files, not the filenames.
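The scanner-named variant of the renaming in steps 1 and 2 could be sketched as below. This is only an illustration under stated assumptions: the scanner labels are the three that appear in the directory names in this thread, and the helpers `merged_location` and `rename_file` are hypothetical names, not existing code. The sketch only computes the new names; it moves nothing on disk.

```python
import re

# Scanner labels taken from the directory names shown in this thread.
SCANNERS = ("750w", "Ingenia", "Skyra")
_OLD_NAME = re.compile(
    r"^sub-tokyo(?P<scanner>" + "|".join(SCANNERS) + r")(?P<id>\d+)$"
)


def merged_location(old_subject):
    """Return (new_subject, session) for an old per-scanner subject label.

    Returns None for labels that do not follow sub-tokyo{scanner}{id}.
    """
    m = _OLD_NAME.match(old_subject)
    if m is None:
        return None
    return f"sub-tokyo{m['id']}", f"ses-{m['scanner']}"


def rename_file(filename, old_subject):
    """Rewrite a filename prefix from sub-tokyo{scanner}{id}_ to sub-tokyo{id}_ses-{scanner}_."""
    location = merged_location(old_subject)
    prefix = old_subject + "_"
    if location is None or not filename.startswith(prefix):
        return filename  # Leave anything unexpected untouched.
    subject, session = location
    return f"{subject}_{session}_{filename[len(prefix):]}"


print(merged_location("sub-tokyo750w05"))
print(rename_file("sub-tokyoSkyra01_acq-MTon_MTS.json", "sub-tokyoSkyra01"))
```

Under these assumptions, `sub-tokyoSkyra01_acq-MTon_MTS.json` would become `sub-tokyo01_ses-Skyra_acq-MTon_MTS.json`, stored under `sub-tokyo01/ses-Skyra/anat/`.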

jcohenadad commented 2 years ago

Thank you @kousu, this seems like a very reasonable plan. In terms of index vs. scanner name in the filename, I do have a slight preference for encoding the scanner name in the file name, just because it is more human-friendly.

kousu commented 2 years ago

Great. I can do that!

jcohenadad commented 3 months ago

Reviving this thread, given a recent comment https://github.com/spine-generic/data-multi-subject/issues/166 and the demographic-based project from @renelabounek. We should find a reasonable strategy to deal with the same subjects being scanned at multiple sites. The solution proposed in https://github.com/spine-generic/data-multi-subject/issues/102#issuecomment-969444920 is problematic, in that the logic of the analysis code, and the results, would have to change drastically. I'm wondering if simply adding a column in https://github.com/spine-generic/data-multi-subject/blob/113b258695074b77d40ba987474eddc14f9d9698/participants.tsv with an arbitrary ID for each subject could properly address this. Then, for projects where the demographics of the subjects are relevant (e.g., @renelabounek's project), the specific analysis code could use that information (e.g., by selecting non-duplicate subjects based on their IDs as opposed to the participant_id).
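As a rough sketch of the ID-column idea, assuming a new column that is here called `subject_uid` (a made-up name, as are the ID values, the `sex` and `age` headers, and the `sub-example01` row), the selection of non-duplicate subjects might look like this:

```python
import csv
import io

# Hypothetical excerpt of participants.tsv. The `subject_uid` column, its
# values, the column headers, and the last row are invented for illustration.
PARTICIPANTS_TSV = (
    "participant_id\tsex\tage\tsubject_uid\n"
    "sub-tokyo750w01\tM\t25\ttokyo-A\n"
    "sub-tokyoIngenia01\tM\t25\ttokyo-A\n"
    "sub-tokyoSkyra01\tM\t25\ttokyo-A\n"
    "sub-example01\tF\t30\texample-B\n"
)


def unique_subjects(tsv_text, id_column="subject_uid"):
    """Keep only the first row seen for each value of id_column."""
    seen = set()
    kept = []
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        if row[id_column] not in seen:
            seen.add(row[id_column])
            kept.append(row)
    return kept


kept_ids = [row["participant_id"] for row in unique_subjects(PARTICIPANTS_TSV)]
print(kept_ids)
```

Keeping the first occurrence is an arbitrary choice; a demographics-based analysis could just as well pick a preferred scanner per subject.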