Allow to specify custom metadata to be included in JSON sidecars

valosekj commented 9 months ago

This PR adds a new -json-metadata flag to allow you to specify a custom JSON file containing metadata to be added to the JSON sidecar of all corrected labels.

This flag is useful, for example, when a label was obtained automatically, and you want to include this information into the JSON sidecar.

Example usage:

python manual_correction.py -path-img ~/data/site-001_2024-02-21/data_processed/ -path-label ~/data/site-001_2024-02-21/data_processed/ -path-out ~/data/site-001/derivatives/labels/ -config sc_seg_to_correct_t2w.yml -json-metadata custom_metadata.json

where custom_metadata.json:

{
    "Name": "sct_deepseg_sc",
    "Version": "SCT v6.2",
    "Date": "2024-02-21"
}

The output JSON sidecar then looks like this:

{
    "SpatialReference": "orig",
    "GeneratedBy": [
        {
            "Name": "sct_deepseg_sc",
            "Version": "SCT v6.2",
            "Date": "2024-02-21"
        },
        {
            "Name": "Manual",
            "Author": "Jan Valosek",
            "Date": "2024-02-21 10:00:28"
        }
    ]
}

Without -json-metadata custom_metadata.json, the JSON sidecar would look like this:

{
    "SpatialReference": "orig",
    "GeneratedBy": [
        {
            "Name": "Manual",
            "Author": "Jan Valosek",
            "Date": "2024-02-21 10:00:28"
        }
    ]
}

I added a lot of people as reviewers just to let you know that this feature has been added :-)

Relevant: https://github.com/spinalcordtoolbox/manual-correction/issues/75

NathanMolinier commented 9 months ago

Hey @valosekj ! Thanks for adding this new feature, it will help in some situations indeed.

However, even if it could be helpful, let's not forget that this scenario is not ideal. This repository should only be used for manual corrections, not to fix problems generated by other scripts/pipelines.

If i understand correctly, you generated images using sct_run_batch and figured at the end that JSON sidecars should also be present. But instead of adding them after running your pipeline, I believe it would make more sense to generate them while running your pipeline using the library JQ for example. We could also think about adding the JSON sidecar creation to sct_run_batch directly as a new feature.

valosekj commented 9 months ago

However, even if it could be helpful, let's not forget that this scenario is not ideal. This repository should only be used for manual corrections, not to fix problems generated by other scripts/pipelines.

If i understand correctly, you generated images using sct_run_batch and figured at the end that JSON sidecars should also be present. But instead of adding them after running your pipeline, I believe it would make more sense to generate them while running your pipeline using the library JQ for example. We could also think about adding the JSON sidecar creation to sct_run_batch directly as a new feature.

Completely agree! The proposed PR can be used until https://github.com/spinalcordtoolbox/spinalcordtoolbox/issues/3394 is addressed (which aims to make SCT and sct_run_batch BIDS compatible). Even then, there will always be some new models that won't be part of the SCT (and thus won't produce proper JSON sidecars).

naga-karthik commented 8 months ago

Thanks for the PR, @valosekj ! The output JSON sidecar is a bit confusing in the sense that, if there are multiple revisions of the labels (i.e. rater1, deepseg, rater2, rater3, etc.), how is the order of the GeneratedBy key determined? Maybe we could add a key order that specifies the chronological order of the updates to the labels?

Ideally, we can show in reverse chronological order, such as:

{
    "SpatialReference": "orig",
    "GeneratedBy": [
        {
            "Name": "Manual",
            "Author": "Naga Karthik",
            "Date": "2024-02-27"
            "Order": 2
        },
        {
            "Name": "Manual",
            "Author": "Jan Valosek",
            "Date": "2024-02-21 10:00:28"
            "Order": 1
        },
        {
            "Name": "sct_deepseg_sc",
            "Version": "SCT v6.2",
            "Date": "2024-02-21"
            "Order": "0"
        }
    ]
}

This tells us that the first there was deepseg_sc, then corrected by Jan, then corrected by Naga (i.e. the latest set of labels are obtained with Naga's corrections)

What do you think?

NathanMolinier commented 8 months ago

Personally, I'm not a fan of this new key order @naga-karthik. I think it is a duplication of the already existing key date (it could also generate unexpected discrepancies with the date). But basically, if I remember correctly, the metadata is always added to the list, so the last update should be the one at the end (bottom) of the GeneratedBy list.

But, to make sure that we are able to retrace the timeline, we should make sure to always add the time to the date field.

valosekj commented 8 months ago

The output JSON sidecar is a bit confusing in the sense that, if there are multiple revisions of the labels (i.e. rater1, deepseg, rater2, rater3, etc.), how is the order of the GeneratedBy key determined?

We use the .append method meaning that with each revision, we append to the end:

https://github.com/spinalcordtoolbox/manual-correction/blob/fe8a8c122697974e6e9e74540b32bbae069f9c22/manual_correction.py#L542

I believe that this assures that the order of revisions is chronological.

naga-karthik commented 8 months ago

@NathanMolinier thanks for clarifying! I somehow overlooked the date key. Yes, order is indeed a duplication of date. But even in date, some entries are in yyyy-mm-dd hh:mm:ss format and some are just in yyyy-mm-dd format. Maybe we should standardize this?

valosekj commented 8 months ago

But even in date, some entries are in yyyy-mm-dd hh:mm:ss format and some are just in yyyy-mm-dd format. Maybe we should standardize this?

Good idea! Let's go with more specific yyyy-mm-dd hh:mm:ss.

valosekj commented 8 months ago

I self-tested and self-reviewed the PR --> merging.

spinalcordtoolbox / manual-correction

Allow to specify custom metadata to be included in JSON sidecars #80