lassoan / slicerio

Utilities for reading and writing files created by 3D Slicer
MIT License
34 stars 3 forks source link

slicerio.extract_segments : array correction error #10

Closed laurentletg closed 2 months ago

laurentletg commented 2 months ago

Issue with slicerio.extract_segments voxel output

The slicerio package was really useful in my quality control pipeline. Annotators using Slicer sometimes deleted a segment and then added a new one with the same name instead of erasing mistakes. This resulted in the wrong value being assigned to the new segment despite having the same segment name. This is problematic for downstream applications such as training or testing a deep learning model.

For demonstration, I use a toy segmentation (.seg.nrrd file) containing the segment name 'three' with a label value of 4 (mimicking an annotation mistake that needs to be corrected according to the value below). I want to make sure the label value in the header and array is changed from 4 to 3.

segment_names_to_labels = [("one", 1), ("two", 2), ("three", 3)]

Previous behavior slicerio==0.1.8

I was using this block from the previous slicerio version to correct the segment name-value mismatch:

segmentation_info = slicerio.read_segmentation_info(input_filename)
voxels, header = nrrd.read(input_filename)
print(f"Pre conversion labels {np.unique(voxels)}")
print(f"Pre conversion dtype {voxels.dtype}")
extracted_voxels, extracted_header = slicerio.extract_segments(voxels, header, segmentation_info, segment_names_to_labels)
print(f"Post conversion labels {np.unique(extracted_voxels)}")
print(f"Post conversion dtype {extracted_voxels.dtype}")

Output:

Pre conversion labels [0 1 2 4]
Pre conversion dtype uint8
Post conversion labels [0. 1. 2. 3.]
Post conversion dtype float64

After using slicerio.extract_segments, the wrong value for 'three' is corrected to 3 (expected behavior). Note also that the dtype is now float64 instead of uint8 (unexpected).

Current behavior slicerio==1.1.0

segmentation = slicerio.read_segmentation(input_filename)
voxels = segmentation['voxels']
print(f"Pre conversion labels {np.unique(voxels)}")
print(f"Pre conversion dtype {voxels.dtype}")
extracted_segmentation = slicerio.extract_segments(segmentation, segment_names_to_labels)
extracted_voxels = extracted_segmentation['voxels']
print(f"Post conversion labels {np.unique(extracted_voxels)}")
print(f"Post conversion dtype {extracted_voxels.dtype}")

Output:

Pre conversion labels [0 1 2 4]
Pre conversion dtype uint8
Post conversion labels [0]
Post conversion dtype uint8

Unless I made an error, the segmentation array is now filled with zeros. The dtype is now uint8 (expected).

Thanks!

lassoan commented 2 months ago

Thanks for reporting this.

Since segmentations cannot be reliably identified using string labels (see https://github.com/lassoan/slicerio/blob/main/UsingStandardTerminology.md for details), we recommend using standard terminology, and this is what is tested (https://github.com/lassoan/slicerio/blob/main/slicerio/tests/test_segmentation.py).

We could simply remove segment lookup by name (problem solved!), but maybe not everyone is ready for this and there may be some legacy projects that rely on segment names. So, instead of removing this feature, it may make sense to keep it and fix it.

I'll try to get to this in the next few weeks, but in the meantime if you can do any debugging of this and propose fixes it would help a lot. It is all just plain and simple Python scripting, everything should be trivial. Thanks in advance if you can spend some time with investigating this.

laurentletg commented 2 months ago

Thanks for the update.

I agree that the best way would be to use a standard terminology (SNOMED-CT), especially in the landscape of growing availability of datasets. Different naming conventions have the potential for errors. Unfortunately, I think most of us have not yet made the transition. When rapidly looking at other examples, I found other culprits :

  1. MONAILabel deep edit example relies on a dictionary to match the segment name vs label.
  2. Total Segmentator has not fully implemented this.

3D Slicer is a powerful annotation tool, but the potential for mismatch between an expected label name and value can cause downstream issues when used in other applications (e.g., deep learning models). The slicerio package is an important QC step in my workflow.

I am stuck for the next couple of days and won't have time to debug this, but I will try my best to work on this as soon as I can.

Right now, reverting to version 0.1.8 mitigates this issue if not using standard terminology. Perhaps having a flag indicating which type of segment lookup (names vs standard terminology) and using the previous extract_segment function could be an easy fix.

lassoan commented 2 months ago

I've fixed the issue and released slicerio-1.1.1

laurentletg commented 2 months ago

Excellent, thanks a lot! I should maybe work on a tool to convert old naming conventions to standard terminology (something like BIDS validator) to help the community transition to using standard terminology.

lassoan commented 2 months ago

It would be nice. We have a terminology selector in Slicer, but a web-based tool that could create/edit terminology json files and color table .csv files could be useful.