sjoerdk / idiscore

Deidentification of DICOM images using Attribute Confidentiality Profiles
GNU General Public License v3.0
11 stars 2 forks source link

Basic examples in docs fail due to missing metadata #106

Closed Bonnevie closed 2 years ago

Bonnevie commented 2 years ago

This library seems to still be undergoing maintenance, so a quick observation.

Description

Running Core.deidentify seems to not produce a copy as otherwise described in the docs, and in fact produces a DICOM object that cannot be saved as it has neither preamble nor filemeta. If metadata is restored with pydicom's fix_file_meta, it can be saved, but not restored.

I'd expect, based on the library's self-stated objectives, that operations were available to apply a confidentiality profile + custom rules and not otherwise transform the DICOM object.

What I Did

I have verified that the basic deidentification example of the library at https://github.com/sjoerdk/idiscore/blob/master/examples/deidentify_a_dataset_basic.py does not work:

import pydicom

from idiscore.core import Core, Profile
from idiscore.defaults import get_dicom_rule_sets

sets = get_dicom_rule_sets()  # Contains official DICOM deidentification rules
profile = Profile(  # Choose which rule sets to use
    rule_sets=[sets.basic_profile, sets.retain_modified_dates, sets.retain_device_id]
)
core = Core(profile)  # Create an deidentification core

# read a DICOM dataset from file and write to another
core.deidentify(pydicom.dcmread(my_own_file)).save_as("deidentified.dcm")

With traceback:

AttributeError                            Traceback (most recent call last)
<ipython-input-4-1f9729cc0bc5> in <cell line: 13>()
     11 
     12 # read a DICOM dataset from file and write to another
---> 13 core.deidentify(pydicom.dcmread(mydicom)).save_as("deidentified.dcm")

~/.cache/pypoetry/virtualenvs/xray-data-registry-U7OvTqMo-py3.8/lib/python3.8/site-packages/pydicom/dataset.py in save_as(self, filename, write_like_original)
   2059             Write a DICOM file from a :class:`FileDataset` instance.
   2060         """
-> 2061         pydicom.dcmwrite(filename, self, write_like_original)
   2062 
   2063     def ensure_file_meta(self) -> None:

~/.cache/pypoetry/virtualenvs/xray-data-registry-U7OvTqMo-py3.8/lib/python3.8/site-packages/pydicom/filewriter.py in dcmwrite(filename, dataset, write_like_original)
   1036     if None in encoding:
   1037         if tsyntax is None:
-> 1038             raise AttributeError(
   1039                 f"'{cls_name}.is_little_endian' and "
   1040                 f"'{cls_name}.is_implicit_VR' must be set appropriately "

AttributeError: 'Dataset.is_little_endian' and 'Dataset.is_implicit_VR' must be set appropriately before saving

Likely fix

In the apply_rules method, the output object is initialized with a plain default deidentified = Dataset(). This likely fails to copy the metadata over, and should be replaced with an actual deep copy (preferably optionally including pixel data!).

Bonnevie commented 2 years ago

The problem is also largely fixed post hoc by:

def deidentify(dcm: FileDataset) -> FileDataset:
    deid_dcm = core.deidentify(dcm)
    deid_dcm.PixelData = dcm.PixelData
    deid_dcm.file_meta = dcm.file_meta
    deid_dcm.preamble = dcm.preamble
    return deid_dcm
sjoerdk commented 2 years ago

Thanks for the clear feedback!

Regarding documentation mentioning deepcopy You are right that the documentation still mentions a deepcopy while this is actually a modification in place. This is sloppy maintenance on my part.

The change from deepcopy to modification in place is a very late commit (6be8769) This change was made for pragmatic reasons. Memory usage was sometimes hard to control even with separate handling for pixeldata.

The docs should be updated. I made #108 for this. Thanks for noticing

Regarding the example not working I cannot reproduce this. Are you sure the file you are loading has a correct header itself? Does the following work?

pydicom.dcmread(my_own_file).save_as("deidentified.dcm")

I cannot find the output object being initialized with an empty Dataset() call. Here it is just passed through: https://github.com/sjoerdk/idiscore/blob/master/idiscore/core.py#L170

I'm sticking to pydicom conventions for reading and writing, so if a DICOM file has no preamble or header to begin with, I leave it up to the user to apply post or ad hoc fixes.

Bonnevie commented 2 years ago

Hi @sjoerdk, sorry for not getting back to you more swiftly.

I think the problem with your reproduction is that you are on the master branch whereas I'm on the latest versioned branch with tag v1.0.1, which is what is installed with pip.

See here for deidentified = Dataset(): https://github.com/sjoerdk/idiscore/blob/decf7763d70d0a57aa81224ecd1928d60c8def0f/idiscore/core.py#L185

But great news then, if you cannot reproduce - that must mean it's fixed on master? Maybe you could do a minor release?

The DICOM file has a preamble and file_meta, so the problem was that IDISCORE was deleting them when doing its rewrites.

sjoerdk commented 2 years ago

Good point! Thanks again for the checks. I think I missed a commit tag in v1.0.2, which made CI skip the pypi deploy in this run.

I'll make a new deploy. I'll update package management to pep 517 and start using poetry instead of setup tools. New pypi release will be somewhere next week.

Bonnevie commented 2 years ago

That sounds great! Thanks for the fast response and the good work.

sjoerdk commented 2 years ago

Version 1.1.0 is now pushed to pypi. This fixes the issues when running the example files. Updating the docs regarding deepcopy will be picked up in #108.

To prevent issues with the example files in the future they should be included in the standard tests. Created #112 for this.