Closed Bonnevie closed 2 years ago
The problem is also largely fixed post hoc by:
def deidentify(dcm: FileDataset) -> FileDataset:
deid_dcm = core.deidentify(dcm)
deid_dcm.PixelData = dcm.PixelData
deid_dcm.file_meta = dcm.file_meta
deid_dcm.preamble = dcm.preamble
return deid_dcm
Thanks for the clear feedback!
Regarding documentation mentioning deepcopy You are right that the documentation still mentions a deepcopy while this is actually a modification in place. This is sloppy maintenance on my part.
The change from deepcopy to modification in place is a very late commit (6be8769) This change was made for pragmatic reasons. Memory usage was sometimes hard to control even with separate handling for pixeldata.
The docs should be updated. I made #108 for this. Thanks for noticing
Regarding the example not working I cannot reproduce this. Are you sure the file you are loading has a correct header itself? Does the following work?
pydicom.dcmread(my_own_file).save_as("deidentified.dcm")
I cannot find the output object being initialized with an empty Dataset()
call. Here it is just passed through: https://github.com/sjoerdk/idiscore/blob/master/idiscore/core.py#L170
I'm sticking to pydicom conventions for reading and writing, so if a DICOM file has no preamble or header to begin with, I leave it up to the user to apply post or ad hoc fixes.
Hi @sjoerdk, sorry for not getting back to you more swiftly.
I think the problem with your reproduction is that you are on the master branch whereas I'm on the latest versioned branch with tag v1.0.1, which is what is installed with pip
.
See here for deidentified = Dataset()
:
https://github.com/sjoerdk/idiscore/blob/decf7763d70d0a57aa81224ecd1928d60c8def0f/idiscore/core.py#L185
But great news then, if you cannot reproduce - that must mean it's fixed on master? Maybe you could do a minor release?
The DICOM file has a preamble and file_meta, so the problem was that IDISCORE was deleting them when doing its rewrites.
Good point! Thanks again for the checks. I think I missed a commit tag in v1.0.2, which made CI skip the pypi deploy in this run.
I'll make a new deploy. I'll update package management to pep 517 and start using poetry instead of setup tools. New pypi release will be somewhere next week.
That sounds great! Thanks for the fast response and the good work.
Version 1.1.0 is now pushed to pypi. This fixes the issues when running the example files. Updating the docs regarding deepcopy will be picked up in #108.
To prevent issues with the example files in the future they should be included in the standard tests. Created #112 for this.
This library seems to still be undergoing maintenance, so a quick observation.
Description
Running
Core.deidentify
seems to not produce a copy as otherwise described in the docs, and in fact produces a DICOM object that cannot be saved as it has neither preamble nor filemeta. If metadata is restored withpydicom
'sfix_file_meta
, it can be saved, but not restored.I'd expect, based on the library's self-stated objectives, that operations were available to apply a confidentiality profile + custom rules and not otherwise transform the DICOM object.
What I Did
I have verified that the basic deidentification example of the library at https://github.com/sjoerdk/idiscore/blob/master/examples/deidentify_a_dataset_basic.py does not work:
With traceback:
Likely fix
In the
apply_rules
method, the output object is initialized with a plain defaultdeidentified = Dataset()
. This likely fails to copy the metadata over, and should be replaced with an actual deep copy (preferably optionally including pixel data!).