Closed vsoch closed 2 years ago
Hi @vsoch,
Many thanks for working on this very quickly!
I like the idea of having a section within the deid.dicom recipe file. Just like we have %Header section (https://pydicom.github.io/deid/examples/recipe/), maybe we could have something like %File_meta section for the file meta. My understanding is that the attributes (data elements) within the dicom file meta are different to those of the dicom file. Here is a link to the official page of DICOM files where all the attributes within the file meta are listed (of course not all dicom files meta have all these attributes, but it could be). NB: All the dicom file meta have a tag starting with 0002.
My intuition would be that we wouldn't require any changes to be done to the file meta unless the user has that as part of the deid.dicom recipe. I am not sure TBH about that...
Many thanks again!
Regards
Adil
%File_meta section for the file meta.
Yes that's correct! The PR here adds %filemeta
(take a look at the example https://github.com/pydicom/deid/blob/39d8e019c1d3c85047a2c5a20b514d141d6e8535/examples/dicom/header-manipulation/file-meta/deid.dicom)
My understanding is that the attributes (data elements) within the dicom file meta are different to those of the dicom file. Here is a link to the official page of DICOM files where all the attributes within the file meta are listed (of course not all dicom files meta have all these attributes, but it could be). NB: All the dicom file meta have a tag starting with 0002.
If this is the case, then my strategy was right on! Do you want to test the branch here?
The last remaining bit is if we need to do something special for save. Let's wait for @wetzelj feedback then ping some of the other maintainers to ask.
I also like the idea of having a separate %file_meta section. We would, however, probably want to ensure that standard %header actions don't overlap and act against filemeta attributes (thinking of expanders and %values based rules).
I don't think we would want a recipe set up as below to remove the MediaStorageSOPInstanceUID that it just replaced:
%filemeta
REPLACE MediaStorageSOPInstanceUID new-id
%header
REMOVE endswith:UID
Instead, if we did want to remove fields ending with UID from both sections of the file, we'd want to build a recipe like:
%filemeta
REMOVE endswith:UID
%header
REMOVE endswith:UID
Yes that's correct! The PR here adds
%filemeta
(take a look at the example https://github.com/pydicom/deid/blob/39d8e019c1d3c85047a2c5a20b514d141d6e8535/examples/dicom/header-manipulation/file-meta/deid.dicom)
Oh yes that's perfect!
%filemeta REPLACE MediaStorageSOPInstanceUID new-id
%header REMOVE endswith:UID
Instead, if we did want to remove fields ending with UID from both sections of the file, we'd want to build a recipe like:
%filemeta REMOVE endswith:UID
%header REMOVE endswith:UID
I am wondering why do we need to have REMOVE endswith:UID
as part of the recipe?
That was just a sample to indicate the way we would/would not want the actions to be applied - that wasn't anything we'd really want to do. :)
That’s a good point - but @wetzelj to go in the other direction if fields are unique why even have the second section? A rule to replace all ending with UID should hit those in file meta, why should they be special? But I do think we perhaps want to protect some of the fields like the size and transfer syntax from change?
That was just a sample to indicate the way we would/would not want the actions to be applied - that wasn't anything we'd really want to do. :)
Oh I see, that makes sense
That’s a good point - but @wetzelj to go in the other direction if fields are unique why even have the second section? A rule to replace all ending with UID should hit those in file meta, why should they be special? But I do think we perhaps want to protect some of the fields like the size and transfer syntax from change?
My initial intuitive understanding was that because the data elements part of the file_meta are inherently separate for the dicom file data elements, it would probably make sense to have a seperate section for them within the recipe (to access the dicom data elements it is enough to do ds.DATAELEMENT
whereas to access file_meta data elements, we would need to do ds.file_meta.DATAELEMENT
) I don't have a strong opinion on this to be honest.
I guess you're right. It probably doesn't make much sense to split it out. My statement was coming from my line of thinking that filemeta fields were special, and should be somewhat protected from "bad"/overly wide recipes... but really, that's not the case.
Honestly, I'm with @adildahlan and don't have a strong opinion on this either.
(to access the dicom data elements it is enough to do ds.DATAELEMENT whereas to access file_meta data elements, we would need to do ds.file_meta.DATAELEMENT) I don't have a strong opinion on this to be honest.
We can actually handle this easily because the dicom.file_meta is just another kind of dataset, so we add it as a dataset to parse (and save fields for). Since the fields we add to our lookup data structure are the same ones that point back into the dicom and they are accessed easily, we can actually just parse them akin to any other header.
Okay so here is my thoughts for the recipe:
REPLACE endswith:uid
and in this case I think we either have to only parse one of the header or file_meta (and add a lot of custom logic that is going to slow things down) OR allow the parsing to be seamless, and to treat the file meta fields just like the header fields. I actually think it's more risky that people miss file meta fields with unique ids that maybe should be replaced because they have to specify it twice. So I want to remove the filemeta section.FileMetaInformationGroupLength
FileMetaInformationVersion
TransferSyntaxUID
ImplementationClassUID
So those are my thoughts for moving forward - remove the special section and treat the fields like any other, and then protect ones that shouldn't be changed. Let me know what you think!
So those are my thoughts for moving forward - remove the special section and treat the fields like any other, and then protect ones that shouldn't be changed. Let me know what you think!
Sounds perfect!
Agreed. This solution will be great!
okay fantastic! I'll get in a PR after the end of the work day. Thanks for the quick, fun discussion y'all!
Okay @wetzelj @adildahlan this should be ready for review! Let me know if there are tests you think we should add, or if your use case is not addressed.
Hi @vsoch !
Many thanks for working on this.
So I created a new conda environment and installed the current branch of the died library using the following code:
pip install git+https://github.com/pydicom/deid.git@add/support-file-meta
And I have my recipe as follows:
%header
REPLACE PatientName var:new_PatientName
REPLACE PatientID var:new_PatientID
ADD SOPInstanceUID var:new_SOPInstanceUID
ADD MediaStorageSOPInstanceUID var:new_MediaStorageSOPInstanceUID
And in my code I define each of these new variables as done in. this example code https://github.com/pydicom/deid/blob/master/examples/dicom/recipe/deid-dicom-example.py
However, I am getting this error
I hope you could help me figure out what's wrong
Many thanks!
Adil
Oh interesting, looks like an issue with save. Can you share with me a dummy example to reproduce the issue? Are you just using the provided cat.dcm but updated the deid recipe?
Oh interesting, looks like an issue with save. Can you share with me a dummy example to reproduce the issue? Are you just using the provided cat.dcm but updated the deid recipe?
Hi @vsoch !
Oh I see. I am using the data I downloaded from the following link.
I would need to check if I am allowed to share the code as it is a private project I am collaborating on. Can you please try running your script with the dataset in the link above? If you don't get the same error maybe then I can check with my supervisor if I can share bits of the code with you. Apologies about that...
Many thanks!
it's okay I was able to reproduce on save:
In [3]: dicom.save_as('test.dcm')
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
~/Desktop/Code/deid/examples/dicom/header-manipulation/filemeta/example.py in <module>
----> 1 dicom.save_as('test.dcm')
~/Desktop/Code/deid/env/lib/python3.8/site-packages/pydicom/dataset.py in save_as(self, filename, write_like_original)
1885 Write a DICOM file from a :class:`FileDataset` instance.
1886 """
-> 1887 pydicom.dcmwrite(filename, self, write_like_original)
1888
1889 def ensure_file_meta(self) -> None:
~/Desktop/Code/deid/env/lib/python3.8/site-packages/pydicom/filewriter.py in dcmwrite(filename, dataset, write_like_original)
969 # place
970 if dataset.group_dataset(0x0002) != Dataset():
--> 971 raise ValueError(
972 f"File Meta Information Group Elements (0002,eeee) should be in "
973 f"their own Dataset object in the "
ValueError: File Meta Information Group Elements (0002,eeee) should be in their own Dataset object in the 'FileDataset.file_meta' attribute.
Let me take a look, I might need to ask for help from the other maintainers of pydicom on this one!
it's okay I was able to reproduce on save:
In [3]: dicom.save_as('test.dcm') --------------------------------------------------------------------------- ValueError Traceback (most recent call last) ~/Desktop/Code/deid/examples/dicom/header-manipulation/filemeta/example.py in <module> ----> 1 dicom.save_as('test.dcm') ~/Desktop/Code/deid/env/lib/python3.8/site-packages/pydicom/dataset.py in save_as(self, filename, write_like_original) 1885 Write a DICOM file from a :class:`FileDataset` instance. 1886 """ -> 1887 pydicom.dcmwrite(filename, self, write_like_original) 1888 1889 def ensure_file_meta(self) -> None: ~/Desktop/Code/deid/env/lib/python3.8/site-packages/pydicom/filewriter.py in dcmwrite(filename, dataset, write_like_original) 969 # place 970 if dataset.group_dataset(0x0002) != Dataset(): --> 971 raise ValueError( 972 f"File Meta Information Group Elements (0002,eeee) should be in " 973 f"their own Dataset object in the " ValueError: File Meta Information Group Elements (0002,eeee) should be in their own Dataset object in the 'FileDataset.file_meta' attribute.
Let me take a look, I might need to ask for help from the other maintainers of pydicom on this one!
Oh perfect! Many thanks!
Temperarly, I found a way to go around the issue with anonymising the MediaStorageSOPInstanceUID. I did so by running the anonymisation script without having in the recipe the MediaStorageSOPInstanceUID in it. Then afterwards I loop through all the anonymised dicom files and if the MediaStorageSOPInstanceUID dataelement doesn't match the SOPInstanceUID then I change the former to the later and save the dicom file as shown below.
if 'var:new_SOPInstanceUID' in new_variables:
# Read in the new anonymised dicom files
dicom_files_path = list(get_files(output_directory, pattern='*.dcm'))
# Dictionary of dicom directories and header dictionary
dicoms_dict = get_identifiers(dicom_files_path)
# Loop through the new anonymised dicom files
for i, (dicom_path, dicom_header) in enumerate(dicoms_dict.items()):
# Read the anonymised dicom file
ds = pydicom.dcmread(dicom_path)
# Check if the MediaStorageSOPInstanceUID wasn't anonymised
if ds.file_meta.MediaStorageSOPInstanceUID != ds.SOPInstanceUID:
# Anonymise the MediaStorageSOPInstanceUID
ds.file_meta.MediaStorageSOPInstanceUID = ds.SOPInstanceUID
# Add the new MediaStorageSOPInstanceUID to the MediaStorageSOPInstanceUID_old_new_pair
MediaStorageSOPInstanceUID_old_new_pair[ds.file_meta.MediaStorageSOPInstanceUID] = ds.SOPInstanceUID
# Save the anonymised dicom file
ds.save_as(dicom_path)
This worked so I guess saving the dicoms file after changing the MediaStorageSOPInstanceUID works but probably not with having MediaStorageSOPInstanceUID in the recipe.
Many thanks for all your help!
hey @adildahlan so I figured it out! What was happening is that for any action that uses the add
functionality, the assumption was adding to the dicom directly as a field. Of course with file meta this is problematic because we need to save to the filemeta instead. So the fix was to restore preserving the boolean if something was file meta, and then taking a different action in the parser depending on if it was file meta or not. Take a look at the changes here: https://github.com/pydicom/deid/pull/184/commits/a7f3f2012a16bf193e85a0e8207fe63f5ce03fc0 and when tests pass give the updated branch a try.
hey @adildahlan so I figured it out! What was happening is that for any action that uses the
add
functionality, the assumption was adding to the dicom directly as a field. Of course with file meta this is problematic because we need to save to the filemeta instead. So the fix was to restore preserving the boolean if something was file meta, and then taking a different action in the parser depending on if it was file meta or not. Take a look at the changes here: a7f3f20 and when tests pass give the updated branch a try.
Hi @vsoch, it is working!!! Many thanks for having worked on this issue very quickly. You have done a great job at building this library and maintaining it 👏🏼! Wish you all the best! Warm regards Adil
Woot! Let's merge and release then.
This is the start of work to read/parse file_meta to address #183. I made some time this evening and I think the implementation was easier than I anticipated, but I have a few questions / concerns to address!
Remaining questions:
Do file meta fields overlap with fields in the dicom?
The simplest implementation I could do is to look for another section in the recipe,
filemeta
, that gets parsed just like the header section. I simply createDicomField
with is_filemeta = True. However, I realized that if the identifiers you find in file_meta are unique to that (and vice versa with the dicom) then we don't need to check this value - we can simply replace/add/ etc based on the unique identifier just as we would. So this is how I implemented it to start. However, if it's the case that the same Field can appear in file meta OR the regular dicom, then we'll need to add flag actions that are just for filemeta, and add checks all over the place that essentially say:TLDR: If fields are not unique to the sections we need to be checking each field if it is a filemeta or not before any action is taken. As implemented now, the fields are parsing over equivalently and it is assumed that file meta fields do not appear in the dicom and vice versa.
Do we need to make any special changes to the file meta, e.g., perhaps the size?
I have never manipulated it before, and I am hoping that pydicom save handles these changes, but if not we should do it manually. We can check with some of the other maintainers of the pydicom org that know pydicom well if we aren't sure here, and after we've addressed the point above.
cc @wetzelj and @adildahlan for your feedback!
Signed-off-by: vsoch vsoch@users.noreply.github.com
Description
Related issues: # (issue)
Please include a summary of the change(s) and if relevant, any related issues above.
Checklist
Open questions
See above!