pydicom / deid

best effort anonymization for medical images using python
https://pydicom.github.io/deid/
MIT License
140 stars 43 forks source link

Refactor Unit Tests to utilize separate data store for sample images #167

Closed wetzelj closed 1 year ago

wetzelj commented 3 years ago

Created this issue to continue discussion started on PR #166:

With this proposed new issue, I believe that I have an opportunity to build out our test image data set with a larger variety of PHI-safe test images from a variety of makes/models of scanners. Our department has an annual employee art show in which we can image (XR, CT, US) objects to serve as the basis for works of art.

I really like this idea. And actually, we can fairly easy separate the small example data (e.g., dicom cookies) from those that are just used for testing, which arguably shouldn't even be stored in the repository because they are so large. I think what would make sense to do is follow the convention of pydicom-data, and even first ask if there is room to accept files for deid here to that library (so pydicom can have a single, shared data library). @darcymason would this be something we could consider? If we want to keep the libraries separate, and you'd want to make a new dataset not just for deid but for other dicom enthusiasts then we could also have it branded differently and then used by the library here. Either way, we could definitely have a function to decompress after install.

Having looked closer at pydicom-data, I wonder if it would be better for us to first attempt to convert the unit tests to use images that are already stored within pydicom-data? While I haven't done an exhaustive look to determine if all of our test cases could be covered by images in pydicom-data, it does appear that there's high potential that they could be. This could eliminate the need for a standalone deid test image repo.

vsoch commented 3 years ago

@wetzelj I think this is a great idea - and we could still keep (the smaller) files alongside deid in case anyone wants a quick dataset to play with. If there is something missing, I'm hoping that @darcymason would be open to a PR for us to add it. That way, both repos can share the same testing data.

wetzelj commented 3 years ago

What are your thoughts on potentially upgrading pydicom beyond the current required version 1.3.0 with this?

For about two weeks now, I've been running the latest deid code with pydicom 2.1.1 and have seen no issues with this upgrade. Between 1.3.0 and 2.1.1 there have been some changes to pydicom.data which could impact how this issue is implemented.

vsoch commented 3 years ago

Yep I think that would be totally okay - the reason for pinning was something related to the cleaner a while back, and likely it's not an issue anymore. I bumped to allow for >= 1.3.0 for the feedstock and haven't heard any issues (yet). https://github.com/conda-forge/deid-feedstock/blob/master/recipe/meta.yaml#L26. Definitely update the version to be >= whatever minimal one that you suspect needing, being conservative if possible if some folks might still be using a 1.x version.

darcymason commented 3 years ago

so pydicom can have a single, shared data library). @darcymason would this be something we could consider?

Absolutely, no problem. Wouldn't want huge files (or a large number of smaller files) because some people will still download the whole set, but if just adding some reasonably sized ones, I would welcome that.

vsoch commented 3 years ago

Awesome, thank you @darcymason ! @wetzelj you're good to go!

vsoch commented 1 year ago

@wetzelj do you still want to work on this? I actually think it would be fine to creat a deid-data package too. If not I’m happy to take lead.

wetzelj commented 1 year ago

Good Morning @vsoch & @RobinFrcd - I won't object at all if you two want to take this on. Unfortunately I haven't been able to circle back to this and I don't see it on the horizon anytime soon. :(

I've been keeping an eye on this project - LOTS of good stuff going on! We're still using 0.2.28... The changes introduced since that version will be very handy when we have the opportunity to upgrade!

vsoch commented 1 year ago

okay sounds good! I think I'm going to try just making a simple pip install to place the data in the expected spot. Stay tuned!

vsoch commented 1 year ago

Looks like @RobinFrcd beat me to a PR - with an interesting idea too - moving the data to be alongside tests (and not installing to the package). @wetzelj what do you think of the two ideas?

wetzelj commented 1 year ago

The separate pip install solution seems slightly more intuitive to me. I'd prefer that approach - but honestly either would be good!

vsoch commented 1 year ago

Gotcha. Thank you @wetzelj!

vsoch commented 1 year ago

@wetzelj take a look at my idea here: https://github.com/pydicom/deid/pull/227 and @RobinFrcd #226

For the first, that was also my first gut instinct - it's a bit simpler in the sense that deid-data is just an external package that anyone can install, and deid uses it for tests (and someone could arguably use it for their own use case). I think it might be preferable to have this format, primarily so someone can easily install the data on a whim without cloning, and we can cleanly maintain the two projects (https://github.com/pydicom/deid-data). Let me know your thoughts!

wetzelj commented 1 year ago

I think this will be fine! Thanks for your work! :)

vsoch commented 1 year ago

This is done! https://pypi.org/project/deid/0.2.36/ woot!