pydicom / deid

best effort anonymization for medical images using python
https://pydicom.github.io/deid/
MIT License
138 stars 43 forks source link

Move data and tests out of the package #226

Closed RobinFrcd closed 1 year ago

RobinFrcd commented 1 year ago

Description

Related issues: #167 #225

The idea here is to move the heavy data folder out of the package. Moving the tests folder outside the package also makes sense because it's not needed to run any function.

vsoch commented 1 year ago

oh hmm... this could work too. What are the benefits of this approach vs. having the data as a separate (installable) package? indeed this one is probably easier to maintain!

And you are so speedy! I'm barely woken up :)

vsoch commented 1 year ago

And I'll share my idea - in case you think it makes sense! I was going to make an installable package, and then if the tests are attempting to be run or any import of testdata is attempted and the path doesn't exist yet, we clearly show the user "You need to pip install deid-data" (or similar). Then all the testing logic and paths would stay the same, but the library would not ship by default with the data (but it could be installed, if desired).

RobinFrcd commented 1 year ago

And I'll share my idea - in case you think it makes sense! I was going to make an installable package, and then if the tests are attempting to be run or any import of testdata is attempted and the path doesn't exist yet, we clearly show the user "You need to pip install deid-data" (or similar). Then all the testing logic and paths would stay the same, but the library would not ship by default with the data (but it could be installed, if desired).

I guess it makes sense, sure ! I also saw in the comment "In the future, we can add https endpoints", it may also be a legit solution !

vsoch commented 1 year ago

@RobinFrcd take a look at my idea here: https://github.com/pydicom/deid/pull/227

It's a bit simpler in the sense that deid-data is just an external package that anyone can install, and deid uses it for tests (and someone could arguably use it for their own use case). I think it might be preferable to have this format, primarily so someone can easily install the data on a whim without cloning, and we can cleanly maintain the two projects (https://github.com/pydicom/deid-data). Let me know your thoughts!

RobinFrcd commented 1 year ago

@vsoch Sure, it makes sense. If you don't mind maintaining 2 packages ! Looks good to me :+1:

vsoch commented 1 year ago

@RobinFrcd I have hundreds of projects and like 90 something just on pypi? That's kind of what I do :) :laughing:

vsoch commented 1 year ago

@RobinFrcd I'm deciding to go with the other implementation (a long time contributor also preferred it) but I want to say how grateful I am that you jumped on this! And it was all before I woke up! So thank you hugely and kindly - we won't get to use it but it was a really good idea.

RobinFrcd commented 1 year ago

Sure, no problem ! I'm glad we went from 20MB+ to less than 50kB !! :rocket:

vsoch commented 1 year ago

I know amazing! Lol I feel so badly... should have done this sooner :laughing: