Open danielsnider opened 5 years ago
Agree on both fronts! I haven't looked at this repository in a bit, let me see what I can do.
It would be great to release a new build to DockerHub!
I'm eager to compare your progress with a machine learning approach to my approach with tesseract OCR, computer vision, and python.
Definitely! I'd like to do this proper with a Docker container deployed via CI, so I'll likely take a day or two to do this. I'll post an update (and ping you on the PR) when I have something to test.
I will definitely test it!
Daniel Snider ツ
Just a heads up, I'm renaming the repo to "dicom-cleaner" because "scraper" doesn't describe the intended purpose as well. The Docker images will be built from:
I don't have GitHub Actions for this organization yet, but if/when we do, the deploy will be much easier than relying on an external service.
And it would be really fun to test these images and develop new methods! I created the tooling a while back and nobody was super interested in proper testing, so I'm really happy you are! :D
hey @danielsnider I'm still figuring out the CI (customizing the orb to build only on PRs and then push on merge to master) but since the container (with gdcm installed via conda) was erroneously pushed to docker hub, if you want to play around with it, here you go! https://cloud.docker.com/u/pydicom/repository/docker/pydicom/dicom-ocr-cleaner
I should have some time to check up on this later, hopefully we will hear back from Circle and then can test the GDCM as well.
hey @danielsnider the CI is added, see the images being built and pushed to the docker repos I mentioned above here --> https://circleci.com/gh/pydicom/dicom-cleaner
I added the gdcm install to the pydicom/dicom-ocr-cleaner, although I didn't do any testing, etc (the PR was to add the continuous integration). If you'd like to use this container as a start for your testing, with your feedback we can open another PR to work on the software itself.
Here we go! I'm trying your new container! Can you look into the error below? I've included the python package versions that are in the container too. It looks like pydicom version 1.0.0a1 actually came through, which I suspect is the problem.
dan@ubuntu:~/dicom-scraper$ sudo docker pull pydicom/dicom-header-cleaner
07fbc26aa5a1: Pull complete
Digest: sha256:edf9c8f44b0d65c2c013b0de28880d58fc053f62b9e210bfb59fd5729942d1ef
Status: Downloaded newer image for pydicom/dicom-header-cleaner:latest
dan@ubuntu:~/dicom-scraper$ sudo docker run --volume ~/input:/data -it --entrypoint=/bin/bash pydicom/dicom-header-cleaner -i
root@f7eaa1bdbc23:/code# ./entrypoint.sh --input /data
Traceback (most recent call last):
File "/code/main.py", line 180, in <module>
main()
File "/code/main.py", line 90, in main
from deid.dicom import get_files
File "/usr/local/lib/python3.5/dist-packages/deid/dicom/__init__.py", line 1, in <module>
from .header import (
File "/usr/local/lib/python3.5/dist-packages/deid/dicom/header.py", line 31, in <module>
from .tags import (
File "/usr/local/lib/python3.5/dist-packages/deid/dicom/tags.py", line 27, in <module>
from pydicom.tag import tag_in_exception
ImportError: cannot import name 'tag_in_exception'
root@f7eaa1bdbc23:/code# python3.5
Python 3.5.2 (default, Nov 12 2018, 13:43:14)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from pydicom.tag import tag_in_exception
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: cannot import name 'tag_in_exception'
>>>
root@f7eaa1bdbc23:/code# pip3 freeze
certifi==2018.11.29
chardet==3.0.4
cloudpickle==0.6.1
cycler==0.10.0
dask==1.0.0
decorator==4.3.0
deid==0.1.27
idna==2.8
ipython==2.4.1
kiwisolver==1.0.1
matplotlib==3.0.2
networkx==2.2
numpy==1.15.4
Pillow==5.4.0
pydicom==1.0.0a1
Pygments==2.3.1
pyparsing==2.3.0
python-dateutil==2.7.5
PyWavelets==1.0.1
requests==2.21.0
retrying==1.3.3
scikit-image==0.14.1
scikit-learn==0.15.2
scipy==1.2.0
simplegeneric==0.8.1
simplejson==3.16.0
six==1.12.0
toolz==0.9.0
urllib3==1.21.1
validator.py==1.2.5
Hopefully just a wrong dependency issue on pydicom==1.0.0a1?
Yep this is actually what I expected! The module needs to be updated for the newest release of pydicom. I'll do this in a new PR and we can continue discussion from there.
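For anyone following along, a quick way to confirm this kind of version mismatch is to probe the import before running the tool. This is only a minimal sketch using the standard library; the helper name `can_import_name` is hypothetical and is not part of deid or pydicom.

```python
import importlib


def can_import_name(module_name, attr_name):
    """Return True if `attr_name` is available on the module `module_name`."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # The module itself is missing or fails to import.
        return False
    return hasattr(module, attr_name)


if __name__ == "__main__":
    # The name deid failed to import in the traceback above.
    ok = can_import_name("pydicom.tag", "tag_in_exception")
    print("pydicom.tag.tag_in_exception available:", ok)
```

Running it inside the container next to `pip3 freeze` should make it obvious whether the installed pydicom provides what deid expects.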
Greatness!
hey @danielsnider a PR is open with fixes to the header cleaner, see here, and see the last comment for a pushed container you can test.
https://github.com/pydicom/dicom-cleaner/pull/8
I'm not sure how well you can test with your images since the header flagging largely depends on the deid recipe (the filter) criteria used. If your images aren't flagged I'd like to ask you to look at deid.dicom's default recipe (https://github.com/pydicom/deid/blob/master/deid/data/deid.dicom) and comment on what combinations of header fields are being missed! It's a terrible strategy, generally, but the problem with ML approaches is that they take much longer to do.
The ocr image has a bug #9 that likely needs fixing via updating versions. If you have any insight let me know, the original issue is linked and I just don't have time to work on it now.
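To make "combinations of header fields" concrete, here is a rough sketch of the idea in plain Python. It is not deid's recipe syntax or API; the function `flag_labels`, the rule label, and the example header are hypothetical and only illustrate matching several fields at once.

```python
def flag_labels(header, rules):
    """Return the label of every rule whose conditions all match.

    `header` maps field names to values. Each rule is a (label, conditions)
    pair, where `conditions` maps a field name to a substring that must
    appear in that field (compared case-insensitively).
    """
    labels = []
    for label, conditions in rules:
        if all(
            needle.lower() in str(header.get(field, "")).lower()
            for field, needle in conditions.items()
        ):
            labels.append(label)
    return labels


# Hypothetical rule and header values, for illustration only.
RULES = [
    ("burned-in annotation", {"Modality": "US", "BurnedInAnnotation": "YES"}),
]

example_header = {"Modality": "US", "BurnedInAnnotation": "yes"}
print(flag_labels(example_header, RULES))
```

An image whose header matches no rule returns an empty list, which is exactly the "not flagged" case where feedback on missing field combinations would help.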
Nice work!! I've made a suggestion for https://github.com/pydicom/dicom-cleaner/issues/9. I'm really interested in the ML approach.
Because I have too much variety in my dicom image set, I'm trying to avoid using known PHI coordinates as in deid.dicom's default recipe. However, I may come back to it.
Actually, I will try the header cleaner! If it can clean a percentage of my images, that's a win! I'll do a quick "does it run" test right now. Daniel Snider ツ
I'm on the same page - the "hardcoded" approach of memorizing patterns in headers to find pixels just doesn't have legs!
The image is rebuilding now - I'll let you know how it goes after testing!
My DICOM images seem to be compressed and ask for GDCM. It would be great to add GDCM to the ocr Dockerfile :-). This is what I see:
Sidenote: Commenting on line main.py#L157... if you catch all exceptions it would be great to at minimum print the exception instead of hiding it. Not seeing the error is the worst kind of error.
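Something along these lines is what I have in mind. It is a minimal sketch, not the actual code at main.py#L157 (which I'm not reproducing here); `process_all` and `process_one` are hypothetical names.

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("dicom-cleaner-sketch")


def process_all(paths, process_one):
    """Apply `process_one` to each path, reporting failures instead of hiding them."""
    failures = []
    for path in paths:
        try:
            process_one(path)
        except Exception as error:
            # logger.exception records the message plus the full traceback,
            # so the loop keeps going but the cause stays visible.
            logger.exception("Failed to process %s", path)
            failures.append((path, error))
    return failures
```

Returning the failures also lets the caller print a summary at the end, so a run that silently skipped every file can't look like a success.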