pydicom / dicom-cleaner

detection of burned in pixels using OCR (under development)
21 stars 4 forks source link

Add GDCM dependency to Dockerfile #5

Open danielsnider opened 5 years ago

danielsnider commented 5 years ago

My DICOM images seem to be compressed and ask for GDCM. It would be great to add GDCM to the ocr Dockerfile :-).

This is what I see:

root@38a72d18ad9b:/code# ./entrypoint.sh --input /data
DEBUG Found 1 contender files in data
DEBUG Checking 1 dicom files for validation.
Found 1 valid dicom files
Traceback (most recent call last):
  File "/code/main.py", line 172, in <module>
    main()
  File "/code/main.py", line 108, in main
    verbose=args.verbose)
  File "/code/user/__init__.py", line 31, in __init__
    self.image = dicom._get_pixel_array()
  File "/opt/anaconda2/lib/python2.7/site-packages/pydicom-1.0.0a1-py2.7.egg/pydicom/dataset.py", line 1053, in _get_pixel_array
    self._pixel_array = self._pixel_data_numpy()
  File "/opt/anaconda2/lib/python2.7/site-packages/pydicom-1.0.0a1-py2.7.egg/pydicom/dataset.py", line 690, in _pixel_data_numpy
    raise NotImplementedError("Pixel Data is compressed in a "
NotImplementedError: Pixel Data is compressed in a format pydicom does not yet handle. Cannot return array. Pydicom might be able to convert the pixel data using GDCM if it is installed.

Sidenote: Commenting on line main.py#L157... if you catch all exceptions it would be great to at minimum print the exception instead of hiding it. Not seeing the error is the worst kind of error.

vsoch commented 5 years ago

Agree on both fronts! I haven't looked at this repository in a bit, let me see what I can do.

danielsnider commented 5 years ago

It would be great to release a new build to DockerHub!

danielsnider commented 5 years ago

I'm eager to compare your progress with a machine learning approach to my approach with tesseract OCR, computer vision, and python.

vsoch commented 5 years ago

Definitely! I'd like to do this proper with a Docker container deployed via CI, so I'll likely take a day or two to do this. I'll post an update (and ping you on the PR) when I have something to test.

danielsnider commented 5 years ago

I will definitely test it!

Daniel Snider ツ

On Wed, Jan 2, 2019 at 6:16 PM Vanessa Sochat notifications@github.com wrote:

Definitely! I'd like to do this proper with a Docker container deployed via CI, so I'll likely take a day or two to do this. I'll post an update (and ping you on the PR) when I have something to test.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/pydicom/dicom-scraper/issues/5#issuecomment-451015119, or mute the thread https://github.com/notifications/unsubscribe-auth/ABqDWAvix3xCjS1PZndgu2TqrIsczx3Sks5u_T3kgaJpZM4ZnWgY .

vsoch commented 5 years ago

Just a heads up, I'm renaming the repo to "dicom-cleaner" because the scraper doesn't as well describe the intended purpose. The Docker images will be built from:

I don't have Github Actions for this organization yet, but if/when we do the deploy will be much easier than relying on an external service.

vsoch commented 5 years ago

And it would be really fun to test these images and develop new methods! I created the tooling a while back and nobody was super interested in a proper testing, so I'm really happy you are ! :D

vsoch commented 5 years ago

hey @danielsnider I'm still figuring out the CI (customizing the orb to build only on PRs and then push on merge to master) but since the container (with gdcm installed via conda) was erroneously pushed to docker hub, if you want to play around with it, here you go! https://cloud.docker.com/u/pydicom/repository/docker/pydicom/dicom-ocr-cleaner

I should have some time to check up on this later, hopefully we will hear back from Circle and then can test the GDCM as well.

vsoch commented 5 years ago

hey @danielsnider the CI is added, see the images being built and pushed to the docker repos I mentioned above here --> https://circleci.com/gh/pydicom/dicom-cleaner

I added the gdcm install to the pydicom/dicom-ocr-cleaner, although I didn't do any testing, etc (the PR was to add the continuous integration). If you'd like to use this container as a start for your testing, with your feedback we can open another PR to work on the software itself.

danielsnider commented 5 years ago

Here we go! I'm trying your new container! Can you look into the error below? I've included the python package versions that are in the container too. It looks like pydicom version 1.0.0a1 actually came through, which I suspect is the problem.

dan@ubuntu:~/dicom-scraper$ sudo docker pull pydicom/dicom-header-cleaner                                          
07fbc26aa5a1: Pull complete 
Digest: sha256:edf9c8f44b0d65c2c013b0de28880d58fc053f62b9e210bfb59fd5729942d1ef
Status: Downloaded newer image for pydicom/dicom-header-cleaner:latest

dan@ubuntu:~/dicom-scraper$ sudo docker run --volume ~/input:/data -it  --entrypoint=/bin/bash  pydicom/dicom-heade
r-cleaner -i
root@f7eaa1bdbc23:/code# ./entrypoint.sh --input /data

Traceback (most recent call last):
  File "/code/main.py", line 180, in <module>
    main()
  File "/code/main.py", line 90, in main
    from deid.dicom import get_files
  File "/usr/local/lib/python3.5/dist-packages/deid/dicom/__init__.py", line 1, in <module>
    from .header import (
  File "/usr/local/lib/python3.5/dist-packages/deid/dicom/header.py", line 31, in <module>
    from .tags import (
  File "/usr/local/lib/python3.5/dist-packages/deid/dicom/tags.py", line 27, in <module>
    from pydicom.tag import tag_in_exception
ImportError: cannot import name 'tag_in_exception'

root@f7eaa1bdbc23:/code# python3.5
Python 3.5.2 (default, Nov 12 2018, 13:43:14) 
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from pydicom.tag import tag_in_exception
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: cannot import name 'tag_in_exception'
>>>

root@f7eaa1bdbc23:/code# pip3 freeze
certifi==2018.11.29
chardet==3.0.4
cloudpickle==0.6.1
cycler==0.10.0
dask==1.0.0
decorator==4.3.0
deid==0.1.27
idna==2.8
ipython==2.4.1
kiwisolver==1.0.1
matplotlib==3.0.2
networkx==2.2
numpy==1.15.4
Pillow==5.4.0
pydicom==1.0.0a1
Pygments==2.3.1
pyparsing==2.3.0
python-dateutil==2.7.5
PyWavelets==1.0.1
requests==2.21.0
retrying==1.3.3
scikit-image==0.14.1
scikit-learn==0.15.2
scipy==1.2.0
simplegeneric==0.8.1
simplejson==3.16.0
six==1.12.0
toolz==0.9.0
urllib3==1.21.1
validator.py==1.2.5

Hopefully just a wrong dependency issue on pydicom==1.0.0a1?

vsoch commented 5 years ago

Yep this is actually what I expected! The module needs to be updated for the newest release of pydicom. I'll do this in a new PR and we can continue discussion from there.

danielsnider commented 5 years ago

Greatness!

On Jan 3, 2019, at 7:54 PM, Vanessa Sochat notifications@github.com wrote:

Yep this is actually what I expected! The module needs to be updated for the newest release of pydicom. I'll do this in a new PR and we can continue discussion from there.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.

vsoch commented 5 years ago

hey @danielsnider a PR is open with fixes to the header cleaner, see here, and see the last comment for a pushed container you can test.

https://github.com/pydicom/dicom-cleaner/pull/8

I'm not sure how well you can test with your images since the header flagging largely depends on the deid recipe (the filter) criteria used. If your images aren't flagged I'd like to ask you to look at deid.dicom's default recipe -> and comment on what combinations of header fields are being missed! https://github.com/pydicom/deid/blob/master/deid/data/deid.dicom It's a terrible strategy, generally, but the problems with ML approaches is that they take much longer to do.

The ocr image has a bug #9 that likely needs fixing via updating versions. If you have any insight let me know, the original issue is linked and I just don't have time to work on it now.

danielsnider commented 5 years ago

Nice work!! I've made a suggestion for https://github.com/pydicom/dicom-cleaner/issues/9 I'm really interested in the ML approach.

Because I have too much variety in my dicom image set, I'm trying avoid using known PHI coordinates as in the deid.dicom's default recipe. However, I may come back to it.

On Fri, Jan 4, 2019 at 3:57 PM Vanessa Sochat notifications@github.com wrote:

hey @danielsnider https://github.com/danielsnider a PR is open with fixes to the header cleaner, see here, and see the last comment for a pushed container you can test.

8 https://github.com/pydicom/dicom-cleaner/pull/8

I'm not sure how well you can test with your images since the header flagging largely depends on the deid recipe (the filter) criteria used. If your images aren't flagged I'd like to ask you to look at deid.dicom's default recipe -> and comment on what combinations of header fields are being missed! https://github.com/pydicom/deid/blob/master/deid/data/deid.dicom It's a terrible strategy, generally, but the problems with ML approaches is that they take much longer to do.

The ocr image has a bug #9 https://github.com/pydicom/dicom-cleaner/issues/9 that likely needs fixing via updating versions. If you have any insight let me know, the original issue is linked and I just don't have time to work on it now.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/pydicom/dicom-cleaner/issues/5#issuecomment-451566159, or mute the thread https://github.com/notifications/unsubscribe-auth/ABqDWCIUNH4tNTnxODaZEgiUWs12XnORks5u_8AlgaJpZM4ZnWgY .

danielsnider commented 5 years ago

Actually, I will try the header cleaner! If it can clean a percentage of my images, that's a win! I'll do a quick "does it run" test right now. Daniel Snider ツ

On Fri, Jan 4, 2019 at 4:17 PM Daniel Snider danielsnider12@gmail.com wrote:

Nice work!! I've made a suggestion for https://github.com/pydicom/dicom-cleaner/issues/9 I'm really interested in the ML approach.

Because I have too much variety in my dicom image set, I'm trying avoid using known PHI coordinates as in the deid.dicom's default recipe. However, I may come back to it.

On Fri, Jan 4, 2019 at 3:57 PM Vanessa Sochat notifications@github.com wrote:

hey @danielsnider https://github.com/danielsnider a PR is open with fixes to the header cleaner, see here, and see the last comment for a pushed container you can test.

8 https://github.com/pydicom/dicom-cleaner/pull/8

I'm not sure how well you can test with your images since the header flagging largely depends on the deid recipe (the filter) criteria used. If your images aren't flagged I'd like to ask you to look at deid.dicom's default recipe -> and comment on what combinations of header fields are being missed! https://github.com/pydicom/deid/blob/master/deid/data/deid.dicom It's a terrible strategy, generally, but the problems with ML approaches is that they take much longer to do.

The ocr image has a bug #9 https://github.com/pydicom/dicom-cleaner/issues/9 that likely needs fixing via updating versions. If you have any insight let me know, the original issue is linked and I just don't have time to work on it now.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/pydicom/dicom-cleaner/issues/5#issuecomment-451566159, or mute the thread https://github.com/notifications/unsubscribe-auth/ABqDWCIUNH4tNTnxODaZEgiUWs12XnORks5u_8AlgaJpZM4ZnWgY .

vsoch commented 5 years ago

I'm on the same page - the "hardcoded" memorize patterns in headers to find pixels just doesn't have feet!

The image is rebuilding now - I'll let you know how it goes after testing!