stanfordmlgroup / chexpert-labeler

CheXpert NLP tool to extract observations from radiology reports.
MIT License

Add dockerized runtime #32

Closed: jantrienes closed this pull request 2 years ago

jantrienes commented 2 years ago

I found it relatively hard to install the dependencies on a non-Linux system. This PR adds a dockerized runtime, which hopefully makes it easier to get started with chexpert-labeler.

Testing done:

$ docker build -t chexpert-labeler:latest .
[+] Building 24.4s (17/17) FINISHED
 => [internal] load build definition from Dockerfile  0.0s
 => => transferring dockerfile: 804B  0.0s
 => [internal] load .dockerignore  0.0s
 => => transferring context: 142B  0.0s
 => [internal] load metadata for docker.io/continuumio/miniconda:latest  0.0s
 => [ 1/12] FROM docker.io/continuumio/miniconda  0.0s
 => [internal] load build context  0.0s
 => => transferring context: 42.84kB  0.0s
 => CACHED [ 2/12] RUN apt-get update --allow-releaseinfo-change  0.0s
 => CACHED [ 3/12] RUN mkdir -p /usr/share/man/man1  0.0s
 => CACHED [ 4/12] RUN apt-get install -y default-jre  0.0s
 => CACHED [ 5/12] WORKDIR /app/chexpert-labeler  0.0s
 => CACHED [ 6/12] RUN git clone https://github.com/ncbi-nlp/NegBio.git  0.0s
 => CACHED [ 7/12] COPY environment.yml environment.yml  0.0s
 => CACHED [ 8/12] RUN conda env create -f environment.yml  0.0s
 => [ 9/12] COPY . .  0.0s
 => [10/12] RUN chmod +x ./entrypoint.sh  0.2s
 => [11/12] RUN ./entrypoint.sh python -m nltk.downloader universal_tagset punkt wordnet  7.6s
 => [12/12] RUN ./entrypoint.sh python -c "from bllipparser import RerankingParser; RerankingParser.fetch_and_load('GENIA+PubMed')"  15.6s
 => exporting to image  0.8s
 => => exporting layers  0.7s
 => => writing image sha256:0e836ce35a8afff3924177b9f182635c01a1b142bcd383e2d4df4a529451363d  0.0s
 => => naming to docker.io/library/chexpert-labeler:latest
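A Dockerfile consistent with the 12 build steps logged above might look like the following minimal sketch. It is a reconstruction, not the file from this PR: every instruction except the last is copied from the log, and the comments are my reading of why each step is there. The final ENTRYPOINT line is an assumption. The log does not show it, but for "docker run ... python label.py" to run inside the conda environment, something has to route commands through entrypoint.sh, whose contents are not shown on this page.

```dockerfile
# Sketch reconstructed from the build log; not the PR's actual Dockerfile.
FROM continuumio/miniconda

# Refresh package lists even if the Debian release metadata has changed.
RUN apt-get update --allow-releaseinfo-change
# Common workaround: the JRE package's post-install step can fail when this man-page directory is missing.
RUN mkdir -p /usr/share/man/man1
# Java runtime, needed for the Stanford CoreNLP jar that is downloaded when label.py runs.
RUN apt-get install -y default-jre

WORKDIR /app/chexpert-labeler
# NegBio is cloned into the labeler's working directory.
RUN git clone https://github.com/ncbi-nlp/NegBio.git

# environment.yml is copied before the rest of the source, so the conda layer stays
# cached when only code changes (steps 2-8 are CACHED in the log, steps 9-12 are not).
COPY environment.yml environment.yml
RUN conda env create -f environment.yml

COPY . .
RUN chmod +x ./entrypoint.sh

# Bake the NLTK data and the BLLIP GENIA+PubMed parser model into the image.
RUN ./entrypoint.sh python -m nltk.downloader universal_tagset punkt wordnet
RUN ./entrypoint.sh python -c "from bllipparser import RerankingParser; RerankingParser.fetch_and_load('GENIA+PubMed')"

# ASSUMPTION: not visible in the build log. Routes "docker run ... <command>" through entrypoint.sh.
ENTRYPOINT ["/app/chexpert-labeler/entrypoint.sh"]
```

One design note: the log shows apt-get update and apt-get install as separate RUN steps. The usual Docker recommendation is to chain them in a single RUN, because otherwise a cached update layer can be reused with a stale package index when the install line changes.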

$ docker run -v $(pwd):/data chexpert-labeler:latest python label.py --reports_path /data/sample_reports.csv --output_path /data/labeled_reports_docker.csv
Generating LALR tables
Downloading 'http://search.maven.org/remotecontent?filepath=edu/stanford/nlp/stanford-corenlp/3.5.2/stanford-corenlp-3.5.2.jar' -> '/root/.local/share/pystanforddeps/stanford-corenlp-3.5.2.jar'

$ diff labeled_reports.csv labeled_reports_docker.csv && echo "equal"
equal