Closed gabrieldernbach closed 2 years ago
Hi Gabriel,
The example is not exactly the same as in the paper, but it is pretty close. The code has also evolved and gained more options. "outFilenameStats" is a filter (now used with the "expLabel" flag and, optionally, the "threshold" flag) that lets the user select a subset of tiles to which a previous classifier assigned a certain probability (in this example, tiles classified as LUAD). You can check the main page for more info on the options.
Best, N.
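To make the filtering idea concrete, here is a minimal Python sketch. It is not the repository's actual implementation: the function name `select_tiles`, the `class_index` argument, and the dictionary layout (tile name mapped to per-class probabilities) are assumptions made for illustration, and the real "outFilenameStats" file format may differ.

```python
from typing import Dict, List, Sequence


def select_tiles(
    tile_probs: Dict[str, Sequence[float]],
    class_index: int,
    threshold: float = 0.5,
) -> List[str]:
    """Return the names of tiles whose probability for class_index is >= threshold.

    tile_probs maps a tile name to the per-class probabilities assigned by a
    previous classifier (hypothetical layout, for illustration only).
    """
    return sorted(
        name
        for name, probs in tile_probs.items()
        if probs[class_index] >= threshold
    )


# Hypothetical first-stage output: [normal, LUAD, LUSC] probabilities per tile.
previous_predictions = {
    "slideA_1_2.jpeg": [0.05, 0.90, 0.05],
    "slideA_1_3.jpeg": [0.70, 0.20, 0.10],
    "slideB_4_0.jpeg": [0.10, 0.55, 0.35],
}

# Keep only the tiles that the previous classifier considered LUAD (index 1).
luad_tiles = select_tiles(previous_predictions, class_index=1, threshold=0.5)
print(luad_tiles)
```

Run without such a filter, every tile would be kept; with it, only the subset passing the threshold goes into the next step.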
Hi Nicolas, thanks for the confirmation. Using your preprocessing/split as input, I could verify your reported results on lung_lusc_normal as well as the EGFR prediction with an independent training script and a different model.
The full requirements.txt gave me some trouble, so maybe this container will help others, too:
# Dockerfile
FROM ubuntu:18.04
ENV DEBIAN_FRONTEND=noninteractive
# dependencies for python 3.6.5
RUN apt-get update
RUN apt-get install software-properties-common -y
RUN apt-add-repository multiverse -y
RUN apt-get install build-essential checkinstall -y
RUN apt-get install libreadline-gplv2-dev libncursesw5-dev libssl-dev libsqlite3-dev tk-dev libgdbm-dev libc6-dev libbz2-dev liblzma-dev -y
RUN apt-get install wget -y
# compile python 3.6.5
RUN wget -P /tmp/ https://www.python.org/ftp/python/3.6.5/Python-3.6.5.tgz
RUN tar -xvf /tmp/Python-3.6.5.tgz -C /tmp/
# --default accepts checkinstall's prompts so the build stays non-interactive
RUN cd /tmp/Python-3.6.5/ \
    && ./configure \
    && make \
    && checkinstall --default
RUN pip3 install --upgrade pip
RUN apt-get install openslide-tools -y
# requirements (small)
COPY requirements.txt /tmp/
RUN pip3 install --requirement /tmp/requirements.txt
# requirements.txt
tensorflow==1.9.0
numpy==1.14.3
matplotlib==2.1.2
scikit-learn==0.23.1
scipy==1.1.0
openslide-python==1.1.1
Pillow==5.1.0
dicom==0.9.9.post1
imageio==2.8.0
scikit-image==0.17.2
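If you want to confirm that the container really ended up with the pinned versions, a small check along these lines may help. This is only a sketch: the helper names `parse_pins`, `installed_version`, and `find_mismatches` are made up for illustration, and the parser handles only plain `name==version` lines like the ones above.

```python
from typing import Dict, Optional, Tuple


def parse_pins(requirements_text: str) -> Dict[str, str]:
    """Parse 'name==version' lines; blank lines and '#' comments are skipped."""
    pins = {}
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()
        if "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.strip()] = version.strip()
    return pins


def installed_version(name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        from importlib import metadata  # available from Python 3.8
    except ImportError:
        metadata = None
    if metadata is not None:
        try:
            return metadata.version(name)
        except metadata.PackageNotFoundError:
            return None
    import pkg_resources  # fallback for Python 3.6; provided by setuptools
    try:
        return pkg_resources.get_distribution(name).version
    except pkg_resources.DistributionNotFound:
        return None


def find_mismatches(pins: Dict[str, str]) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each package whose installed version differs to (wanted, found)."""
    mismatches = {}
    for name, wanted in pins.items():
        found = installed_version(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


# Two of the pins from the requirements.txt above, used as sample input.
example = """
tensorflow==1.9.0
numpy==1.14.3
"""
pins = parse_pins(example)
print(pins)
```

Inside the container, something like `find_mismatches(parse_pins(open("/tmp/requirements.txt").read()))` should return an empty dict if every pin was honoured.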
That's great, thanks Gabriel!
I'll mention this in the readme file then as well.
Section 3 of the TCGA example creates the data split. Is this the split that was used in the paper? The script can technically be run without the outFilenameStats option; does this change the outcome of the split?