ncoudray / DeepPATH

Classification of Lung cancer slide images using deep-learning

Datasplits #89

Closed gabrieldernbach closed 2 years ago

gabrieldernbach commented 2 years ago

In section 3 of the TCGA example, the data split is created via

python ../00_preprocessing/0d_SortTiles.py --SourceFolder='../512px_Tiled_NewPortal/'  --Magnification=20  --MagDiffAllowed=0 --SortingOption=10  --PatientID=-1 --PercentTest=15 --PercentValid=15 --nSplit 0 --outFilenameStats='../r2_test/test_69000k/out_filename_Stats.txt'

Is this the split that was used in the paper? The script can technically be run without outFilenameStats. Does omitting it change the outcome of the split?
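To make the role of the percentage flags concrete, here is a minimal sketch of a slide-level split with 15% test and 15% validation. It is not the logic of 0d_SortTiles.py: the function `split_slides`, the tile naming convention `<slide_id>_<x>_<y>.jpeg`, and the fixed random seed are all assumptions made for illustration only.

```python
import random
from collections import defaultdict


def split_slides(tile_names, percent_test=15, percent_valid=15, seed=0):
    """Assign tiles to 'train', 'valid' or 'test', keeping each slide's tiles together.

    Hypothetical helper: assumes tile names look like "<slide_id>_<x>_<y>.jpeg".
    """
    # Group tiles by slide so that one slide never ends up in two subsets.
    by_slide = defaultdict(list)
    for name in tile_names:
        slide_id = name.rsplit("_", 2)[0]
        by_slide[slide_id].append(name)

    # Shuffle the slide IDs reproducibly, then cut the list by percentage.
    slides = sorted(by_slide)
    random.Random(seed).shuffle(slides)
    n_test = round(len(slides) * percent_test / 100)
    n_valid = round(len(slides) * percent_valid / 100)

    subsets = {
        "test": slides[:n_test],
        "valid": slides[n_test:n_test + n_valid],
        "train": slides[n_test + n_valid:],
    }
    return {
        subset: [tile for slide in ids for tile in by_slide[slide]]
        for subset, ids in subsets.items()
    }
```

Splitting by slide rather than by tile is the design choice illustrated here: tiles cut from the same slide are highly correlated, so spreading them across subsets would leak information from training into testing.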

ncoudray commented 2 years ago

Hi Gabriel,

The example is not exactly like in the paper, but pretty close. The code has also evolved and gained more options. The "outFilenameStats" is a filter (now used with the "expLabel" flag and optionally the "threshold" flag) that allows the user to select a subset of tiles which were assigned a certain probability by a previous classifier (in this example, tiles classified as LUAD). You can check the main page for more info regarding the options.

Best, N.
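The filtering idea described above can be sketched as follows. This is only an illustration under assumptions: the record layout (a tile name paired with a list of class probabilities), the helper `select_tiles`, and the `>=` comparison are hypothetical and do not reproduce the actual format of the stats file or the behavior of the "expLabel" and "threshold" flags.

```python
def select_tiles(records, exp_label, threshold=0.5):
    """Keep tiles whose predicted probability for class `exp_label` reaches `threshold`.

    Hypothetical helper: `records` is an iterable of (tile_name, probabilities)
    pairs, where `probabilities` is indexed by class label.
    """
    return [name for name, probs in records if probs[exp_label] >= threshold]


# Made-up predictions from a previous classifier; index 1 stands for the class of interest.
records = [
    ("tile_a.jpeg", [0.1, 0.8, 0.1]),
    ("tile_b.jpeg", [0.7, 0.2, 0.1]),
    ("tile_c.jpeg", [0.2, 0.5, 0.3]),
]
selected = select_tiles(records, exp_label=1, threshold=0.5)
print(selected)  # ['tile_a.jpeg', 'tile_c.jpeg']
```

Only the selected tiles would then be passed on to the split, which is why running the script with or without such a filter can yield different tile sets.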

gabrieldernbach commented 2 years ago

Hi Nicolas, thanks for the confirmation. Using your preprocessing and split as input, I could verify your reported results on lung_lusc_normal as well as the EGFR prediction with an independent training script and a different model.

The full requirements.txt gave me some trouble, so maybe this container will be of help to others, too:

# Dockerfile
FROM ubuntu:18.04

ENV DEBIAN_FRONTEND=noninteractive

# dependencies for python 3.6.5
RUN apt-get update
RUN apt-get install software-properties-common -y
RUN apt-add-repository multiverse -y
RUN apt-get install build-essential checkinstall  -y
RUN apt-get install libreadline-gplv2-dev libncursesw5-dev libssl-dev libsqlite3-dev tk-dev libgdbm-dev libc6-dev libbz2-dev liblzma-dev  -y
RUN apt-get install wget -y

# compile python 3.6.5
RUN wget -P /tmp/ https://www.python.org/ftp/python/3.6.5/Python-3.6.5.tgz
RUN tar -xvf /tmp/Python-3.6.5.tgz -C /tmp/
RUN cd /tmp/Python-3.6.5/ \
        && ./configure \
        && make \
        && checkinstall
RUN pip3 install --upgrade pip

RUN apt-get install openslide-tools -y
# requirements (small)
COPY requirements.txt /tmp/
RUN pip3 install --requirement /tmp/requirements.txt

# requirements.txt
tensorflow==1.9.0
numpy==1.14.3
matplotlib==2.1.2
scikit-learn==0.23.1
scipy==1.1.0
openslide-python==1.1.1
Pillow==5.1.0
dicom==0.9.9.post1
imageio==2.8.0
scikit-image==0.17.2

ncoudray commented 2 years ago

That's great, thanks Gabriel!

I'll mention this in the readme file then as well.