ncoudray / DeepPATH

Classification of Lung cancer slide images using deep-learning

Some questions about mutations classifier and LUAD vs LUSC classifier #28

Closed RijunLiao closed 5 years ago

RijunLiao commented 5 years ago

Dear authors, many thanks for your great work, which has made a great contribution to society. Could I ask some questions about the mutation classifiers? Thank you!

1) I want to reproduce the mutation classifier from your paper (Table 1). Which value of "--SortingOption" should I use in step "0.2 Sort the tiles"?
2) In step "0.3b Convert the JPEG", why are the training and validation sets written to the same output directory? This differs from the 2- or 3-class jobs. Should I separate them into two different folders before training?
3) What do the micro- and macro-averages in your paper (Table 1) mean?

LUAD vs LUSC classifier problem
1) I set "--SortingOption=4 Sort according to type of cancer (LUSC, LUAD)" in step "0.2 Sort the tiles". After fully training Inception v3 for about 500000 batches, the AUC is only 0.86, which is much lower than in your paper (Table 1).

When I instead set "--SortingOption=3 Sort according to type of cancer (LUSC, LUAD, or Normal Tissue)", remove the Normal Tissue dataset, and train the same way, the AUC reaches 0.956. Do you know why this happens? Thank you!

ncoudray commented 5 years ago

About the mutations:

  1. There are several ways of doing it. Let's assume that several mutations can occur at the same time, and therefore that a sigmoid will be used. I use option 14, as follows:

    • I usually keep the list of mutations in a text file. The first column is the patient ID and the second is the mutation. A patient with several mutations has several lines; a patient with no mutations at all has "WT" in the second column.
    • If you only want to do mutations on tiles from LUAD slides that the 1st classifier labeled as LUAD:
      • the patient IDs must have only LUAD slides
      • to make sure the "normal" slides are not included, use the first 14 characters of the ID (TCGA-05-4244-0 for example - the 14th character is usually "1" for normal slides and "0" for tumor ones)
    • Use 0.3b from GitHub to convert from JPEG to TFRecord ("labels" being the file mentioned above, where the first column holds the IDs and the second the mutations).
  2. You don't need to separate them: when you run the training and the validation, the files are selected by filtering on keywords. Also, originally, for the validation, we used the default Inception script, which computes the "precision" and should be run at the same time as the training.

  3. see https://scikit-learn.org/stable/auto_examples/model_selection/plot_roc.html
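As a rough illustration of the difference between the two averages, here is a minimal pure-Python sketch. It is not the code used for the paper: the helper names (`binary_auc`, `macro_auc`, `micro_auc`) and the toy labels and scores are made up for the example. The macro-average computes one AUC per class and takes the unweighted mean; the micro-average pools every (sample, class) pair into a single binary problem before computing one AUC.

```python
def binary_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def macro_auc(y_true, y_score):
    """Unweighted mean of the per-class AUCs."""
    n_classes = len(y_true[0])
    per_class = [
        binary_auc([row[k] for row in y_true], [row[k] for row in y_score])
        for k in range(n_classes)
    ]
    return sum(per_class) / n_classes


def micro_auc(y_true, y_score):
    """Single AUC over all (sample, class) pairs pooled together."""
    flat_true = [v for row in y_true for v in row]
    flat_score = [v for row in y_score for v in row]
    return binary_auc(flat_true, flat_score)


# Toy multi-label data (invented for illustration): 4 samples, 2 classes.
# Each row holds one 0/1 label per class and one sigmoid score per class.
y_true = [[1, 0], [0, 1], [1, 1], [0, 0]]
y_score = [[0.9, 0.2], [0.3, 0.8], [0.6, 0.4], [0.4, 0.5]]

macro = macro_auc(y_true, y_score)
micro = micro_auc(y_true, y_score)
print(f"macro-average AUC: {macro:.5f}")
print(f"micro-average AUC: {micro:.5f}")
```

The two values generally differ: the macro-average gives every class the same weight, while the micro-average is dominated by the classes that contribute the most positive examples.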

Regarding the last question, I don't know.
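Coming back to the label file from point 1, the idea can be sketched as below. This is only an illustration, not DeepPATH code: the function names, the whitespace-separated layout, the mutation names `MUT_A` and `MUT_B`, the slide names, and the choice to encode "WT" as an all-zero vector are all assumptions made for the example.

```python
from collections import defaultdict

ID_LENGTH = 14  # e.g. "TCGA-05-4244-0": the 14th character separates tumor (0) from normal (1)


def parse_mutation_labels(lines):
    """Map each ID (first column) to the set of mutations (second column).

    Assumes whitespace-separated columns; a patient with several
    mutations appears on several lines.
    """
    labels = defaultdict(set)
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        labels[parts[0]].add(parts[1])
    return dict(labels)


def labels_for_slide(slide_name, labels):
    """Look up a slide by the first 14 characters of its name.

    Returns None when the prefix is not in the label file, which is
    what happens for normal slides if the file lists only "-0" IDs.
    """
    return labels.get(slide_name[:ID_LENGTH])


def multi_hot(mutations, classes):
    """One 0/1 target per class, as used with a sigmoid output."""
    return [1 if c in mutations else 0 for c in classes]


# Invented rows for illustration only, not real annotations.
label_lines = [
    "TCGA-05-4244-0 MUT_A",
    "TCGA-05-4244-0 MUT_B",
    "TCGA-05-4249-0 WT",
]
labels = parse_mutation_labels(label_lines)
classes = ["MUT_A", "MUT_B"]

# Hypothetical slide names: one tumor slide and one normal slide.
tumor = labels_for_slide("TCGA-05-4244-01Z-00-DX1", labels)
normal = labels_for_slide("TCGA-05-4244-11A-01-TS1", labels)

print(multi_hot(tumor, classes))                     # both mutations present
print(normal)                                        # no match for the normal slide
print(multi_hot(labels["TCGA-05-4249-0"], classes))  # WT as all zeros (assumption)
```

Matching on the 14-character prefix means a normal slide from the same patient never picks up that patient's mutation labels.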

Best

RijunLiao commented 5 years ago

Sorry for the delay, and many thanks for your help.