How do I evaluate a single TFRecord to determine the most likely label?

ncoudray / DeepPATH

Classification of Lung cancer slide images using deep-learning

How do I evaluate a single TFRecord to determine the most likely label? #65

Closed afrankel closed 4 years ago

afrankel commented 4 years ago

I'm at the point where I have the models (checkpoints) and an AUC that shows me which is the best checkpoint of the model. Let's say I just have 3 labels: normal, luad, and lusc and I have a new slide (svs file). Assuming I get that svs file to a TFRecord, how can I just evaluate the most likely label based on my model? i.e., I would like to run a script that points to my checkpoint, those three labels, and the directory of my TFRecord and produce a simple result with normal, luad, OR lusc (with perhaps a percentage likelihood).

(As an aside is there a better place to ask more beginner questions like these, e.g., stackoverflow?)

afrankel commented 4 years ago

Trying to delete this issue since I found my answer in the README

afrankel commented 4 years ago

Sorry, thought I understood this from the README, but I guess I didn't. Can you share how you can do a basic prediction given my initial comment?

ncoudray commented 4 years ago

I'm not sure I understand your question. Can you please clarify? Test output of the test run should give you that info, but I'm not sure what you mean/need.

afrankel commented 4 years ago

Well, I believe the test output does give me what I want, but maybe you can clarify. In this particular case, let's say I have 2 labels (LABEL1, LABEL2) and the output of out2_perSlideStats.txt is:

valid_01152_snap1_010 true_label: [1.0, 0.0] Percent_Selected: 0.000000 1.000000 Average_Probability: 0.049507 0.950493 tiles#: 400.000000

I assume this means that the TFRecord had internally listed this image as LABEL1 (because of the [1.0, 0.0] after "true_label", signifying the first label), and that the Average_Probability of 0.049507 0.950493 signifies only a 5% predicted probability of LABEL1 (vs. 95% for LABEL2) based on the model I used.

Do I have this all correct?
Is there a "better" file I should look at for this information?
If I didn't know the outcome, i.e., I just have the slide with no diagnosis and want to determine the predicted probability of each label, should I just "trick" the TFRecord with a diagnosis and go through the normal process, or is there a better way?

Thank you!

ncoudray commented 4 years ago

Yes, you have it all correct. Everything is summarized in the out1 (per tile), out2 (per slide) and out3 (per patient) files. And yes, if you don't know the outcome, the easiest way right now is to just assign a random label. In any case, you would still need to tile the slide and convert it to TFRecord, but the conversion checks how many possible classes exist, so you would just put your jpg files in one folder and then, if you have a total of 2 possible classes, create a second dummy empty folder. In the end, just ignore the random label and retrieve the probability you're interested in.
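
For readers who want the "most likely label" as a one-line result, here is a minimal Python sketch that reads the Average_Probability field from one line of out2_perSlideStats.txt and reports the highest-scoring label. It is not part of DeepPATH. It assumes each slide's stats sit on a single line in the layout quoted earlier in this thread, and that the probabilities appear in the same order as the class folders. The names `LABELS` and `most_likely_label` are hypothetical.

```python
import re

# Hypothetical label names; assumed to be in the same order as the class folders.
LABELS = ["LABEL1", "LABEL2"]


def most_likely_label(stats_line, labels):
    """Return (slide_name, label, probability) for one per-slide stats line."""
    # Assumed layout: "<slide> true_label: [...] Percent_Selected: ...
    #                  Average_Probability: p1 p2 ... tiles#: n"
    match = re.search(r"Average_Probability:\s*([0-9.\s]+)", stats_line)
    if match is None:
        raise ValueError("no Average_Probability field found")
    probs = [float(p) for p in match.group(1).split()]
    if len(probs) != len(labels):
        raise ValueError("number of probabilities does not match number of labels")
    # Index of the largest average probability across tiles.
    best = max(range(len(probs)), key=probs.__getitem__)
    slide_name = stats_line.split()[0]
    return slide_name, labels[best], probs[best]


# Sample line from this thread, joined onto one line.
SAMPLE = (
    "valid_01152_snap1_010 true_label: [1.0, 0.0] "
    "Percent_Selected: 0.000000 1.000000 "
    "Average_Probability: 0.049507 0.950493 tiles#: 400.000000"
)

slide, label, prob = most_likely_label(SAMPLE, LABELS)
print(f"{slide}: {label} ({prob:.1%})")
```

The true_label field is deliberately ignored, which matches the advice above: when the slide was given a random label, only the probabilities carry information.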

afrankel commented 4 years ago

Thank you!