Data labels question - Githubissues

rlsn / LungNoduleDetection

Detect and locate lung nodules from CT images with deep learning

MIT License

5 stars, 0 forks

Data labels question #1

Open: junxiant opened this issue 3 months ago

junxiant commented 3 months ago

Hello, I would like to test this code. I see that model_config.json has "num_labels": 1. Does this mean the model was trained only on the "nodule" class?

At line 223 of dataset.py, I see that it crops either a nodule patch or a non-nodule (negative) patch. Is that correct?

If so, should I change num_labels to 2 to train on both "nodule" and "non-nodule"?

rlsn commented 3 months ago

Hello,

Thank you for your question. Yes, the model is trained to classify whether the cropped patch contains a nodule or not. Because we formulate the problem as a binary classification task, a single logit output trained with binary cross-entropy loss is sufficient. Setting 'num_labels'=2 would utilize multi-label cross-entropy loss, which works equally well. Please refer to model.py for details.
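To illustrate why one logit is enough for a two-way decision, here is a minimal sketch in plain Python. It is not taken from model.py; the function names and the example logits are made up for illustration. It compares binary cross-entropy on a single logit z with softmax cross-entropy on two logits (a, b), assuming z = b - a.

```python
import math

def softplus(x: float) -> float:
    """log(1 + exp(x)), written in a numerically stable form."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def bce_with_logit(z: float, y: int) -> float:
    """Binary cross-entropy for one sample: single logit z, label y in {0, 1}."""
    return softplus(-z) if y == 1 else softplus(z)

def softmax_ce(logits: list[float], y: int) -> float:
    """Softmax cross-entropy for one sample: per-class logits, class index y."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum_exp - logits[y]

# Hypothetical logits: two outputs (a, b) versus the single logit z = b - a.
a, b = 0.3, 1.7
for label in (0, 1):
    one_logit = bce_with_logit(b - a, label)
    two_logit = softmax_ce([a, b], label)
    print(label, round(one_logit, 6), round(two_logit, 6))
```

For each label the two printed losses agree, because the softmax over two logits depends only on their difference. The two setups can still train differently in practice, since the two-logit head has extra parameters.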

junxiant commented 3 months ago

Thanks. I have another question about inference on a single scan. The dataset class currently crops nodule and non-nodule patches based on the annotations. Would a separate dataset class be needed to crop an entire scan without ground-truth labels, so that the patches can be fed to the trained model?

rlsn commented 3 months ago

Indeed, that would be a less biased way to evaluate. I plan to add it in a future update, along with the evaluation metric used in the LUNA16 challenge.
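One possible way to crop a whole scan without annotations is a sliding window over the volume. The sketch below only computes patch corner coordinates and is not part of the repository. The helper names, the volume shape, the patch size, and the stride are all hypothetical and would need to match what the trained model expects.

```python
from itertools import product

def window_starts(size: int, patch: int, stride: int) -> list[int]:
    """Start indices along one axis so that patches of length `patch` cover [0, size)."""
    if size <= patch:
        return [0]  # axis is no longer than one patch; the caller would need to pad
    starts = list(range(0, size - patch + 1, stride))
    if starts[-1] != size - patch:
        starts.append(size - patch)  # extra window flush with the far edge
    return starts

def patch_origins(shape, patch_size, stride):
    """(z, y, x) corner of every patch in a grid that covers the whole volume."""
    axes = [window_starts(s, p, t) for s, p, t in zip(shape, patch_size, stride)]
    return list(product(*axes))

# Hypothetical scan of 100 x 256 x 256 voxels, 64^3 patches, 50% overlap.
origins = patch_origins(shape=(100, 256, 256), patch_size=(64, 64, 64), stride=(32, 32, 32))
print(len(origins), origins[0], origins[-1])
```

Each origin can then be used to slice a patch from the volume and pass it to the model. Overlapping windows mean the same nodule may be scored more than once, so some merging of nearby detections would likely be needed afterwards.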

junxiant commented 3 months ago

Sure, something like a demo would work as well. I'm reading through the code, so I'll see how I can contribute.

junxiant commented 2 months ago

Hello,

I am looking at line 223 of dataset.py:

if len(bboxes)>0 and np.random.rand()<0.5:

In this case, if the scan contains a nodule but the random draw is 0.5 or greater, the positive nodule is not cropped. Would it be better to always crop the positive nodule and then additionally crop a negative patch?

rlsn commented 2 months ago

Hello, the goal is essentially to produce a balanced dataset. The random sampling may add some unnecessary overhead, but it does its job well.

Please feel free to make any improvements. Thanks!
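As a rough illustration of the two sampling strategies discussed above, the sketch below mirrors the quoted condition and the alternative proposed in the thread. It uses the standard library `random` module in place of `np.random.rand()`, and the function names are hypothetical and do not come from dataset.py.

```python
import random

def choose_patch_kind(num_bboxes: int, rng: random.Random) -> str:
    """Mirror of the quoted condition: positive only if a nodule exists and the draw is < 0.5."""
    if num_bboxes > 0 and rng.random() < 0.5:
        return "positive"
    return "negative"

def choose_both(num_bboxes: int) -> list[str]:
    """Alternative from the thread: always one positive (if any nodule exists) plus one negative."""
    return (["positive"] if num_bboxes > 0 else []) + ["negative"]

rng = random.Random(0)  # fixed seed so the run is repeatable
draws = [choose_patch_kind(num_bboxes=2, rng=rng) for _ in range(10_000)]
positive_fraction = draws.count("positive") / len(draws)
print(f"positive fraction with coin flip: {positive_fraction:.3f}")
print("alternative for the same scan:", choose_both(num_bboxes=2))
```

With the coin flip, a scan that has nodules yields a positive patch about half the time, while a scan without nodules always yields a negative patch. The overall class balance therefore also depends on how many scans carry annotations.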