reginabarzilaygroup / Sybil

Deep Learning for Lung Cancer Risk Prediction using LDCT
MIT License

Same scores for two different images #36

Closed surajraj99 closed 3 weeks ago

surajraj99 commented 7 months ago

Hello,

We are running into an issue with Sybil. We are using the code given in the README to get scores for some of the test DICOMs provided in your data folder (Sybil demo_data). The exact DICOMs are:

1-2da413541bb2518fb0f8c583900999ef
194-89970e1e7ba1759f86babb310a2c04e9

For some reason, we get the same scores for these two different DICOMs. We tested this with both Sybil_1 and Sybil_2. The two models give different numbers from each other, but each model gives identical scores for DICOM 1 and DICOM 2.

Specifically, the scores for Sybil_1 for both DICOMs are: [0.005670641576314818, 0.016728911619303625, 0.040977454787905605, 0.05335478429725567, 0.06768990118864318, 0.10217879263786658]

Scores for Sybil_2 for both dicoms are: [0.007401745617755098, 0.01943424256123729, 0.0336564680065982, 0.046328010497170294, 0.057618836662999294, 0.08531938437897854]

We looked to see if the issue persisted with an external dcm dataset found here: https://www.kaggle.com/datasets/ymirsky/medical-deepfakes-lung-cancer?resource=download&select=labels_exp1.csv

Same issue. We used CT_Scans/EXP1_blind/1003/0.dcm and CT_Scans/EXP1_blind/1546/159.dcm. The scores we got from Sybil_1 for both of these:

[0.011891088336936041, 0.025743208030524028, 0.05334339990528849, 0.05963512647876064, 0.07540808448822184, 0.10650834286905617]

Here is the code that we are using:

from sybil import Serie, Sybil
model = Sybil("sybil_2")
serie = Serie(['Test.dcm'])
scores = model.predict([serie])
print(scores)

We have also attached the two DICOM images from the Sybil demo data that we used, in a zip file: test_dicoms.zip

Please assist us, thank you!

jsilter commented 6 months ago

Sybil is designed to take the full set of DICOM images produced by a CT scan as input. Results from a single DICOM image won't be meaningful. So you'd want something like this, for example:

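import os
image_dir = "sybil_demo_data"  # or maybe "CT_Scans/EXP1_blind/1546"
# os.listdir returns bare file names, so prepend the directory to each one
input_files = [os.path.join(image_dir, f) for f in os.listdir(image_dir)]
serie = Serie(input_files)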
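
To make the advice above concrete, here is a minimal sketch that gathers every slice of one scan into a single Serie and refuses to score a directory that holds only one file. The helper names `collect_dicom_paths` and `predict_scan`, the `.dcm` suffix filter, and the `min_slices=2` threshold are illustrative assumptions and are not part of Sybil. Only `Sybil(...)`, `Serie(...)`, and `model.predict([serie])` come from the snippet in the original question. If your files have no extension, pass `suffix=""` to accept every regular file in the directory.

```python
import os


def collect_dicom_paths(image_dir, suffix=".dcm"):
    """Return the sorted full paths of regular files in image_dir ending in suffix.

    Hypothetical helper: the suffix filter is an assumption about file naming.
    """
    paths = []
    for name in os.listdir(image_dir):
        full_path = os.path.join(image_dir, name)
        # Skip subdirectories and files that do not match the suffix.
        if os.path.isfile(full_path) and name.lower().endswith(suffix):
            paths.append(full_path)
    return sorted(paths)


def predict_scan(image_dir, model_name="sybil_2", min_slices=2):
    """Score one CT scan from the directory that holds all of its slices.

    Hypothetical wrapper: min_slices is an arbitrary guard against passing
    a single image, not a threshold defined by Sybil.
    """
    # Imported lazily so collect_dicom_paths works without sybil installed.
    from sybil import Serie, Sybil

    paths = collect_dicom_paths(image_dir)
    if len(paths) < min_slices:
        raise ValueError(
            f"{image_dir} holds {len(paths)} DICOM file(s); expected a full series"
        )
    model = Sybil(model_name)
    serie = Serie(paths)  # one Serie per scan, built from all of its slices
    return model.predict([serie])
```

With this in place, something like `print(predict_scan("CT_Scans/EXP1_blind/1546"))` would score the whole scan in that directory instead of a single slice.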