reginabarzilaygroup / Sybil

Deep Learning for Lung Cancer Risk Prediction using LDCT
MIT License

how to train sybil on another small dataset #49

Closed Malaikah-Javed closed 1 month ago

Malaikah-Javed commented 1 month ago

Can we train Sybil on another dataset, i.e. not NLST, MGH, or CGMH? In sybil/utils/helper.py, SUPPORTED DATASETS are mentioned. Could anyone tell me how to train it on another dataset (if we can)? And do we convert the DICOM images to PNG before training? If so, how can we convert all of them in one script?

And what if we do not have metadata JSON/CSV files like the ones for the NLST dataset?

pgmikhael commented 1 month ago

Hi,

Thanks for reaching out.

Yes, you should be able to train on a different dataset. The details for training should be available under the train branch. This includes information regarding conversion of DICOMs.

For a different dataset, you should make a new dataset object by following the NLST and MGH dataset objects (see the corresponding scripts under sybil/datasets/). The metadata you have should at the very least include, for each exam, the paths to the images, the label (cancer or no cancer within X years), and the censoring data (years to last negative follow-up or years to cancer diagnosis, whichever comes first).
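As a rough illustration of the minimum metadata listed above, here is a sketch of how one per-exam record could be assembled and saved as JSON. The function name, the field names (`paths`, `label`, `years_to_event`), and the 6-year horizon are assumptions made for this example. They are not the schema Sybil's dataset objects read, so check the NLST and MGH scripts under sybil/datasets/ for the real keys.

```python
import json
from typing import Dict, List, Optional


def make_exam_record(
    paths: List[str],
    years_to_cancer: Optional[float],
    years_to_last_negative_followup: Optional[float],
    horizon_years: float = 6.0,  # the "X years" in the label; assumed value
) -> Dict:
    """Build one hypothetical metadata record for a single exam."""
    if years_to_cancer is None and years_to_last_negative_followup is None:
        raise ValueError("need a cancer diagnosis time or a follow-up time")

    # Label: 1 if cancer was diagnosed within the horizon, else 0.
    # Treating the boundary as inclusive is an assumption of this sketch.
    has_cancer = years_to_cancer is not None and years_to_cancer <= horizon_years

    # Censoring data: time to diagnosis for diagnosed patients,
    # otherwise time to the last negative follow-up.
    if years_to_cancer is not None:
        years_to_event = years_to_cancer
    else:
        years_to_event = years_to_last_negative_followup

    return {
        "paths": list(paths),
        "label": int(has_cancer),
        "years_to_event": years_to_event,
    }


def dump_records(records: List[Dict]) -> str:
    """Serialize the records to a JSON string that could be written to disk."""
    return json.dumps(records, indent=2)
```

A custom dataset object would then read records like these and return whatever structure the training code expects.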

Malaikah-Javed commented 1 month ago

@pgmikhael Thanks a lot for your help. I will look into that.

I must be overlooking the information about conversion of DICOMs, could you please specify where I might find it?

Also, in the Parsing.py file, it is mentioned that input images can be PNG or DICOM, but the default is PNG. So does that mean training images can be DICOM?

pgmikhael commented 1 month ago

Hi,

Sure, the conversion is detailed here.

We implemented methods to accept both file types, but we only ever trained using the generated PNGs.
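On the earlier question about converting everything in one script, the batch part could be organised along the lines of the sketch below. It only walks a directory tree and pairs each DICOM file with a mirrored PNG output path. The per-file conversion is passed in as a callable (`convert_one`, a hypothetical name) and should follow the procedure in the train docs. The `*.dcm` pattern is also an assumption, since DICOM files often have no extension.

```python
from pathlib import Path
from typing import Callable, List, Tuple


def plan_png_outputs(dicom_root, png_root) -> List[Tuple[Path, Path]]:
    """Pair every *.dcm file under dicom_root with a mirrored .png path."""
    dicom_root = Path(dicom_root)
    png_root = Path(png_root)
    pairs = []
    for src in sorted(dicom_root.rglob("*.dcm")):
        # Keep the same sub-folder layout, swapping only the extension.
        dst = png_root / src.relative_to(dicom_root).with_suffix(".png")
        pairs.append((src, dst))
    return pairs


def convert_all(
    dicom_root,
    png_root,
    convert_one: Callable[[Path, Path], None],  # hypothetical per-file converter
) -> int:
    """Run convert_one on every planned pair and return the file count."""
    count = 0
    for src, dst in plan_png_outputs(dicom_root, png_root):
        dst.parent.mkdir(parents=True, exist_ok=True)
        convert_one(src, dst)
        count += 1
    return count
```

Keeping the folder layout identical means the image paths in the metadata can be derived from the original DICOM paths by swapping the root directory and the extension.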

liujun0621 commented 1 month ago

Excuse me, can you share the NLST dataset JSON file?

Malaikah-Javed commented 1 month ago

Thank you! @pgmikhael

pgmikhael commented 1 month ago

@liujun0621

We can't provide the dataset JSON file since it was created under a data agreement at the time (before NLST was archived). You should be able to create it as described in the train docs.