notmatthancock / pylidc

An object relational mapping for the LIDC dataset using sqlalchemy.
https://pylidc.github.io

Can I obtain annotated .dcm file with pylidc? #41

Closed GMBarra closed 3 years ago

GMBarra commented 3 years ago

Hi, I'd like to know whether it is possible, given the path of a scan such as "./LIDC-IDRI/LIDC-IDRI-0001", to get the .dcm files that have more than 3 annotations and the .dcm files that have 0 annotations. The goal is to separate a patient's DICOM files into "images with nodules" and "images without nodules". I would appreciate any help with this, and thanks in advance.

notmatthancock commented 3 years ago

Hmm. You could readily get a listing of the DICOM files for a scan that have one or more annotation contours like this:

import pylidc as pl

scan = pl.query(pl.Scan).filter(
    pl.Scan.patient_id == 'LIDC-IDRI-0001'
).first()

dicom_files_with_annotation = set(
    c.dicom_file_name
    for a in scan.annotations
    for c in a.contour
)

However, I don't think there is a straightforward way to get the negative set without searching the file system and taking the complement of the set above. Something like:

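import pathlib
# Compare bare file names: glob() yields Path objects, while the set above
# holds strings (assuming dicom_file_name stores names, not full paths).
dicom_files = set(
    p.name for p in pathlib.Path(scan.get_path_to_dicom_files()).glob('*.dcm')
)
dicom_files_without_annotation = dicom_files - dicom_files_with_annotation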
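
If full paths are needed for each group (for example, to copy the files into separate "with nodule" and "without nodule" folders), the complement idea above can be sketched as a small helper. This is only a sketch: `partition_dicom_files` is a hypothetical name, not part of pylidc, and it assumes that the names stored in `dicom_file_name` match the file names actually on disk.

```python
import pathlib


def partition_dicom_files(dicom_dir, annotated_names):
    """Split the .dcm files in dicom_dir into (annotated, unannotated) path lists.

    annotated_names is a set of bare file names, e.g. the set built from
    c.dicom_file_name above (assumed to match the names on disk).
    """
    annotated, unannotated = [], []
    for path in sorted(pathlib.Path(dicom_dir).glob('*.dcm')):
        # Match on the bare file name, since annotated_names holds names, not paths.
        if path.name in annotated_names:
            annotated.append(path)
        else:
            unannotated.append(path)
    return annotated, unannotated
```

With a scan loaded as above, this would presumably be called as partition_dicom_files(scan.get_path_to_dicom_files(), dicom_files_with_annotation).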

GMBarra commented 3 years ago

I also managed to extract the DICOM files with annotation, but this code is much cleaner than mine. Now I get this error when I try your version:

'Annotation' object has no attribute 'contour'

But I'll play a little more with the library to see if I can get what I want.

notmatthancock commented 3 years ago

> Now I get this error when I try your version:

Sorry, it should be plural. A Scan has many Annotations, and an Annotation has many Contours. The line should be:

dicom_files_with_annotation = set(
    c.dicom_file_name
    for a in scan.annotations
    for c in a.contours
)
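
The original question also asked for files with more than a given number of annotations. One way to approach that, as a rough sketch, is to count how many annotations have at least one contour on each DICOM file. The helper names below are hypothetical (not part of pylidc), and the count is over all annotations in the scan, so annotations of different nodules on the same slice are counted together.

```python
from collections import Counter


def count_annotations_per_file(annotations):
    """Map each DICOM file name to the number of annotations with a contour on it."""
    counts = Counter()
    for a in annotations:
        # Use a set so an annotation with several contours on one file counts once.
        counts.update({c.dicom_file_name for c in a.contours})
    return counts


def files_with_at_least(annotations, min_annotations):
    """Return the file names touched by at least min_annotations annotations."""
    counts = count_annotations_per_file(annotations)
    return {name for name, n in counts.items() if n >= min_annotations}
```

For "more than 3 annotations", this would be called as files_with_at_least(scan.annotations, 4).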

GMBarra commented 3 years ago

Works fine now, thanks!!