notmatthancock / pylidc

An object relational mapping for the LIDC dataset using sqlalchemy.
https://pylidc.github.io

Fix hidden files in dicom folders #43

Closed hnguyentt closed 3 years ago

hnguyentt commented 3 years ago

Thanks for this useful package. Hidden files can easily end up in DICOM folders by accident, like this:

            ._1-014.dcm  ._1-030.dcm  ._1-046.dcm  ._1-062.dcm  ._1-078.dcm  ._1-094.dcm  ._1-110.dcm  ._1-126.dcm  1-013.dcm  1-029.dcm  1-045.dcm  1-061.dcm  1-077.dcm  1-093.dcm  1-109.dcm  1-125.dcm
..           ._1-015.dcm  ._1-031.dcm  ._1-047.dcm  ._1-063.dcm  ._1-079.dcm  ._1-095.dcm  ._1-111.dcm  ._1-127.dcm  1-014.dcm  1-030.dcm  1-046.dcm  1-062.dcm  1-078.dcm  1-094.dcm  1-110.dcm  1-126.dcm
._088.xml    ._1-016.dcm  ._1-032.dcm  ._1-048.dcm  ._1-064.dcm  ._1-080.dcm  ._1-096.dcm  ._1-112.dcm  ._1-128.dcm  1-015.dcm  1-031.dcm  1-047.dcm  1-063.dcm  1-079.dcm  1-095.dcm  1-111.dcm  1-127.dcm
._1-001.dcm  ._1-017.dcm  ._1-033.dcm  ._1-049.dcm  ._1-065.dcm  ._1-081.dcm  ._1-097.dcm  ._1-113.dcm  088.xml      1-016.dcm  1-032.dcm  1-048.dcm  1-064.dcm  1-080.dcm  1-096.dcm  1-112.dcm  1-128.dcm
._1-002.dcm  ._1-018.dcm  ._1-034.dcm  ._1-050.dcm  ._1-066.dcm  ._1-082.dcm  ._1-098.dcm  ._1-114.dcm  1-001.dcm    1-017.dcm  1-033.dcm  1-049.dcm  1-065.dcm  1-081.dcm  1-097.dcm  1-113.dcm
._1-003.dcm  ._1-019.dcm  ._1-035.dcm  ._1-051.dcm  ._1-067.dcm  ._1-083.dcm  ._1-099.dcm  ._1-115.dcm  1-002.dcm    1-018.dcm  1-034.dcm  1-050.dcm  1-066.dcm  1-082.dcm  1-098.dcm  1-114.dcm
._1-004.dcm  ._1-020.dcm  ._1-036.dcm  ._1-052.dcm  ._1-068.dcm  ._1-084.dcm  ._1-100.dcm  ._1-116.dcm  1-003.dcm    1-019.dcm  1-035.dcm  1-051.dcm  1-067.dcm  1-083.dcm  1-099.dcm  1-115.dcm
._1-005.dcm  ._1-021.dcm  ._1-037.dcm  ._1-053.dcm  ._1-069.dcm  ._1-085.dcm  ._1-101.dcm  ._1-117.dcm  1-004.dcm    1-020.dcm  1-036.dcm  1-052.dcm  1-068.dcm  1-084.dcm  1-100.dcm  1-116.dcm
._1-006.dcm  ._1-022.dcm  ._1-038.dcm  ._1-054.dcm  ._1-070.dcm  ._1-086.dcm  ._1-102.dcm  ._1-118.dcm  1-005.dcm    1-021.dcm  1-037.dcm  1-053.dcm  1-069.dcm  1-085.dcm  1-101.dcm  1-117.dcm
._1-007.dcm  ._1-023.dcm  ._1-039.dcm  ._1-055.dcm  ._1-071.dcm  ._1-087.dcm  ._1-103.dcm  ._1-119.dcm  1-006.dcm    1-022.dcm  1-038.dcm  1-054.dcm  1-070.dcm  1-086.dcm  1-102.dcm  1-118.dcm
._1-008.dcm  ._1-024.dcm  ._1-040.dcm  ._1-056.dcm  ._1-072.dcm  ._1-088.dcm  ._1-104.dcm  ._1-120.dcm  1-007.dcm    1-023.dcm  1-039.dcm  1-055.dcm  1-071.dcm  1-087.dcm  1-103.dcm  1-119.dcm
._1-009.dcm  ._1-025.dcm  ._1-041.dcm  ._1-057.dcm  ._1-073.dcm  ._1-089.dcm  ._1-105.dcm  ._1-121.dcm  1-008.dcm    1-024.dcm  1-040.dcm  1-056.dcm  1-072.dcm  1-088.dcm  1-104.dcm  1-120.dcm
._1-010.dcm  ._1-026.dcm  ._1-042.dcm  ._1-058.dcm  ._1-074.dcm  ._1-090.dcm  ._1-106.dcm  ._1-122.dcm  1-009.dcm    1-025.dcm  1-041.dcm  1-057.dcm  1-073.dcm  1-089.dcm  1-105.dcm  1-121.dcm
._1-011.dcm  ._1-027.dcm  ._1-043.dcm  ._1-059.dcm  ._1-075.dcm  ._1-091.dcm  ._1-107.dcm  ._1-123.dcm  1-010.dcm    1-026.dcm  1-042.dcm  1-058.dcm  1-074.dcm  1-090.dcm  1-106.dcm  1-122.dcm
._1-012.dcm  ._1-028.dcm  ._1-044.dcm  ._1-060.dcm  ._1-076.dcm  ._1-092.dcm  ._1-108.dcm  ._1-124.dcm  1-011.dcm    1-027.dcm  1-043.dcm  1-059.dcm  1-075.dcm  1-091.dcm  1-107.dcm  1-123.dcm
._1-013.dcm  ._1-029.dcm  ._1-045.dcm  ._1-061.dcm  ._1-077.dcm  ._1-093.dcm  ._1-109.dcm  ._1-125.dcm  1-012.dcm    1-028.dcm  1-044.dcm  1-060.dcm  1-076.dcm  1-092.dcm  1-108.dcm  1-124.dcm

These hidden files cause an error when loading the DICOM files into a numpy array (scan.to_volume()):

---------------------------------------------------------------------------
InvalidDicomError                         Traceback (most recent call last)
<ipython-input-54-51f193cd4157> in <module>
      1 scan = pl.query(pl.Scan).filter(pl.Scan.patient_id == 'LIDC-IDRI-0118').first()
----> 2 vol = scan.to_volume()
      3 nods = scan.cluster_annotations()
      4 len(nods)

~/miniconda3/envs/deepcoronascan/lib/python3.6/site-packages/pylidc/Scan.py in to_volume(self, verbose)
    637         Return the scan as a 3D numpy array volume.
    638         """
--> 639         images = self.load_all_dicom_images(verbose=verbose)
    640 
    641         volume = np.stack(

~/miniconda3/envs/deepcoronascan/lib/python3.6/site-packages/pylidc/Scan.py in load_all_dicom_images(self, verbose)
    286         if verbose: print("Loading dicom files ... This may take a moment.")
    287 
--> 288         path = self.get_path_to_dicom_files()
    289         fnames = [fname for fname in os.listdir(path)
    290                             if fname.endswith('.dcm')]

~/miniconda3/envs/deepcoronascan/lib/python3.6/site-packages/pylidc/Scan.py in get_path_to_dicom_files(self)
    242                 dicom_file = dicom_file[0]
    243 
--> 244                 dimage = dicom.dcmread(os.path.join(dpath, dicom_file))
    245 
    246                 seid = str(dimage.SeriesInstanceUID).strip()

~/miniconda3/envs/deepcoronascan/lib/python3.6/site-packages/pydicom/filereader.py in dcmread(fp, defer_size, stop_before_pixels, force, specific_tags)
    868     try:
    869         dataset = read_partial(fp, stop_when, defer_size=defer_size,
--> 870                                force=force, specific_tags=specific_tags)
    871     finally:
    872         if not caller_owns_file:

~/miniconda3/envs/deepcoronascan/lib/python3.6/site-packages/pydicom/filereader.py in read_partial(fileobj, stop_when, defer_size, force, specific_tags)
    665 
    666     # Read preamble (if present)
--> 667     preamble = read_preamble(fileobj, force)
    668     # Read any File Meta Information group (0002,eeee) elements (if present)
    669     file_meta_dataset = _read_file_meta_info(fileobj)

~/miniconda3/envs/deepcoronascan/lib/python3.6/site-packages/pydicom/filereader.py in read_preamble(fp, force)
    618         fp.seek(0)
    619     elif magic != b"DICM" and not force:
--> 620         raise InvalidDicomError("File is missing DICOM File Meta Information "
    621                                 "header or the 'DICM' prefix is missing from "
    622                                 "the header. Use force=True to force reading.")

InvalidDicomError: File is missing DICOM File Meta Information header or the 'DICM' prefix is missing from the header. Use force=True to force reading.

This pull request fixes the issue described above.
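For illustration, here is a minimal sketch of the kind of filter that avoids the error. It is not necessarily the exact change in this pull request. The listing above shows that every `._*.dcm` entry still ends in `.dcm`, so a check on the extension alone lets it through to `dcmread`. The helper name `list_dicom_files` is hypothetical and not part of pylidc.

```python
import os


def list_dicom_files(path):
    """Return the sorted .dcm file names in `path`, skipping hidden files.

    Hypothetical helper for illustration only; not part of pylidc's API.
    """
    return sorted(
        fname
        for fname in os.listdir(path)
        # Names such as "._1-001.dcm" end in ".dcm" but are not DICOM
        # files, so any dot-prefixed name is dropped as well.
        if fname.endswith(".dcm") and not fname.startswith(".")
    )
```

Filtering on the leading dot keeps the check cheap (no file is opened) and also skips any other hidden entries that happen to carry a `.dcm` extension.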

hnguyentt commented 3 years ago

Thanks for the fix! Out of curiosity, are you using macOS? I've never seen these extra dicom files appear, but I've run into this kind of extra blank file in a different context, and it turned out that the OS was macOS.

Yes, I am using macOS. I don't know what these hidden files are for or why macOS creates them.

Happy new year! :")