Closed markloyman closed 7 years ago
Thanks for the bug report.
When I wrote the code to populate the sqlite database for this library, I assumed that the file names would always be the same. Under this assumption, I hard-coded an attribute on the Scan object, sorted_dicom_file_names, in order to eliminate the sort step from the DICOM loading function. Also, some scans are weird in that they have what appear to be duplicate slices with the same z-index. So the hard-coded attribute eliminated the need to sort the data every time, as well as the need to "prune" the duplicate slices if they exist.
In retrospect, hard-coding this looks like a bad idea, but I think we can fix it by making the load_all_dicom_images function more general, loading and sorting on the fly.
Will you replace the load_all_dicom_images function (in Scan.py) with the following and let me know how it affects your issue?
def load_all_dicom_images(self, verbose=True):
    """
    ....
    """
    if verbose: print("Loading dicom files ... This may take a moment.")

    path = self.get_path_to_dicom_files()
    fnames = [fname for fname in os.listdir(path)
              if fname.endswith('.dcm')]
    images = []
    for fname in fnames:
        with open(os.path.join(path, fname), 'rb') as f:
            image = dicom.read_file(f)
            images.append(image)

    # ##############################################
    # Clean multiple z scans.
    #
    # Some scans contain multiple slices with the same `z` coordinate
    # from the `ImagePositionPatient` tag.
    # The arbitrary choice to take the slice with lesser
    # `InstanceNumber` tag is made.
    # This takes some work to accomplish...
    zs = [float(img.ImagePositionPatient[-1]) for img in images]
    inums = [float(img.InstanceNumber) for img in images]
    inds = range(len(zs))
    while np.unique(zs).shape[0] != len(inds):
        for i in inds:
            for j in inds:
                if i!=j and zs[i] == zs[j]:
                    k = i if inums[i] > inums[j] else j
                    inds.pop(inds.index(k))

    # Prune the duplicates found in the loops above.
    zs = [zs[i] for i in range(len(zs)) if i in inds]
    dcm_file_paths = [fnames[i] for i in range(len(fnames)) if i in inds]
    dcm_imgs = [images[i] for i in range(len(images)) if i in inds]

    # Sort everything by (now unique) ImagePositionPatient z coordinate.
    sort_inds = np.argsort(zs)
    images = [images[s] for s in sort_inds]
    # End multiple z clean.
    # ##############################################

    return images
Hi, thanks for the quick solution. :)
I've tested it on a couple of instances, and it seems to work great. Now I'm launching my original code, which cycles through all annotations.
I will update later on whether there were any unexpected complications.
Successfully read all nodule data.
Thank you. pylidc has been a tremendous help for me.
Ok, glad to hear the fix appears to be working and that the library has been useful to you.
I think there's still a bug in the code above, in the part that deals with possible duplicate z-index slices. Specifically, the line
dcm_imgs = [images[i] for i in range(len(images)) if i in inds]
should be changed to,
images = [images[i] for i in range(len(images)) if i in inds]
and the line preceding it can be removed.
These lines deal with the scans that contain duplicate z-slices. The code won't raise an error, as you found, but without this change you might (or might not?) get weird results.
I'll have to double check by visual inspection that the code is working correctly for the "duplicate z" cases. If this code handles those cases correctly, I will add this fix to the next version to be released on pip.
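For anyone checking the "duplicate z" behaviour by hand, here is a minimal sketch of the intended result: for each z, keep the slice with the lesser InstanceNumber, then order the kept slices by z. It uses plain lists in place of DICOM objects. The name prune_and_sort is hypothetical and not part of pylidc.

```python
def prune_and_sort(zs, inums):
    """Return the indices of the slices to keep, ordered by ascending z.

    zs    : z coordinate of each slice (e.g. from ImagePositionPatient[-1])
    inums : instance number of each slice (e.g. from InstanceNumber)

    When several slices share the same z, only the one with the smallest
    instance number is kept.
    """
    best = {}  # maps z -> index of the slice currently kept for that z
    for i, (z, n) in enumerate(zip(zs, inums)):
        if z not in best or n < inums[best[z]]:
            best[z] = i
    return [best[z] for z in sorted(best)]


# Slices 0 and 2 share z = 2.5; slice 0 has the lesser instance number.
zs = [2.5, 0.0, 2.5, 1.25]
inums = [3, 1, 4, 2]
keep = prune_and_sort(zs, inums)
print(keep)  # [1, 3, 0]
```

The same index list could then be applied to the loaded images, e.g. images = [images[i] for i in keep], so pruning and sorting happen in one step.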
Ok Mark, the fix is on the latest pip version, so you can grab it with pip install --upgrade pylidc.
Well, apparently I didn't read all nodule data. Just tried to re-run my code and I encountered a problem with duplicates pruning:
in load_all_dicom_images
inds.pop(inds.index(k))
AttributeError: 'range' object has no attribute 'pop'
inds is a range, which in Python 3 is an immutable sequence rather than a list, so you can't modify it. A simple fix is to change the initialization to inds = list(range(len(zs))).
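A short, self-contained illustration of the difference under Python 3 (the variable names here are only for the demo):

```python
# A range object is immutable in Python 3, so it has no pop() method.
inds = range(3)
try:
    inds.pop(0)
except AttributeError as err:
    message = str(err)
else:
    message = "no error"
print(message)

# Wrapping the range in list() gives a mutable copy that supports pop().
inds_list = list(range(3))
removed = inds_list.pop(inds_list.index(1))
print(removed, inds_list)
```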
Thanks. I've added the fix to latest pip version.
This issue is sort of a follow-up to the problem that was mentioned in pull request https://github.com/pylidc/pylidc/pull/5 (similar to issue https://github.com/pylidc/pylidc/issues/2).
In short: Some of the .dcm files are missing (not only 0000.dcm), which results in an error.
I checked with the LIDC online access to make sure that the problem isn't the result of failed downloads:
Patient 1009, where all files are present, but the first index is 3.
Or patient 0777, where all 187 files are present but start at number 5.
Patient 0048, where the missing file (298) is the last one, but the total number of files is again consistent with the online dataset (297).
At the time, it seemed to me that some of the missing files were in the middle. However, I don't see any such cases in my log.
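A rough sketch of how such a check could be scripted, assuming the .dcm files in a scan directory have purely numeric names such as 000003.dcm (that naming is an assumption here, and summarize_numbering is a hypothetical helper, not part of pylidc):

```python
import os


def summarize_numbering(fnames, ext=".dcm"):
    """Summarize the numeric file names found in a list of file names.

    Reports how many numbered files there are, the first and last number,
    and any numbers missing between the first and the last.
    """
    stems = [os.path.splitext(f)[0] for f in fnames if f.lower().endswith(ext)]
    numbers = sorted(int(s) for s in stems if s.isdigit())
    if not numbers:
        return {"count": 0, "first": None, "last": None, "gaps": []}
    present = set(numbers)
    gaps = [n for n in range(numbers[0], numbers[-1] + 1) if n not in present]
    return {"count": len(numbers), "first": numbers[0],
            "last": numbers[-1], "gaps": gaps}


# Made-up file names: numbering starts at 3 and number 5 is absent.
example = ["000003.dcm", "000004.dcm", "000006.dcm", "notes.xml"]
summary = summarize_numbering(example)
print(summary)
```

For a real scan, the list of names could come from os.listdir on the directory that holds the DICOM files.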
Code:
scan = pl.query(pl.Scan).filter(pl.Scan.series_instance_uid == '1.3.6.1.4.1.14519.5.2.1.6279.6001.250863365157630276148828903732').first()
scan.annotations[0].uniform_cubic_resample(side_length = 100)
Error:
Full list of missing files that I encountered: