notmatthancock / pylidc

An object relational mapping for the LIDC dataset using sqlalchemy.
https://pylidc.github.io
Other
105 stars 41 forks source link

parameterize populate.py script and fix pydicom import #12

Closed fedorov closed 11 months ago

fedorov commented 6 years ago

Instead of using hard-coded paths locations, reuse those from settings. While doing that, added XML annotations location to the configuration. It might make sense to move the getSettings() function to the util class. It might also be good to add the path to the sqlite database to the configuration, using the installed one by default.

Fixed import dicom, which was removed in pydicom 1.0 - now this is handeled more gracefully by catching ImportError.

fedorov commented 6 years ago

As another motivation for revisiting the process of generating the database and providing documentation, note that the TCIA content is not frozen and can be updated. If the content on TCIA has been updated, but the database in pypi or in the repo has not, for one reason or another, it makes sense to me to help the user of the code to re-generate the database.

Note XML content of LIDC-IDRI has been updated in May and June 2018.

image

notmatthancock commented 6 years ago

Excellent point @fedorov possibly ever-changing XML. I will use this branch for implementing the capability to reliably regenerate the sqlite database (i.e., I will add it to my long todo list and eventually get it hopefully :smile: ).

fedorov commented 6 years ago

Thanks - that helped a lot

fedorov commented 6 years ago

@notmatthancock lacking the capability to re-generate the database - any chance you could re-generate it yourself to account for the most recent changes in the XML content?

notmatthancock commented 6 years ago

@fedorov issues 1-5 were known and accounted for at the time of populating the database. Issue 6 was also already taken into account in populate.py as you can see in these lines of code despite the fact that I wrote that code before the issue was reported. Retrospectively, I should have reported my observation that lead to my writing that piece of workaround-code to the LIDC maintainers at the time. This leaves issue 7, the incorrect SOP id, as the only inconsistency in the pylidc database (at least with respect to those enumerated on the TCIA page).

The populate.py needs a big refactor before I believe it can be trusted to regenerate the database from scratch while still playing nice with the corresponding sqlalchemy classes. This isn't too difficult, but I'm lacking the time to invest in this currently. In the mean-time, we can add a "band-aid" fix by updating the sqlite database directly to account for the SOP id change, rather than regenerating the entire sqlite database. Unfortunately, these "hot-fix" updates are precisely the reason why a refactor is necessary :weary: but at least it will allow the database to be consistent until such an effort can be made. I'll look into it.

notmatthancock commented 6 years ago

I've looked again at issue 7. This issue doesn't affect the pylidc database since the image sop instance uid is not currently recorded in it.

As far as I can tell, the current pylidc database is up to date with the known issues. Of course, the ability to regenerate the database from scratch is still desired.

fedorov commented 6 years ago

@notmatthancock thank you for those details! Since the database is up to date, I agree this definitely makes the issue non-urgent.

notmatthancock commented 11 months ago

Closing as this is quite old now