suyashkumar / dicom

⚡High Performance DICOM Medical Image Parser in Go.
MIT License
943 stars 137 forks source link

Improve support for non-compliant DICOM and edge cases #327

Open lnogueir opened 4 months ago

lnogueir commented 4 months ago

I was playing around with the pydicom test files and noticed that parsing (with SkipProcessingPixelDataValue() option enabled) the following files result in errors:

These files were parseable by dcm4che.

For most of them, the issue is related to missing some or all meta tags. Ideally, if the meta tags aren't included, we should ignore and try parsing the first data element. And we can guess the transfer syntax by peeking the first tag and VR of the first element. We try [implicit LE, explicit BE, explicit LE], whichever one decodes a known tag and VR is likely the correct transfer syntax and we should use that for the rest of the dataset.

suyashkumar commented 4 months ago

Thanks @lnogueir! I agree that we can attempt some enhanced fallback behavior, particularly if that's standard practice in other parsers. Our current fallback behavior for no transfer syntax found in the Metadata is to proceed with little endian implicit but it shouldn't be too difficult to try a couple different transfer syntaxes speculatively.

suyashkumar commented 4 months ago

For missing metadata group length we do hav a flag for that, but can consider automatic fallback behavior: https://github.com/suyashkumar/dicom/blob/65259e55c2dd96a811e752aff2864ef16b903351/read.go#L184

I think originally the idea was the safest option is to force callers to be as explicit as possible about what loosened restrictions they may want in case there are any safety issues with fallback behavior or in case the fact the dicom is not compliant is something the user would want raised to them (in case that led to other concerns). That being said, recently I've been thinking it's more reasonable to just have more automatic fallbacks, at least ones that are "standard" in the industry for dicom non-compliance.

suyashkumar commented 4 months ago

Going through these in a little more detail, and adding some more notes:

  1. Easy fix (have working locally)
  2. Easy fix (have working locally)
  3. This appears to parse fine, but iiuc the problem is the PixelData is intentionally truncated? So the vl says 8192 but there aren't that many bytes left in the dicom. We treat this as an error, which seems reasonable imo but we can discuss more.
  4. Bug/implementation needed with UN unknown sequences
  5. Needs more investigation, seems related to the UN sequences / tags possibly similar to 4.
  6. Also seems possible related to UN sequences / tag handling
  7. Same as 4/5/6 at first glance?
  8. Should parse fine with allowMissingMetaElementGroupLength but need to check
suyashkumar commented 4 months ago

The draft changes in #330 address 1 and 2.

lnogueir commented 3 months ago

For missing metadata group length we do hav a flag for that, but can consider automatic fallback behavior:

https://github.com/suyashkumar/dicom/blob/65259e55c2dd96a811e752aff2864ef16b903351/read.go#L184

I think originally the idea was the safest option is to force callers to be as explicit as possible about what loosened restrictions they may want in case there are any safety issues with fallback behavior or in case the fact the dicom is not compliant is something the user would want raised to them (in case that led to other concerns). That being said, recently I've been thinking it's more reasonable to just have more automatic fallbacks, at least ones that are "standard" in the industry for dicom non-compliance.

I agree. I think it's better to have these automatic fallbacks and just log warnings if non-compliant things are found. That's the behavior most libraries choose I believe.

suyashkumar commented 3 months ago

Also, the changes in #331 address 6 (with SkipPixelData). They almost address 4 but some other changes are still needed for that.

suyashkumar commented 3 months ago

Edited the original comment to update current status, link to PRs, etc.

lnogueir commented 2 months ago

I noticed that the following files (also from pydicom) do not error, but drop values:

  1. rtdose_rle.dcm
  2. rtdose_rle_1frame.dcm (seems to be the same issue as the file above)

The issue seems to be that they change the transfer syntax when writing the SQ dataset. The top level dataset is encoded with explicit little endian, but the dataset of the Referenced RT Plan Sequence is encoded with implicit little endian.

These were the pydicom issues that handled that: https://github.com/pydicom/pydicom/issues/1035, https://github.com/pydicom/pydicom/issues/1067