DICOM compressed image (JPEG Extended (Process 2 and 4)) problems

pydicom / pylibjpeg

A Python framework for decoding JPEG images, with a focus on supporting pydicom

MIT License


DICOM compressed image (JPEG Extended (Process 2 and 4)) problems #73

Closed jedimasterkim closed 1 year ago

jedimasterkim commented 1 year ago

import matplotlib.pyplot as plt
import pydicom

dicom_file_path = "1.2.840.10008.1.2.4.51.dcm"  # JPEG Extended (Process 2 and 4)
ds = pydicom.dcmread(dicom_file_path)
arr = ds.pixel_array  # decodes the compressed pixel data
plt.imshow(arr, cmap=plt.cm.bone)
plt.title("Decompressed DICOM Image")
plt.show()

I would like to display a DICOM image as shown above. Files in many other compressed transfer syntaxes work fine, and I have installed most of the required libraries. The error message is below:

File "C:\Users\jedim\AppData\Local\Programs\Python\Python311\Lib\site-packages\libjpeg\utils.py", line 116, in decode
    raise RuntimeError(
RuntimeError: libjpeg error code '-1038' returned from Decode(): A misplaced marker segment was found - scan start must be zero and scan stop must be 63 for the sequential operating modes

help me ......

scaramallion commented 1 year ago

This seems like the jpeg image is faulty. Are you able to share an anonymised version of the dataset?

jedimasterkim commented 1 year ago

Here is the file: 1.2.840.10008.1.2.4.90_anon.zip

I didn't install pylibjpeg-rle.

scaramallion commented 1 year ago

Transfer Syntax UID is 1.2.840.10008.1.2.4.51 - JPEG Extended (Process 2 and 4)

(0028,0002) US 1                                        #   2, 1 SamplesPerPixel
(0028,0004) CS [MONOCHROME2]                            #  12, 1 PhotometricInterpretation
(0028,0008) IS [1]                                      #   2, 1 NumberOfFrames
(0028,0010) US 1024                                     #   2, 1 Rows
(0028,0011) US 1024                                     #   2, 1 Columns
(0028,0100) US 16                                       #   2, 1 BitsAllocated
(0028,0101) US 10                                       #   2, 1 BitsStored
(0028,0102) US 9                                        #   2, 1 HighBit
(0028,0103) US 0                                        #   2, 1 PixelRepresentation

The JPEG meta info is:

=================== SOI marker at offset 0 ====================

------------- SOF1 marker at offset 2, length 13 --------------
Extended sequential DCT, Huffman coding
Sample size (px): 1024 x 1024
Sample precision (bits): 12
Number of component images: 1
  Component ID: 1
    Horizontal sampling factor: 1
    Vertical sampling factor: 1
    Quantization table destination: 0

------------- DQT marker at offset 15, length 69 --------------
Table destination ID: 0
Table precision: 0 (8-bit)
Quantization table:
  19  13  12  19  28  48  61  73
  14  14  16  22  31  69  72  66
  16  15  19  28  48  68  82  67
  16  20  26  34  61  104  96  74
  21  26  44  67  81  130  123  92
  28  42  66  76  97  124  135  110
  58  76  93  104  123  145  144  121
  86  110  114  117  134  120  123  118

------------- DHT marker at offset 84, length 105 -------------
Lossless/DC Huffman, table ID: 0
   1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
  00 03 01 01 01 01 01 01 00 00 00 00 00 00 00 00 : # codes
  00 01 02                                        : L = 2
  03                                              : L = 3
  04                                              : L = 4
  06                                              : L = 5
  05                                              : L = 6
  07                                              : L = 7
  0a                                              : L = 8
AC Huffman, table ID: 0
   1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
  00 02 02 02 01 03 02 04 03 05 05 04 06 08 04 07 : # codes
  00 01                                           : L = 2
  02 11                                           : L = 3
  03 21                                           : L = 4
  31                                              : L = 5
  04 12 41                                        : L = 6
  05 51                                           : L = 7
  06 13 22 61                                     : L = 8
  32 71 81                                        : L = 9
  07 14 42 91 a1                                  : L = 10
  23 b1 b2 c1 d1                                  : L = 11
  15 52 92 f0                                     : L = 12
  25 33 43 62 72 73                               : L = 13
  26 53 63 82 a2 c2 e1 f1                         : L = 14
  24 44 54 d2                                     : L = 15
  35 45 64 74 94 a3 d3                            : L = 16

------------- SOS marker at offset 189, length 10 -------------
Number of image components: 1
  Component: 1, DC table: 0, AC table: 0
Spectral selectors start-end: 0-0
Successive approximation bit high-low: 0-0

................... ENC marker at offset 199 ...................

42512 bytes of entropy-coded data

Yeah, for sequential DCT the spectral selectors start-end must be 0-63 (they're 0-0 here). The JPEG is invalid.
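The same check can be done without any JPEG library. This is a minimal sketch, not pylibjpeg code: the function name `find_sos` and the synthetic byte stream are made up for illustration. It assumes every segment between SOI and SOS carries a two-byte length field, which holds for the header dumped above.

```python
import struct


def find_sos(jpeg: bytes) -> tuple[int, int, int, int]:
    """Return (marker_offset, Ns, Ss, Se) for the first SOS segment.

    Hypothetical helper: walks the marker segments after SOI and assumes
    each one has a 2-byte big-endian length field.
    """
    if jpeg[:2] != b"\xff\xd8":
        raise ValueError("missing SOI marker")

    pos = 2
    while pos + 4 <= len(jpeg):
        if jpeg[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        code = jpeg[pos + 1]
        if code == 0xFF:  # fill byte before the marker code
            pos += 1
            continue
        (length,) = struct.unpack(">H", jpeg[pos + 2 : pos + 4])
        if code == 0xDA:  # SOS
            ns = jpeg[pos + 4]
            # marker (2) + Ls (2) + Ns (1) + 2 bytes per component
            ss_at = pos + 5 + 2 * ns
            return pos, ns, jpeg[ss_at], jpeg[ss_at + 1]
        pos += 2 + length  # skip the marker and its segment
    raise ValueError("no SOS marker found")


# Synthetic stream: SOI, a 2-byte comment segment, then an SOS with Ss=0, Se=0
stream = (
    b"\xff\xd8"
    + b"\xff\xfe\x00\x04hi"
    + b"\xff\xda\x00\x08\x01\x01\x00\x00\x00\x00"
)
offset, ns, ss, se = find_sos(stream)
print(offset, ns, ss, se)  # 8 1 0 0
print("valid for sequential DCT:", (ss, se) == (0, 63))  # False
```

For the file in this issue, the equivalent result would be Ss = 0 and Se = 0 at the SOS marker found at offset 189.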

scaramallion commented 1 year ago

It looks like you can successfully force the decode by editing the JPEG stream:

from pydicom import dcmread
from pydicom.encaps import generate_pixel_data, encapsulate

import matplotlib.pyplot as plt

ds = dcmread("pylj_73.dcm")
data = next(generate_pixel_data(ds.PixelData, ds.NumberOfFrames))[0]
# The invalid value is at offset 197
data = bytearray(data)
data[197] = 63
ds.PixelData = encapsulate([data])
arr = ds.pixel_array
plt.imshow(arr)
plt.show()
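For reference, offset 197 follows from the SOS segment layout: the marker was reported at offset 189, and the Se byte sits after the 2-byte marker, the 2-byte length, the 1-byte component count, 2 bytes per component, and the 1-byte Ss value. The value 63 is the index of the last of the 64 coefficients in an 8x8 DCT block, so Ss = 0 and Se = 63 means the scan covers the whole block. A small sketch of that arithmetic (`se_offset` is an illustrative name, not a pydicom or pylibjpeg function):

```python
def se_offset(sos_marker_offset: int, ns: int) -> int:
    """Offset of the Se byte, given the offset of the first SOS marker byte.

    Illustrative helper; `ns` is the number of components in the scan.
    """
    # marker (2) + Ls (2) + Ns (1) + Ns * (Cs + Td/Ta) (2 each) + Ss (1)
    return sos_marker_offset + 2 + 2 + 1 + 2 * ns + 1


print(se_offset(189, 1))  # 197
```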

scaramallion commented 1 year ago

Note to self: it would be nice to add the ability to override parameter values for decode.

jedimasterkim commented 1 year ago

Thanks for your help. Can I generalize the code like this for most JPEG Extended (Process 2 and 4) files?

point = ds.file_meta.FileMetaInformationGroupLength
data[point + 1] = 63

Actually, I don't know what the 63 means... Ha ha.

scaramallion commented 1 year ago

from io import BytesIO
from typing import Any

from pydicom import dcmread, FileDataset
from pydicom.encaps import encapsulate, generate_pixel_data_frame
from pydicom.uid import JPEGExtended12Bit

import matplotlib.pyplot as plt

from pylibjpeg.tools.s10918.io import parse

def repair_jpeg(ds: FileDataset) -> FileDataset:
    """Check the dataset `ds` for an invalid JPEG Spectral Selection End
    value and repair it.
    """
    if ds.file_meta.TransferSyntaxUID != JPEGExtended12Bit:
        return ds

    frames = []
    for frame in generate_pixel_data_frame(ds.PixelData, ds.NumberOfFrames):
        data = BytesIO(frame)

        # Parse the JPEG stream to get the marker segments and parameters
        info = parse(data)

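        # Find the SOS marker; reset the flag for each frame so that a
        # frame without an SOS segment is passed through unchanged
        needs_fixing = False
        for (marker, offset) in info: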
            if marker != "SOS":
                continue

            # The value of spectral selector end must be 63 for SOF1
            sos = info[(marker, offset)][2]
            needs_fixing = sos["Se"] == 0
            if needs_fixing:
                # `offset` is to first byte of the 2 byte marker
                # `sub_offset` is from first byte of 2 byte marker to Se value
                # SOS: (1) + 1, Ls: 2, Ns: 1, Csj, Tdj, Taj: 2 * Ns, Ss: 1, Se: 1
                se_offset = 1 + 2 + 1 + 2 * sos["Ns"] + 2
                total_offset = offset + se_offset
                print(f"Repairing invalid Se value at offset {total_offset}")

        data = bytearray(data.getvalue())
        # Repair the Se value
        if needs_fixing:
            data[total_offset] = 63
        frames.append(data)

    ds.PixelData = encapsulate(frames)

    return ds

if __name__ == "__main__":
    ds = dcmread("073.dcm")
    ds = repair_jpeg(ds)

    arr = ds.pixel_array
    plt.imshow(arr)
    plt.show()