imi-bigpicture / wsidicom

Python package for reading DICOM WSI file sets.
Apache License 2.0

Add support for deciding DICOMweb transfer syntax #120

Closed psavery closed 10 months ago

psavery commented 11 months ago

For the DICOMweb example that we have been using, each series supports only one transfer syntax: the syntax that the pixel data is encoded in. We can obtain this information by requesting the field AvailableTransferSyntaxUID in a query to the DICOMweb server.

Perhaps in wsidicom.open_web(), we should allow None to be passed as the requested_transfer_syntax. In that case, we perform a query to the DICOMweb server to see which transfer syntaxes that series supports. If multiple transfer syntaxes are supported, we would prefer the syntaxes that we can more easily work with (such as syntaxes that Pillow can read).
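A minimal sketch of the selection step, assuming the server has already reported a list of transfer syntax UIDs. The name `choose_transfer_syntax` and the preference order are hypothetical and not part of wsidicom; the order simply puts formats that Pillow can typically read ahead of uncompressed data.

```python
from typing import Iterable, Optional

# Hypothetical preference order (most preferred first); not taken from wsidicom.
PREFERRED_TRANSFER_SYNTAXES = [
    "1.2.840.10008.1.2.4.50",  # JPEG Baseline (8-bit)
    "1.2.840.10008.1.2.4.90",  # JPEG 2000 Lossless
    "1.2.840.10008.1.2.4.91",  # JPEG 2000
    "1.2.840.10008.1.2.1",     # Explicit VR Little Endian (uncompressed)
]


def choose_transfer_syntax(available: Iterable[str]) -> Optional[str]:
    """Return the most preferred syntax offered by the server, or None."""
    offered = set(available)
    for candidate in PREFERRED_TRANSFER_SYNTAXES:
        if candidate in offered:
            return candidate
    return None
```

Returning None instead of raising leaves it to the caller to decide whether to fall back to another detection method.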

What do you think?

erikogabrielsson commented 11 months ago

Sounds good. If I understand the AvailableTransferSyntaxUID attribute correctly:

As the stored transfer syntax could differ within a series, is this something that needs to be checked per instance?

erikogabrielsson commented 11 months ago

How would this be different from sending multiple acceptable transfer syntaxes (possibly with preference value) in the wado fetch?
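For illustration, sending several acceptable transfer syntaxes with preference values could look roughly like the Accept header built below. This is a sketch only: `build_accept_header` is a hypothetical helper, and the header layout (`multipart/related` with `type`, `transfer-syntax`, and `q` parameters) is how I understand the DICOMweb retrieve conventions, not something taken from wsidicom.

```python
from typing import Sequence, Tuple


def build_accept_header(preferences: Sequence[Tuple[str, str, float]]) -> str:
    """Build an Accept header from (media_type, transfer_syntax_uid, q) tuples."""
    parts = []
    for media_type, transfer_syntax, q in preferences:
        parts.append(
            f'multipart/related; type="{media_type}"; '
            f"transfer-syntax={transfer_syntax}; q={q:.1f}"
        )
    return ", ".join(parts)


# Prefer JPEG baseline, but also accept JPEG 2000 lossless.
header = build_accept_header([
    ("image/jpeg", "1.2.840.10008.1.2.4.50", 1.0),
    ("image/jp2", "1.2.840.10008.1.2.4.90", 0.5),
])
```

With this approach the server chooses among the listed options, so the client still has to work out which transfer syntax it actually received.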

psavery commented 11 months ago

I think we just need to make sure we know what transfer syntax these frames are in so we can decode them properly. I think that the transfer syntax can't always be determined from the frame itself (for example, uncompressed data).

One way to know the transfer syntax would be to just request a specific transfer syntax. Maybe there's another way, I'm not sure.
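To illustrate why the frame alone is not always enough, here is a small sketch that guesses the codec family from the leading marker bytes. `sniff_frame_format` is a hypothetical helper, not a wsidicom function. Compressed streams start with well-known markers, but uncompressed pixel data has no signature at all.

```python
from typing import Optional


def sniff_frame_format(frame: bytes) -> Optional[str]:
    """Guess a codec family from leading marker bytes, or None if unknown."""
    if frame.startswith(b"\xff\x4f\xff\x51"):
        # JPEG 2000 codestream: SOC marker followed by the SIZ marker.
        return "jpeg2000"
    if frame.startswith(b"\xff\xd8"):
        # SOI marker, shared by JPEG and JPEG-LS, so this is still ambiguous.
        return "jpeg-family"
    # Uncompressed pixel data is just sample values with no signature.
    return None
```

Even a match is only a hint: the SOI marker does not distinguish baseline JPEG from JPEG-LS, and uncompressed samples could happen to start with the same bytes.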

psavery commented 11 months ago

By the way, I made a dict of most media types in case you need it:

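from pydicom import uid  # assumption: `uid` refers to the pydicom.uid module

# Media type to request for each transfer syntax.
MEDIA_TYPES = {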
    uid.JPEGBaseline8Bit: "image/jpeg",
    uid.JPEGExtended12Bit: "image/jpeg",
    uid.JPEGLosslessP14: "image/jpeg",
    uid.JPEGLosslessSV1: "image/jpeg",
    uid.JPEGLSLossless: "image/jls",
    uid.JPEGLSNearLossless: "image/jls",
    uid.JPEG2000: "image/jp2",
    uid.JPEG2000Lossless: "image/jp2",
    uid.JPEG2000MC: "image/jpx",
    uid.JPEG2000MCLossless: "image/jpx",
    uid.RLELossless: "image/dicom-rle",
    uid.ExplicitVRLittleEndian: "application/octet-stream",
}

I also have a function that decides the transfer syntax, although you might want to implement it a different way.

erikogabrielsson commented 10 months ago

Hi @psavery. Apparently the Available Transfer Syntax UID attribute is optional in a QIDO instance response:

> The response may optionally include:
>
> the Available Transfer Syntax UID (0008,3002) to describe the Transfer Syntaxes that the origin server can assure will be supported for retrieval of the SOP Instance. See [Section C.6.1.1.5.2 in PS3.4](https://dicom.nema.org/medical/dicom/current/output/html/part04.html#sect_C.6.1.1.5.2).

We can of course use it if the server returns it.
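Because the attribute is optional, any code that reads it has to tolerate its absence. A minimal sketch, assuming the instance result is a DICOM JSON dictionary keyed by tag (the helper name `get_available_transfer_syntaxes` is hypothetical):

```python
from typing import Any, Dict, List

# Tag (0008,3002), Available Transfer Syntax UID, as a DICOM JSON key.
AVAILABLE_TRANSFER_SYNTAX_UID_TAG = "00083002"


def get_available_transfer_syntaxes(instance: Dict[str, Any]) -> List[str]:
    """Return the UIDs listed in a DICOM JSON instance result, or [] if absent."""
    element = instance.get(AVAILABLE_TRANSFER_SYNTAX_UID_TAG)
    if not element:
        return []
    # The element may be present without a "Value" entry when it is empty.
    return list(element.get("Value", []))
```

An empty list then signals that the server gave no information and that another detection method is needed.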

One alternative approach is to supply a list of acceptable transfer syntaxes. This is problematic, however, as there is no easy way to figure out what will be received. Although it is reasonably easy to test different decoders to find one that works, some functionality in wsidicom also requires encoding frames in the same transfer syntax.

A second alternative is to do a frame request for each transfer syntax in a list of supported ones and select the best of those that the server responds to.

psavery commented 10 months ago

> One alternative approach is to supply a list of acceptable transfer syntaxes. This is problematic, however, as there is no easy way to figure out what will be received. Although it is reasonably easy to test different decoders to find one that works, some functionality in wsidicom also requires encoding frames in the same transfer syntax.

Yeah, and another problem is that I don't think we have a way to decode certain formats (such as uncompressed data) without knowing its transfer syntax. The use of pydicom's pixel data handlers in #119 would require us to know the data's transfer syntax before we decode it. So in some cases, we must know the transfer syntax to decode the data.

> A second alternative is to do a frame request for each transfer syntax in a list of supported ones and select the best of those that the server responds to.

Yeah, we could do a frame request, one at a time, with different transfer syntaxes, and see which ones the server actually provides.

All of these things would only happen if the user provided None as the requested transfer syntax, which means that the user wants us to figure out the transfer syntax ourselves. Maybe we could follow these steps:

  1. Try to get the AvailableTransferSyntaxUID of the series if the server provides it, and then select one of those to use.
  2. Try requesting one frame at a time of different formats to see which one works, then use that transfer syntax.
  3. If none of those things work, raise an exception saying that the transfer syntax could not be automatically determined.
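The steps above can be sketched as follows. This is only an illustration of the proposed order of fallbacks, not the implementation that ended up in wsidicom; `determine_transfer_syntax` and its callable parameters are hypothetical stand-ins for the actual server query and frame request.

```python
from typing import Callable, Optional, Sequence


def determine_transfer_syntax(
    supported: Sequence[str],
    query_available: Callable[[], Optional[Sequence[str]]],
    try_fetch_frame: Callable[[str], bool],
) -> str:
    """Pick a transfer syntax; `supported` is ordered most preferred first."""
    # Step 1: use AvailableTransferSyntaxUID when the server reports it.
    available = query_available()
    if available:
        for transfer_syntax in supported:
            if transfer_syntax in available:
                return transfer_syntax
    # Step 2: probe the server with one frame request per candidate.
    # This also runs when step 1 reported only unsupported syntaxes.
    for transfer_syntax in supported:
        if try_fetch_frame(transfer_syntax):
            return transfer_syntax
    # Step 3: give up with a clear error.
    raise ValueError("Transfer syntax could not be determined automatically.")
```

Passing the query and the frame probe in as callables keeps the selection logic testable without a live DICOMweb server.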

What do you think?

erikogabrielsson commented 10 months ago

Fixed by #126

psavery commented 10 months ago

So this is working! But it slows down our search a lot (when we look at series on a server, we try to see which ones we can open with wsidicom). For the slim server example here, it takes 3.5 minutes now to try to open every series, whereas it took about 30 seconds before.

Any chance we could speed up the automatic detection? I think trying to see if the server provides AvailableTransferSyntaxUID is a possible option, before trying to retrieve a frame in every format.

psavery commented 10 months ago

Also, there's an internal server error (on the DICOMweb server) for one of the frame checks if you run this example:

from wsidicom import WsiDicom, WsiDicomWebClient

url = 'https://idc-external-006.uc.r.appspot.com/dcm4chee-arc/aets/DCM4CHEE/rs'
study_uid = '2.25.272603497800249433889091116769002955881'
series_uid = '1.3.6.1.4.1.5962.99.1.2279867487.1875562763.1642957407327.2.0'

client = WsiDicomWebClient.create_client(url)

slide = WsiDicom.open_web(client, study_uid, series_uid)

That one should be JPEG-LS.

erikogabrielsson commented 10 months ago

> Also, there's an internal server error (on the DICOMweb server) for one of the frame checks if you run this example:
>
> from wsidicom import WsiDicom, WsiDicomWebClient
>
> url = 'https://idc-external-006.uc.r.appspot.com/dcm4chee-arc/aets/DCM4CHEE/rs'
> study_uid = '2.25.272603497800249433889091116769002955881'
> series_uid = '1.3.6.1.4.1.5962.99.1.2279867487.1875562763.1642957407327.2.0'
>
> client = WsiDicomWebClient.create_client(url)
>
> slide = WsiDicom.open_web(client, study_uid, series_uid)
>
> That one should be JPEG-LS.

This is a problem with the server. For some reason it does not accept image/jls with transfer syntax 1.2.840.10008.1.2.4.80 as the media type for some instances:

from dicomweb_client.api import DICOMwebClient

# Same server as in the example above.
url = 'https://idc-external-006.uc.r.appspot.com/dcm4chee-arc/aets/DCM4CHEE/rs'
dicomweb_client = DICOMwebClient(url)

# Ask the server which transfer syntax it offers for this instance.
metadata = dicomweb_client.search_for_instances(
    "2.25.272603497800249433889091116769002955881",
    "1.3.6.1.4.1.5962.99.1.2279867487.1875562763.1642957407327.2.0",
    fields=["AvailableTransferSyntaxUID"],
    search_filters={'SOPInstanceUID': '1.2.826.0.1.3680043.9.7433.3.15439468476012145490280683714649027'}
)[0]
available_transfer_syntaxes = metadata["00083002"]['Value'][0]
print("AVAILABLE_SOP_TRANSFER_SYNTAX_UID:", available_transfer_syntaxes)

# Request frame 1 as JPEG-LS in the transfer syntax the server itself reported.
dicomweb_client.retrieve_instance_frames(
    "2.25.272603497800249433889091116769002955881",
    "1.3.6.1.4.1.5962.99.1.2279867487.1875562763.1642957407327.2.0",
    "1.2.826.0.1.3680043.9.7433.3.15439468476012145490280683714649027",
    [1],
    (('image/jls', available_transfer_syntaxes),)
)

Throws with:

HTTPError: 406 Client Error: Not Acceptable for url: https://idc-external-006.uc.r.appspot.com/dcm4chee-arc/aets/DCM4CHEE/rs/studies/2.25.272603497800249433889091116769002955881/series/1.3.6.1.4.1.5962.99.1.2279867487.1875562763.1642957407327.2.0/instances/1.2.826.0.1.3680043.9.7433.3.15439468476012145490280683714649027/frames/1

erikogabrielsson commented 10 months ago

> So this is working! But it slows down our search a lot (when we look at series on a server, we try to see which ones we can open with wsidicom). For the slim server example here, it takes 3.5 minutes now to try to open every series, whereas it took about 30 seconds before.
>
> Any chance we could speed up the automatic detection? I think trying to see if the server provides AvailableTransferSyntaxUID is a possible option, before trying to retrieve a frame in every format.

I have added support for using the available transfer syntaxes, when the server returns them, here. Loading the slim WSI with multiple series now takes < 10 s for me.

erikogabrielsson commented 10 months ago

Closed by #133