imi-bigpicture / wsidicomizer

Python library for converting WSI files to DICOM
Apache License 2.0
52 stars 6 forks source link

Converted DICOM slide is cropped #56

Closed Radegos closed 1 year ago

Radegos commented 1 year ago

Hello, first of all I would like to thank you for this tool. I believe I stumbled upon undesirable behaviour of conversion. In a nutshell, the converted DICOM slide is cropped.

I tried converting MIRAX slide and the resulting dimensions of original MIRAX and DICOM slide differ. After conversion I used openslide to read MIRAX slide and the property dimensions returns slide size of (101832, 219976). When I opened the DICOM slide with library wsidicom, the property size returns slide size of (94960, 129790). The sizes of other levels also differ.

I tested it on slide available at openslide, slide Mirax2.2-3.zip, but this behaviour also appears on our fresh, private slides.

We have the following problem with this behaviour: We have xml file with points of interests, where each point is defined by (pixel) coordinates. The coordinates are based on original MIRAX slide. After converting to DICOM, the coordinates become invalid, as the DICOM slide is cropped.

erikogabrielsson commented 1 year ago

Hi Radegos and thanks for creating the issue.

For MIRAX (and all formats read with openslide) we use the bounds-properties to only convert the part of the image that has image data. Therefore it appears that the DICOM image is cropped.

Two solution that are (in my opinion) not so good is to either create blank tiles for the out-of-bounds part of the slide, or use sparse tiling. The first would result longer conversion and larger files, the second in files that are slower to open.

DICOM files can store the offset from the slide corner to the image origin in the Total Pixel Matrix Origin Sequence Attribute. With this (and the image resolution) you should be able to get the annotations to map to the correct position.

There is basic support for inserting this offset in the converted files, but the offset in not yet automatically taken from for example the openslide bounds. I can work on adding support for this.

Does this seem reasonable?

erikogabrielsson commented 1 year ago

Hi again, Would it be possible for you to provide an annotation in your format to some openly available mirax image, for example the openslide test image CMU-1.mrxs ?

erikogabrielsson commented 1 year ago

Ping @Radegos

Radegos commented 1 year ago

Hello,

I see, believe that using Total Pixel Matrix Origin Sequence Attribute for storing the offset could work, thank you. As of the annotations, I annotated the CMU-1.mrxs using ASAP tool,

<?xml version="1.0"?>
<ASAP_Annotations>
    <Annotations>
        <Annotation Name="Annotation 0" Type="Polygon" PartOfGroup="None" Color="#F4FA58">
            <Coordinates>
                <Coordinate Order="0" X="59816" Y="151127" />
                <Coordinate Order="1" X="60163" Y="151549" />
                <Coordinate Order="2" X="60510" Y="151920" />
                <Coordinate Order="3" X="60931" Y="152206" />
                <Coordinate Order="4" X="61402" Y="152379" />
                <Coordinate Order="5" X="61873" Y="152218" />
                <Coordinate Order="6" X="62307" Y="151970" />
                <Coordinate Order="7" X="62568" Y="151536" />
                <Coordinate Order="8" X="62530" Y="151040" />
                <Coordinate Order="9" X="62357" Y="150569" />
                <Coordinate Order="10" X="61948" Y="150247" />
                <Coordinate Order="11" X="61452" Y="150098" />
                <Coordinate Order="12" X="60956" Y="150111" />
                <Coordinate Order="13" X="60473" Y="150260" />
                <Coordinate Order="14" X="60076" Y="150569" />
            </Coordinates>
        </Annotation>
    </Annotations>
    <AnnotationGroups />
</ASAP_Annotations>
erikogabrielsson commented 1 year ago

Thanks. Can you also provide a snapshot of the annotation so that I can confirm that the coordinate translation works?

erikogabrielsson commented 1 year ago

Never mind, I realized I could just load your annotation in ASAP my self :)

Using this code I can get the area of the annotation:

import os
from pathlib import Path
from wsidicomizer import WsiDicomizer
from wsidicom import WsiDicom, Point, Polygon
from wsidicom.geometry import RegionMm, PointMm, SizeMm
from wsidicom.graphical_annotations import Geometry
from typing import Dict, Any, List
import xmltodict

slide_path = r'C:\slides\other_formats\openslide-testdata\mirax\1\CMU-1.mrxs'
converted_path = r'c:\temp\mrxs_annotation'
os.makedirs(converted_path, exist_ok=True)

# Convert to DICOM
with WsiDicomizer.open(slide_path) as wsi:
    wsi.save(converted_path)

# Method from wsidicom test to convert ASAP to Geometry. Note that polygon is in float
def asap_to_geometries(dictionary: Dict[str, Any]) -> List[Geometry]:
    annotation_type: str = dictionary["@Type"]
    coordinate_dict = dictionary['Coordinates']['Coordinate']
    if(annotation_type == 'Dot'):
        return [Point.from_dict(coordinate_dict, "@X", "@Y")]
    elif(annotation_type == 'PointSet'):
        points = Point.multiple_from_dict(coordinate_dict, "@X", "@Y")
        return points  # type: ignore
    elif(annotation_type == 'Polygon'):
        return [Polygon.from_dict(coordinate_dict, "@X", "@Y")]

    raise NotImplementedError(annotation_type)

# Open the annotation file and parse to geometry
annotation_file = Path(converted_path).joinpath('annotation.xml')
with open(annotation_file) as f:
    annotation_xml = xmltodict.parse(f.read())["ASAP_Annotations"]
    geometries = asap_to_geometries(annotation_xml["Annotations"]["Annotation"])
polygon = geometries[0]

# Open converted image and translate the polygon
with WsiDicom.open(converted_path) as wsi:
    # This is a bit messy as the geometry is incorrectly in float
    region_relative_to_slide = RegionMm(
        PointMm(
            polygon.box.start.x * wsi.base_level.pixel_spacing.width,
            polygon.box.start.y * wsi.base_level.pixel_spacing.height
        ),
        SizeMm(
            polygon.box.size.width * wsi.base_level.pixel_spacing.width,
            polygon.box.size.height * wsi.base_level.pixel_spacing.height,
        )
    )
    region_relative_to_image = region_relative_to_slide - wsi.base_level.image_origin.origin 
    image = wsi.read_region_mm(region_relative_to_image.start.to_tuple(), 3, region_relative_to_image.size.to_tuple())
image

Resulting image: image

From ASAP: image

I have pushed the changes to the repos and will merge during the week.

Radegos commented 1 year ago

Hello, this looks good, thank you for implementing it.

erikogabrielsson commented 1 year ago

Great! Thanks again for reporting the error.