imi-bigpicture / wsidicom

Python package for reading DICOM WSI file sets.
Apache License 2.0

wsidicom

wsidicom is a Python package for reading DICOM WSI. The aims of the project are:

Installing wsidicom

wsidicom is available on PyPI:

pip install wsidicom

And through conda:

conda install -c conda-forge wsidicom

Important note

Please note that this is an early release and the API is not frozen yet. Function names and functionality are subject to change.

Requirements

wsidicom uses pydicom, numpy, Pillow, marshmallow, fsspec, universal-pathlib, and dicomweb-client. Imagecodecs, pylibjpeg-rle, pyjpegls, and pylibjpeg-openjpeg can be installed as optional dependencies to support additional transfer syntaxes.

Limitations

Basic usage

Load a WSI dataset from files in a folder.

from wsidicom import WsiDicom
slide = WsiDicom.open("path_to_folder")

The files argument accepts either a path to a folder with DICOM WSI-files or a sequence of paths to DICOM WSI-files.

Load a WSI dataset from a remote URL using fsspec.

from wsidicom import WsiDicom
slide = WsiDicom.open("s3://bucket/key", file_options={"s3": {"anon": True}})

Or load a WSI dataset from opened streams.

from wsidicom import WsiDicom

slide = WsiDicom.open_streams([file_stream_1, file_stream_2, ... ])

Or load a WSI dataset from a DICOMDIR.

from wsidicom import WsiDicom

slide = WsiDicom.open_dicomdir("path_to_dicom_dir")

Or load a WSI dataset from DICOMWeb.

from wsidicom import WsiDicom, WsiDicomWebClient
from requests.auth import HTTPBasicAuth

auth = HTTPBasicAuth('username', 'password')
client = WsiDicomWebClient.create_client(
    'dicom_web_hostname',
    '/qido',
    '/wado',
    auth
)
slide = WsiDicom.open_web(
    client,
    "study uid to open",
    "series uid to open"  # or a list: ["series uid 1 to open", "series uid 2 to open"]
)

Alternatively, if you have already created an instance of dicomweb_client.DICOMwebClient, that may be used to create the WsiDicomWebClient like so:

from dicomweb_client import DICOMwebClient
from wsidicom import WsiDicomWebClient
dicomweb_client = DICOMwebClient("url")
client = WsiDicomWebClient(dicomweb_client)

Then proceed to call WsiDicom.open_web() with this as in the first example.

Use as a context manager.

from wsidicom import WsiDicom
with WsiDicom.open("path_to_folder") as slide:
    ...

Read a 200x200 px region starting from px 1000, 1000 at level 6.

region = slide.read_region((1000, 1000), 6, (200, 200))

Read a 2000x2000 px region starting from px 1000, 1000 at level 4 using 4 threads.

region = slide.read_region((1000, 1000), 4, (2000, 2000), threads=4)

Read 3x3 mm region starting at 0, 0 mm at level 6.

region_mm = slide.read_region_mm((0, 0), 6, (3, 3))

Read 3x3 mm region starting at 0, 0 mm with pixel spacing 0.01 mm/px.

region_mpp = slide.read_region_mpp((0, 0), 0.01, (3, 3))
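
The relation between a size in mm, the pixel spacing, and the resulting size in pixels is plain arithmetic. The following is a minimal sketch of that relation, not wsidicom's own implementation: mm_to_pixels is a hypothetical helper, and rounding to the nearest pixel is an assumption.

```python
def mm_to_pixels(size_mm, pixel_spacing_mm):
    """Convert a (width, height) in mm to pixels for a pixel spacing in mm/px."""
    if pixel_spacing_mm <= 0:
        raise ValueError("pixel spacing must be positive")
    # Rounding to the nearest whole pixel is an assumption made for this sketch.
    return tuple(round(side / pixel_spacing_mm) for side in size_mm)


# A 3x3 mm region at 0.01 mm/px.
size_px = mm_to_pixels((3, 3), 0.01)
```

With these numbers a 3x3 mm region corresponds to 300x300 px.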

Read a thumbnail of the whole slide with maximum dimensions 200x200 px.

thumbnail = slide.read_thumbnail((200, 200))

Read an overview image (if available).

overview = slide.read_overview()

Read a label image (if available).

label = slide.read_label()

Read (decoded) tile from position 1, 1 in level 6.

tile = slide.read_tile(6, (1, 1))

Read (encoded) tile from position 1, 1 in level 6.

tile_bytes = slide.read_encoded_tile(6, (1, 1))
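
Tile positions are indices in the tile grid of a level, not pixel coordinates. The sketch below shows which pixels a tile would cover, assuming the position is given as (column, row) and that all tiles in the level have the same size. tile_bounds is a hypothetical helper and the 512 px tile size is a made-up value; the real tile size depends on the image.

```python
def tile_bounds(tile_position, tile_size):
    """Return the (left, top, right, bottom) pixel bounds of a tile in its level."""
    column, row = tile_position
    width, height = tile_size
    left, top = column * width, row * height
    return (left, top, left + width, top + height)


# Tile (1, 1) in a level with assumed 512x512 px tiles.
bounds = tile_bounds((1, 1), (512, 512))
```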

Close files

slide.close()

API differences between WsiDicom and OpenSlide

The WsiDicom API is similar to OpenSlide, but with some important differences:

OpenSlide location and level parameters can be converted to WsiDicom parameters as follows:

with WsiDicom.open("path_to_folder") as wsi:
    level = wsi.levels[openslide_level_index]
    x = openslide_x // 2**(level.level)
    y = openslide_y // 2**(level.level)
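
The same conversion can be written as a small standalone function, which makes the arithmetic easy to check without opening a slide. OpenSlide takes the location in the level 0 reference frame, so the coordinates are divided by 2 to the power of the level. openslide_to_level_coordinates is a hypothetical helper name and takes the level as a plain integer.

```python
def openslide_to_level_coordinates(openslide_x, openslide_y, level):
    """Scale base-level (OpenSlide-style) coordinates down to a pyramid level."""
    scale = 2 ** level
    # Integer division mirrors the conversion shown above.
    return openslide_x // scale, openslide_y // scale


# Location (4000, 2000) in the base level, expressed in level 2.
x, y = openslide_to_level_coordinates(4000, 2000, 2)
```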

Metadata

WsiDicom parses the DICOM metadata in the opened image into easy-to-use dataclasses, see wsidicom\metadata.

with WsiDicom.open("path_to_folder") as wsi:
    metadata = wsi.metadata

The obtained WsiMetadata has child dataclass properties that resemble the DICOM WSI modules (compare with the VL Whole Slide Microscopy Image CIOD):

Note that not all DICOM attributes are represented in the defined metadata model. Instead, the full pydicom Datasets can be accessed per level, for example:

with WsiDicom.open("path_to_folder") as wsi:
    wsi.levels.base_level.datasets[0]

If you find that an important or useful attribute is missing from the model, please open an issue (see Contributing).

Slide information

The slide information model represents the Specimen module and has the following properties:

Note that while the parsing of slide information is designed to be as flexible and permissive as possible, some datasets contain Specimen modules that do not comply with the standard and that are (at least currently) not possible to parse. In such cases the stainings and samples properties will be set to None. If you have a dataset with a Specimen module that you think should be parsable, please open an issue (see Contributing).

SlideSample

Each sample is modeled with the SlideSample dataclass, which represents an item in the DICOM Specimen Description Sequence.

Samplings

The optional sampled_from property can be either a Sampling or an UnknownSampling. Both of these specify a sampled specimen, with the difference that the UnknownSampling is used when the sampling conditions are not fully known. A Sampling is more detailed, and specifies the sampling method and optional properties such as sampling date_time, description and location.

Specimens

The specimen property of a Sampling or an UnknownSampling links to either a Specimen or a Sample. A Specimen has no known parents (e.g. it could be the specimen extracted from a patient), while a Sample is always produced from one or more samplings of other Specimens or Samples. The samplings used to produce a Sample are given by its sampled_from property. Both Specimen and Sample contain additional properties describing the specimen:
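
The relationship between specimens, samples, and samplings can be pictured with a few simplified stand-in classes. The sketch below does not use wsidicom's own dataclasses: the classes only share names with them, carry far fewer properties, and root_specimens is a hypothetical helper that follows the sampled_from links back to the specimens that have no parents.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Specimen:
    """Stand-in for a specimen with no known parents."""
    identifier: str


@dataclass
class Sampling:
    """Stand-in for a sampling of a parent specimen or sample."""
    specimen: Union["Specimen", "Sample"]
    method: str


@dataclass
class Sample:
    """Stand-in for a sample produced by one or more samplings."""
    identifier: str
    sampled_from: List[Sampling] = field(default_factory=list)


def root_specimens(item):
    """Follow sampled_from links until specimens without parents are reached."""
    if isinstance(item, Specimen):
        return [item]
    roots = []
    for sampling in item.sampled_from:
        roots.extend(root_specimens(sampling.specimen))
    return roots


# A specimen is sampled into a block, which is sampled into a section.
specimen = Specimen("specimen")
block = Sample("block", [Sampling(specimen, "dissection")])
section = Sample("section", [Sampling(block, "sectioning")])
roots = root_specimens(section)
```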

Processing and staining steps

The processing steps that can be performed on a sample are:

The Staining(s) for a Slide contains a list of substances used for staining. The substances used should be defined in CID 8112.

Every processing step (including staining) also has the optional properties date_time for when the processing was done and description for a textual description of the processing.

These steps are parsed from the SpecimenPreparationSequence following TID 8004 for each specimen identifier in the item sequence.

Exporting to json

The metadata can be exported to json:

from wsidicom import WsiDicom
from wsidicom.metadata.schema.json import WsiMetadataJsonSchema

with WsiDicom.open("path_to_folder") as wsi:
    metadata = wsi.metadata

schema = WsiMetadataJsonSchema()
metadata_json = schema.dump(metadata)

Settings

The strictness of parsing of DICOM WSI metadata can be configured using the following settings (see Settings):

Saving files

An opened WsiDicom instance can be saved to a new path using the save()-method. The produced files will be:

By default frames are copied as-is, i.e. without re-compression.

with WsiDicom.open("path_to_folder") as slide:
    slide.save("path_to_output")

The output folder must already exist. Be careful to specify a unique folder to avoid mixing files from different images.
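
Since the output folder has to exist before saving, one way to get a folder that is guaranteed to be new is to create it and fail if it is already there. This is a minimal sketch using only the standard library. make_output_folder is a hypothetical helper, and the temporary parent folder only serves to make the example runnable.

```python
import tempfile
from pathlib import Path


def make_output_folder(parent, name):
    """Create a new, empty folder and fail if it already exists."""
    folder = Path(parent) / name
    # exist_ok=False raises FileExistsError instead of reusing a folder
    # that may already hold files from another image.
    folder.mkdir(parents=True, exist_ok=False)
    return folder


with tempfile.TemporaryDirectory() as parent:
    output = make_output_folder(parent, "slide_1")
    created = output.is_dir()
    try:
        # A second attempt with the same name is rejected.
        make_output_folder(parent, "slide_1")
        reused = True
    except FileExistsError:
        reused = False
```

The created folder can then be given as the output path when saving.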

Optionally, frames can be transcoded, either by passing encoder settings or an encoder:

from wsidicom import WsiDicom
from wsidicom.codec import JpegSettings

with WsiDicom.open("path_to_folder") as slide:
    slide.save("path_to_output", transcoding=JpegSettings())

Settings

wsidicom can be configured with the settings variable. For example, set the parsing of files to strict:

from wsidicom import settings
settings.strict_uid_check = True
settings.strict_attribute_check = True

Annotation usage

Annotations are structured in a hierarchy:

Codes that are defined in the 222-draft can be created using the create(source, type) function of the ConceptCode-class.

Load a WSI dataset from files in a folder.

from wsidicom import WsiDicom
slide = WsiDicom.open("path_to_folder")

Create a point annotation at x=10.0, y=20.0 mm.

from wsidicom import Annotation, Point
point_annotation = Annotation(Point(10.0, 20.0))

Create a point annotation with a measurement.

from wsidicom import ConceptCode, Measurement
# A measurement is defined by a type code ('Area'), a value (25.0) and a unit code ('Pixels').
area = ConceptCode.measurement('Area')
pixels = ConceptCode.unit('Pixels')
measurement = Measurement(area, 25.0, pixels)
point_annotation_with_measurement = Annotation(Point(10.0, 20.0), [measurement])

Create a group of the annotations.

from wsidicom import PointAnnotationGroup
# The 222 supplement requires groups to have a label, a category and a type
group = PointAnnotationGroup(
    annotations=[point_annotation, point_annotation_with_measurement],
    label='group label',
    categorycode=ConceptCode.category('Tissue'),
    typecode=ConceptCode.type('Nucleus'),
    description='description'
)

Create a collection of annotation groups.

from wsidicom import AnnotationInstance
annotations = AnnotationInstance([group], 'volume', slide.uids)

Save the collection to file.

annotations.save('path_to_dicom_dir/annotation.dcm')

Reopen the slide and access the annotation instance.

slide = WsiDicom.open("path_to_folder")
annotations = slide.annotations

Setup environment for development

Requires poetry to be installed in the virtual environment.

git clone https://github.com/imi-bigpicture/wsidicom.git
cd wsidicom
poetry install

To watch unit tests use:

poetry run pytest-watch -- -m unittest

The integration tests use test images from nema.org that need to be downloaded. The location of the test images can be changed from the default tests\testdata\slides using the environment variable WSIDICOM_TESTDIR. Download the images using the supplied script:

python .\tests\download_test_images.py

If the files are already downloaded the script will validate the checksums.
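
Validating a downloaded file against a known checksum generally looks like the sketch below. It is not the code of the download script: file_checksum and is_valid are hypothetical helpers, and SHA-256 is an assumption, since the hash algorithm the script uses is not stated here.

```python
import hashlib
import tempfile
from pathlib import Path


def file_checksum(path, algorithm="sha256", chunk_size=1 << 20):
    """Hash a file in chunks so large images need not fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as stream:
        for chunk in iter(lambda: stream.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_valid(path, expected_checksum, algorithm="sha256"):
    """Return True if the file exists and matches the expected checksum."""
    path = Path(path)
    return path.is_file() and file_checksum(path, algorithm) == expected_checksum


with tempfile.TemporaryDirectory() as folder:
    # Write a small stand-in file instead of a real test image.
    test_file = Path(folder) / "image.dcm"
    test_file.write_bytes(b"example")
    expected = hashlib.sha256(b"example").hexdigest()
    valid = is_valid(test_file, expected)
    corrupted = is_valid(test_file, "0" * 64)
```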

To run integration tests:

poetry run pytest -m integration

Data structure

In wsidicom, a WSI DICOM pyramid is represented by a hierarchy of objects of different classes, starting from the bottom:

Labels and overviews are structured similarly to levels, but with somewhat different properties and restrictions. For DICOMWeb the WsiDicomFile* classes are replaced with WsiDicomWeb* classes.

A Source is used to create WsiInstances, either from files (WsiDicomFileSource) or DICOMWeb (WsiDicomWebSource), and can be used to initiate a WsiDicom object. A source is most easily created with the open() and open_web() helper functions, e.g.:

slide = WsiDicom.open("path_to_folder")

Code structure

Adding support for other file formats

Support for other formats (or methods to access DICOM data) can be implemented by creating a new Source implementation that creates WsiInstances for the implemented formats. A format-specific implementation of ImageData is likely needed to access the WSI image data. Additionally, a WsiDataset needs to be created that returns matching metadata for the WSI.

The implemented Source can then create an instance from the implemented ImageData (and a method returning a WsiDataset):

image_data = MyImageData('path_to_image_file')
dataset = create_dataset_from_image_data(image_data)
instance = WsiInstance(dataset, image_data)

The source should arrange the created instances and return them through the level_instances, label_instances, and overview_instances properties. WsiDicom can then open the source object and arrange the instances into levels etc. as described in 'Data structure'.
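
The shape of such a source can be sketched with stand-in classes. Nothing below comes from wsidicom itself: StandInInstance replaces WsiInstance, MySource does not inherit from the real Source base class (which may require further members), and sorting by the image type values "VOLUME", "LABEL" and "OVERVIEW" is an assumption about how a source might arrange its instances.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StandInInstance:
    """Stand-in for a WsiInstance; only carries an image type here."""
    image_type: str  # e.g. "VOLUME", "LABEL" or "OVERVIEW"


class MySource:
    """Hypothetical source that sorts its instances by image type."""

    def __init__(self, instances: List[StandInInstance]):
        self._instances = instances

    def _of_type(self, image_type: str) -> List[StandInInstance]:
        return [i for i in self._instances if i.image_type == image_type]

    @property
    def level_instances(self) -> List[StandInInstance]:
        return self._of_type("VOLUME")

    @property
    def label_instances(self) -> List[StandInInstance]:
        return self._of_type("LABEL")

    @property
    def overview_instances(self) -> List[StandInInstance]:
        return self._of_type("OVERVIEW")


source = MySource(
    [StandInInstance("VOLUME"), StandInInstance("VOLUME"), StandInInstance("LABEL")]
)
```

Here the source reports two level instances, one label instance and no overview instances.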

Other DICOM python tools

Contributing

We welcome any contributions to help improve this tool for the WSI DICOM community!

We recommend first creating an issue before creating potential contributions to check that the contribution is in line with the goals of the project. To submit your contribution, please issue a pull request on the imi-bigpicture/wsidicom repository with your changes for review.

Our aim is to provide constructive and positive code reviews for all submissions. The project relies on gradual typing and roughly follows PEP8. However, we are not dogmatic. Most important is that the code is easy to read and understand.

Acknowledgement

wsidicom: Copyright 2021 Sectra AB, licensed under Apache 2.0.

This project is part of a project that has received funding from the Innovative Medicines Initiative 2 Joint Undertaking under grant agreement No 945358. This Joint Undertaking receives support from the European Union’s Horizon 2020 research and innovation programme and EFPIA. IMI website: