Slideflow 2.2 further extends multiple-instance learning (MIL) capabilities, with the introduction of multi-magnification MIL, new models, experimental uncertainty quantification, and various other enhancements. This release also includes two new pretrained feature extractors, HistoSSL and PLIP, as well as support for the self-supervised learning framework DINOv2. Slideflow Studio has been updated with several new features and quality-of-life improvements. Finally, the documentation has been enriched with Developer Notes and new tutorials, providing deeper insights into selected topics.
Table of Contents
Multi-Magnification MIL
New Feature Extractors
a. Pretrained
b. DINOv2
New MIL Features
Slideflow Studio Updates
Documentation Expansion
Other New Features
Version Requirements
Multi-Magnification MIL
Slideflow now supports multi-modal MIL, with feature bags generated from multiple feature extractors at different magnifications. Multi-magnification MIL offers potential advantages if there are valuable histologic features at both low and high magnification.
Working with multi-magnification MIL is easy: you can use the same training API as standard MIL models. Simply provide multiple bag paths (one for each magnification) and use the new "mm_attention_mil" model.
# Configure a multimodal MIL model.
from slideflow.mil import mil_config

config = mil_config('mm_attention_mil', lr=1e-4)

# Set the bag paths for each modality.
bags_10x = '/path/to/bags_10x'
bags_40x = '/path/to/bags_40x'

# P is a slideflow.Project; train and val are slideflow Datasets.
P.train_mil(
    config=config,
    outcomes='HPV_status',
    train_dataset=train,
    val_dataset=val,
    bags=[bags_10x, bags_40x]
)
Slideflow Studio also supports multi-magnification MIL models, allowing you to visualize attention and tile-level predictions from each mode separately.
New Feature Extractors
We've introduced support for two new pretrained feature extractors, as well as the self-supervised learning framework DINOv2.
Pretrained
The new pretrained feature extractors include:
HistoSSL: a pan-cancer, pretrained ViT-based iBOT model (iBOT[ViT-B]PanCancer). Paper
PLIP: the feature encoder from a CLIP model fine-tuned on pathology images and text descriptions. Paper
Licenses and citations are available for all feature extractors through the new .license and .citation attributes.
>>> ctranspath = sf.model.build_feature_extractor('ctranspath', tile_px=299)
>>> ctranspath.license
'GNU General Public License v3.0'
>>> print(ctranspath.citation)
@article{wang2022,
title={Transformer-based Unsupervised Contrastive Learning for Histopathological Image Classification},
author={Wang, Xiyue and Yang, Sen and Zhang, Jun and Wang, Minghui and Zhang, Jing and Yang, Wei and Huang, Junzhou and Han, Xiao},
journal={Medical Image Analysis},
year={2022},
publisher={Elsevier}
}
DINOv2
As with SimCLR, Slideflow now supports generating features from a trained DINOv2 model. Use the 'dinov2' feature extractor, passing the *.pth teacher weights to the weights argument and the YAML configuration file to the cfg argument.
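A call might look like the following sketch; both paths are placeholders for your own DINOv2 training outputs:

```python
import slideflow as sf

# Build a DINOv2 feature extractor from trained teacher weights.
# The weights and cfg paths below are placeholders.
dinov2 = sf.model.build_feature_extractor(
    'dinov2',
    weights='/path/to/teacher_checkpoint.pth',
    cfg='/path/to/config.yaml'
)
```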
We've also provided a modified version of DINOv2 that allows you to train the network using Slideflow projects and datasets. See our documentation for instructions on how to train and use DINOv2.
New MIL Features
Slideflow 2.2 includes a number of updates in MIL functionality, including:
A new fully transformer-based model, "bistro.transformer" (from this paper)
New aggregation_level MIL config option. If any patients have multiple slides, use aggregation_level="patient". (#316)
MIL models can be further customized with a new argument mil_config(model_kwargs={...}). Use model_kwargs in the MIL configuration to pass through keyword arguments to the model initializer. This can be used, for example, to specify the softmax temperature and attention gating for the Attention_MIL model through the model keyword arguments temperature and attention_gate.
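For example, the following sketch (with illustrative hyperparameter values) passes both options through to the Attention_MIL initializer:

```python
from slideflow.mil import mil_config

# temperature and attention_gate are forwarded to Attention_MIL
# via model_kwargs; the values here are illustrative only.
config = mil_config(
    'attention_mil',
    lr=1e-4,
    model_kwargs={'temperature': 0.3, 'attention_gate': True}
)
```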
New experimental uncertainty quantification support for MIL models. Enable uncertainty estimation with the argument uq=True to either P.train_mil or P.evaluate_mil. The model must support a uq argument in its forward() function. At present, MIL UQ is only available for the Attention_MIL model.
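A sketch of enabling UQ at evaluation time; the model, dataset, and bag paths are placeholders:

```python
# P is a slideflow.Project; val is a validation Dataset.
# The trained model must support a `uq` argument in its forward() function.
P.evaluate_mil(
    '/path/to/trained_mil_model',
    outcomes='HPV_status',
    dataset=val,
    bags='/path/to/bags',
    uq=True
)
```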
New class initializer DatasetFeatures.from_bags(), for loading a DatasetFeatures object from previously generated feature bags. This makes it easier to perform latent space exploration and visualization using DatasetFeatures.map_activations() (see docs).
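A minimal sketch, assuming feature bags were previously generated to a single directory:

```python
import slideflow as sf

# Load a DatasetFeatures object from existing bags, then project the
# latent space with map_activations() for visualization.
dts_ftrs = sf.DatasetFeatures.from_bags('/path/to/bags')
slide_map = dts_ftrs.map_activations()
```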
New sf.mil.MILFeatures class to assist with calculating and visualizing last-layer activations from MIL models, prior to final logits. This class is analogous to the DatasetFeatures interface, but for MIL model layer activations.
Slideflow Studio Updates
The latest version of Slideflow Studio includes a number of usability improvements and new features.
ROI labels: You can now assign labels to Regions of Interest (ROIs) in Studio. These labels can be used for downstream strongly-supervised training, where labels are determined from ROIs rather than inherited from the slide label.
ALT hover: Press Left ALT while hovering over a heatmap to show the raw prediction/attention values beneath your cursor.
Progress bars: A progress bar is now displayed when generating predictions for a slide.
Tile predictions with MIL: Show tile-level predictions and attention for MIL models by right-clicking anywhere on a slide.
GAN seed tracking: Keep track of GAN seeds with easy saving and loading. Quickly scroll through seeds by pressing left/right on your keyboard.
Scale sidebar icons with font size
Improved low-memory mode for MIL models (supports devices with < 8 GB RAM)
Preserve ROIs and slide settings when changing models
Default to Otsu's thresholding instead of grayspace filtering, for improved efficiency
Documentation Expansion
Documentation at https://slideflow.dev has been further expanded, and a new Developer Notes section has been added. Developer Notes are intended to provide a deeper dive into selected topics of interest for developers or advanced users. Our first developer notes include:
TFRecords: Reading and Writing: a detailed description of our TFRecord data format, with examples of how to create, inspect, and read from these files.
Dataloaders: Sampling and Augmentation: descriptions of how to create PyTorch DataLoaders or Tensorflow tf.data.Datasets and apply custom image transformations, labeling, and sampling. Includes a detailed examination of our oversampling and undersampling methods.
Custom Feature Extractors: A look at how to construct custom feature extractors for MIL models.
Strong Supervision with Tile Labels: An example of how Region of Interest (ROI) labels can be leveraged for training strongly-supervised models.
In addition to these new Dev Notes, we've also added two tutorials (Tutorial 7: Training with Custom Augmentations and Tutorial 8: Multiple-Instance Learning), as well as expanded our Slideflow Studio docs to reflect the latest features.
Other New Features
Align two slides together using sf.WSI.align_to(). This coarse alignment is fast and effective for slides in the proper orientation, without distortion. Use the sf.WSI.align_tiles_to() method for a higher-accuracy alignment, fine-tuned at each tile location.
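For example (slide paths and tile geometry are placeholders):

```python
import slideflow as sf

# Coarse, fast alignment of one slide to another.
wsi = sf.WSI('/path/to/slide_a.svs', tile_px=299, tile_um=302)
target = sf.WSI('/path/to/slide_b.svs', tile_px=299, tile_um=302)
wsi.align_to(target)

# Higher-accuracy alignment, fine-tuned at each tile location.
wsi.align_tiles_to(target)
```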
Rotate a whole-slide image upon initial load using the new transforms argument [Libvips only]. This is particularly useful when attempting to align slides:
wsi = sf.WSI(..., transforms=[sf.slide.ROTATE_90_CLOCKWISE])
Use OpenSlide bounding boxes, if present, with the new WSI argument use_bounds [Libvips only]. If True, will use existing OpenSlide bounding boxes. If a tuple, will crop the whole-slide image to the specified bounds. This is particularly useful when aligning slides.
# Use OpenSlide bounds
wsi = sf.WSI(..., use_bounds=True)
# Manually define the bounding box
wsi = sf.WSI(..., use_bounds=(41445, 112000, 48000, 70000))
New Indexable PyTorch dataset, for easier integration of Slideflow datasets into external projects
Improved progress bars during TFRecord interleaving
Add support for protobuf version 4
Train GANs at odd image sizes with the new resize argument (see docs)
Train a GAN conditioned on tile-level labels (see docs)
Version Requirements
Version requirements are largely unchanged. Notable differences include:
The PLIP feature extractor requires the transformers package.
The TransMIL model requires the nystrom_attention package.