jamesdolezal / slideflow

Deep learning library for digital pathology, with both Tensorflow and PyTorch support.
https://slideflow.dev
GNU General Public License v3.0

[BUG] .mrxs slides missing MPP (openslide.mpp-x) error #323

Closed Meijian closed 9 months ago

Meijian commented 9 months ago

Description

Hi James, I ran into issues when extracting tiles from .mrxs slides using the extract_tiles() function. About half of the slides failed with "missing MPP (openslide.mpp-x)". It seems that some slides don't have mpp-x in their properties. Is it possible to work around this by using something else? I used libvips as the backend.

To Reproduce

Steps to reproduce the behavior:

  1. %sh
     export SF_BACKEND=torch
     export SF_SLIDE_BACKEND=libvips
     export SF_ALLOW_ZIP=0
  2. import slideflow as sf
  3. P = sf.load_project(rootpath)
     dataset = P.dataset(tile_px=299, tile_um='10x')
     dataset.extract_tiles(qc='both', save_tiles=False, save_tfrecords=True, whitespace_fraction=0.6, normalizer='reinhard_fast')

Output:

[19:38:30] INFO Slide reading backend: libvips
INFO Extracting tiles using reinhard_fast normalization
INFO Filtering tiles by whitespace fraction
INFO Filtering tiles by grayspace fraction
INFO Working on dataset source MyProject...
[19:39:43] INFO Extracting tiles from 178 slides (tile_px=299, tileum=30x)
INFO Using 16 processes (pool=spawn)
ERROR Error loading slide HE_##########.mrxs: Slide HE_########## missing MPP (openslide.mpp-x). Skipping
Speed: ? Extracting... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 178/178 ● 0:00:00
INFO Generating PDF (this may take some time)...
/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/slideflow/slide/report.py:240: UserWarning: Substituting font arial by core font helvetica
  self.set_font('Arial', size=9)
/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/slideflow/slide/report.py:248: UserWarning: Substituting font arial by core font helvetica
  self.set_font('Arial', 'B', 16)
/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/slideflow/slide/report.py:257: UserWarning: Substituting font arial by core font helvetica
  self.set_font('Arial', '', 10)
/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/slideflow/slide/report.py:342: UserWarning: Substituting font arial by core font helvetica
  pdf.set_font('Arial', style='B')
/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/slideflow/slide/report.py:347: UserWarning: Substituting font arial by core font helvetica
  pdf.set_font('Arial')
/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/slideflow/slide/report.py:361: UserWarning: Substituting font arial by core font helvetica
  pdf.set_font('Arial', style='B', size=10)
/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/slideflow/slide/report.py:367: UserWarning: Substituting font arial by core font helvetica
  pdf.set_font('Arial')
/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/slideflow/slide/report.py:376: UserWarning: Substituting font arial by core font helvetica
  pdf.set_font('Arial', 'B', 7)
/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/slideflow/slide/report.py:268: UserWarning: Substituting font arial by core font helvetica
  self.set_font('Arial', 'I', 8)
Skipping CSV update; no extraction reports found.
{}

Expected behavior

Environment:

Additional context

jamesdolezal commented 9 months ago

Hi Meijian - the "MPP not found" errors are often symptoms of another underlying issue, such as corrupt or incomplete slides. For example, with MRXS files, a format in which the primary .mrxs file is accompanied by a similarly-named directory containing .dat files, a common issue is that some of the necessary .dat files are missing or corrupt, which prevents the image from loading. In some cases, these slide loading errors are reported as "MPP not found", even though the true error is a corrupt or incomplete file.
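As a rough illustration of the MRXS layout described above, the following sketch checks that an .mrxs file has a similarly-named companion directory and that the directory contains non-empty .dat files. The function name check_mrxs_companion is hypothetical and not part of Slideflow, and treating a zero-byte .dat file as a sign of an incomplete transfer is an assumption. A clean result does not prove that the slide is intact.

```python
from pathlib import Path


def check_mrxs_companion(mrxs_path):
    """Return a list of problems found next to an .mrxs file (empty if none).

    Hypothetical helper: it only checks that the companion directory exists
    and holds non-empty .dat files. It does not validate their contents.
    """
    mrxs = Path(mrxs_path)
    problems = []
    if not mrxs.is_file():
        problems.append(f"missing slide file: {mrxs}")

    # The companion directory shares the slide's name, minus the extension.
    companion = mrxs.with_suffix("")
    if not companion.is_dir():
        problems.append(f"missing companion directory: {companion}")
        return problems

    dat_files = sorted(
        p for p in companion.iterdir() if p.suffix.lower() == ".dat"
    )
    if not dat_files:
        problems.append(f"no .dat files in {companion}")

    # Assumption: a zero-byte .dat file suggests an incomplete transfer.
    for dat in dat_files:
        if dat.stat().st_size == 0:
            problems.append(f"empty .dat file: {dat.name}")
    return problems
```

Running this over every slide in a dataset source and printing the non-empty results may narrow down which files are worth retransferring.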

The first step in debugging these issues is to try opening the file manually in libvips and checking for errors:

import pyvips
from pprint import pprint

vips_img = pyvips.Image.new_from_file('/path/to/file.mrxs')
pprint(vips_img.get_fields())

You should see a bunch of detected fields in that file.

If you get an error, then there is something wrong with the slide file. In that case, I would recommend retransferring the file from wherever it came from (if feasible) and/or verifying a file checksum/hash (if you have one to compare against).

If you don't get an error, then please share the output of the above code as it may provide some insights into what went wrong.
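To run the same check over many slides, a minimal sketch along these lines could help. It wraps the pyvips call in a try/except and reports whether the openslide.mpp-x field is present. The helper names inspect_slide and has_mpp are hypothetical, and pyvips is imported lazily so the module loads even where pyvips is not installed.

```python
MPP_FIELD = "openslide.mpp-x"


def has_mpp(fields):
    """Return True if the MPP field is among the detected field names."""
    return MPP_FIELD in fields


def inspect_slide(path):
    """Try to open a slide with libvips and summarize the result.

    Hypothetical helper, not part of Slideflow.
    """
    import pyvips  # lazy import; requires pyvips and libvips to be installed

    try:
        img = pyvips.Image.new_from_file(path)
    except pyvips.Error as exc:
        # The slide could not be opened at all (e.g. corrupt or incomplete).
        return {"path": path, "loaded": False, "error": str(exc), "has_mpp": False}

    fields = img.get_fields()
    return {"path": path, "loaded": True, "error": None, "has_mpp": has_mpp(fields)}
```

A slide that reports loaded=False points to a file or reader problem, while loaded=True with has_mpp=False suggests the metadata itself lacks the MPP value.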

As an aside - I'm planning another documentation expansion over the next month or so, and I will include a section on slide debugging to help others who may encounter similar issues.

Meijian commented 9 months ago

Hi James, thanks for your suggestions. I tried to use your code and had an error message like the following. I think this might be the underlying problem with this image.

[Screenshot: error]

My colleague found a fix that involved changing one line of code in openslide. I will try it to see if it works. Is it something you guys could potentially help to change in slideflow? Multiple colleagues of mine ran into this same problem. https://github.com/openslide/openslide/pull/333/files/203152de12dc95164a52222f5378c3e3d3ea1d59

jamesdolezal commented 9 months ago

Glad to hear that the issue was found and fixed! It looks like the fix is now available in OpenSlide 4.0.0.

As this was a bug in a dependency, it's not something we can directly fix in Slideflow. The solution would be to upgrade OpenSlide in your environment.
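To confirm which OpenSlide version an environment has, a small check like the one below may be useful. It is only a sketch: it assumes the openslide-python package is installed and reads its __library_version__ attribute, which reports the OpenSlide library seen by openslide-python. That may differ from the copy libvips is linked against. The helper names are hypothetical.

```python
import re


def parse_version(text):
    """Parse the leading 'major.minor.patch' of a version string into a tuple."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part) for part in match.groups())


def is_at_least(version_text, minimum=(4, 0, 0)):
    """Return True if version_text is at or above the minimum version."""
    return parse_version(version_text) >= minimum


def installed_openslide_version():
    """Return the OpenSlide library version reported by openslide-python."""
    import openslide  # lazy import; requires openslide-python

    return openslide.__library_version__


if __name__ == "__main__":
    version = installed_openslide_version()
    status = "OK" if is_at_least(version) else "older than 4.0.0"
    print(f"OpenSlide {version}: {status}")
```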

Although there's not really a way to provide this fix for people using Slideflow from PyPI or running from source, we may be able to include the newest version of OpenSlide in our published docker containers, which would provide this fix for those using Docker. I'll look into upgrading our docker images with this newest version of openslide.

Meijian commented 9 months ago

Hi James, thanks. Yeah, upgrading to 4.0.0 might be the best solution. It'd be great if you guys could catch up to the new version in the docker image too.