ome / bioformats

Bio-Formats is a Java library for reading and writing data in life sciences image file formats. It is developed by the Open Microscopy Environment. Bio-Formats is released under the GNU General Public License (GPL); commercial licenses are available from Glencoe Software.
https://www.openmicroscopy.org/bio-formats
GNU General Public License v2.0
381 stars 241 forks source link

NDPIS images with missing wavelengths fail to open #3282

Closed petebankhead closed 3 years ago

petebankhead commented 5 years ago

Some NDPIS images are failing to open, seemingly due to the wavelength value required by NDPISReader being missing. There error shown when attempting to open the image with Fiji is

java.lang.NullPointerException: Length: Length cannot be constructed with a null value.
 at ome.units.quantity.Length.<init>(Length.java:59)
 at loci.formats.in.NDPISReader.initFile(NDPISReader.java:215)

There is more information at https://github.com/qupath/qupath-bioformats-extension/issues/9 and https://groups.google.com/d/msg/qupath-users/AOWezK8-MKE/Go_h9MthBAAJ (with a sample file).

sbesson commented 5 years ago

Thanks @petebankhead, looking at the code, the issue is related to this code block

https://github.com/openmicroscopy/bioformats/blob/9fc607f85b8900be786813296f1eee75cc1ed883/components/formats-gpl/src/loci/formats/in/NDPISReader.java#L212-L215

It looks like what is required is either a check for null values of wavelength or using the FormatTools.getEmissionWavelength API.

The sample file provided with the original user report is definitely valuable for reproducing the issue and adding to our non-regression tests. Can we add the dataset to our public sample files under CC-BY licenses?

petebankhead commented 5 years ago

Excellent, thanks Sebastien. Regarding the file, it's not mine and so I'm afraid I can't answer. I'll link to this discussion in Google Groups in case the owner wants to comment.

JohnKPan commented 5 years ago

Hi @sbesson I provided the file to @petebankhead.

Great tools, from both of you. I'm glad that the files were valuable.

Unfortunately, I asked and was told by my bosses that it's a hard no to adding the file to a public dataset.

They've actually asked me to request that you delete the files once the bug is fixed.

sorry,

John

sbesson commented 5 years ago

Hi @JohnKPan, no worries at all. We perfectly understand all datasets cannot be shared. Unfortunately the dataset posted on the original thread is no longer available (as per https://github.com/openmicroscopy/bioformats/issues/3282#issuecomment-443307639) and we do need some representative sample for validating the fix.

Would you have other sample files exhibiting the isssue that we could keep in our private QA repository i.e. only available to the Bio-Formats team for non-regression testing and the maintenance of Bio-Formats?

Svidro commented 5 years ago

I tried using the built in submission for BioFormats data sets to help here, but got the "SWFUpload is taking a long time to load or the load has failed. Please make sure that the Flash Plugin is enabled and that a working version of the Adobe Flash Player is installed." despite Flash being installed and active. Our ITS is somewhat overzealous and might be responsible for blocking this, so instead I am making the file available at https://drive.google.com/drive/folders/1zO9PwB5PHFHDdXkNV4rWbjEXAbChZ5oq?usp=sharing If this does not work, please let me know and I can try something else, or maybe from some other location. Feel free to add to the public sample files if you can access it.

sbesson commented 5 years ago

Many thanks @Svidro, I had to rename all files according to the name prefixed specified in the ndpis file. After this, this fileset should be sufficient to resurrect this and hopefully open a straightforward fix. With regard to usage, can we go as far as making these files publicly available under CC-BY license alongside others at https://downloads.openmicroscopy.org/images/Hamamatsu-NDPI/ or would you rather want us to keep them private for non-regression testing only?

Svidro commented 5 years ago

Oops, I had not realized the files contained the original name in the metadata. I was in a bit of a rush to get this out on Friday. If you can edit the metadata to change the name to the one it was provided as, then you could share it publicly. Otherwise I can probably get you another test sample as I have time and you can use this one for your private testing. The name on the file was neither the researcher taking it nor a patient, but I would prefer not to have it named after a person at all :)

Svidro commented 5 years ago

Sorry this took me so long to get back to, but I really only quickly checked that the images opened, and they did! However, two of the channels do not open correctly. If you still have the test files, could you verify that all four channels look similar in the NDPI and NDPIS format? Cy5 NDPIS left, NDPI right. image DAPI NDPIS left, NDPI right image It feels like the NDPIS file is pulling the "wrong" channel of the RGB image... which may be a result of the wavelength information being missing (I am guessing that is what it would use to choose the R, G or B). I am not sure if there is a resolution for this, but I wanted to bring it up, just in case!

The second picture doesn't look as bad as it really is until you zoom in :) All of the pixel values are between 0 and 4, while on the left they are frequently in the mid 70s.

sbesson commented 5 years ago

@Svidro yes, we still have the test images your uploaded (kept private as discussed above). I took a look at it and was able to reproduce what you mentioned. Previously, the reader expectation was that each NDPI had a channel wavelength tag and the wavelength was used to select R, G or B. #3304 fixed the issue when dealing with opening image sets where at least one NDPI has no channel wavelength metadata.

With Bio-Formats 6.0.1, I can reproduce https://github.com/openmicroscopy/bioformats/issues/3282#issuecomment-477698112 with your sample fileset and confirm that the DAPI and Fitc channels are loading up the incorrect channel components (the first channel component in both cases).

The following code was implemented to select the channel component of each NDPI image associated with each channel of the NDPIS fileset based on the channel metadata:

https://github.com/openmicroscopy/bioformats/blob/f47f5e3a611a76d52548b9e22b696498fcf0e1b1/components/formats-gpl/src/loci/formats/in/NDPISReader.java#L220-L230

As you suggested, in the absence of wavelength metadata, this logic is currently returning the first component (R). To solve this issue properly, we might need to find a more authoritative way to identify the correct band for each constituent. Ideas on where such metadata might be encoded or alternate tags we could use to improve the robustness of this logic welcome.

Svidro commented 5 years ago

I wish I could help more there. Come Monday I will be back to bugging my boss to let me look at the Nanozoomer to see if I can fix the problem on that end. And while, I guess, you could easily fix the logic for my one instrument, I have no idea how common the problem is with those two channels only... or if the save order changes on other NDPIS files, or acquisition order changes...

On Fri, Mar 29, 2019, 9:17 AM Sébastien Besson notifications@github.com wrote:

@Svidro https://github.com/Svidro yes, we still have the test images your uploaded (kept private as discussed above). I took a look at it and was able to reproduce what you mentioned. Previously, the reader expectation was that each NDPI had a channel wavelength tag and the wavelength was used to select R, G or B. #3304 https://github.com/openmicroscopy/bioformats/pull/3304 fixed the issue when dealing with opening image sets where at least one NDPI has no channel wavelength metadata.

With Bio-Formats 6.0.1, I can reproduce #3282 (comment) https://github.com/openmicroscopy/bioformats/issues/3282#issuecomment-477698112 with your sample fileset and confirm that the DAPI and Fitc channels are loading up the incorrect channel components (the first channel in both cases).

The following code was implemented to select the channel component of each NDPI image associated with each channel of the NDPIS fileset based on the channel metadata:

https://github.com/openmicroscopy/bioformats/blob/f47f5e3a611a76d52548b9e22b696498fcf0e1b1/components/formats-gpl/src/loci/formats/in/NDPISReader.java#L220-L230

As you suggested, in the absence of wavelength metadata, this logic is currently returning the first component (R). To solve this issue properly, we might need to find a more authoritative way to identify the correct band for each constituent. Ideas on where such metadata might be encoded or alternate tags we could use to improve the robustness of this logic welcome.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/openmicroscopy/bioformats/issues/3282#issuecomment-478058962, or mute the thread https://github.com/notifications/unsubscribe-auth/AWEq-RHfZ0T2wCdl9_FWu6in9Qkg-pgzks5vbjyMgaJpZM4Y5idA .

Svidro commented 5 years ago

Well, it turned out to be very easy to fix, and I'll post this here in case anyone else has the same problem. Icon The calibration program contains within it a setting for the wavelengths. It was there all along, I just never got a chance to check for it :) program If you run the program, this is what it would look like, and where the information can be edited. Doesn't really help for images that have already been taken, but we should be fine going forward.

JohnKPan commented 5 years ago

@sbesson Unfortunately, i'm still restricted on sharing my files. I confirm the same issue as @Svidro.

But I went and checked 4 sets of files from 4 different experiments and 2 investigators (Same scanner though). In both 3 channel and 2 channel experiments, the meta data header tags (I just opened the NDPI file in notepad++)

;ShadingAveR_2048=0 ;ShadingAveG_2048=0 ;ShadingAveB_2048=126 ;ShadingAveR_4096=0 ;ShadingAveG_4096=0 ;ShadingAveB_4096=129 ;TransmittanceR=- ;TransmittanceG=- ;TransmittanceB=10%

Always correspond with the channel they show in NDPView (Hamamatsu's viewer). In this case, a DAPI file is blue.

;ShadingAveR_2048=6 ;ShadingAveG_2048=0 ;ShadingAveB_2048=0 ;ShadingAveR_4096=6 ;ShadingAveG_4096=0 ;ShadingAveB_4096=0 ;TransmittanceR=1% ;TransmittanceG=- ;TransmittanceB=-

Here is Red TRITC

and ;ShadingAveR_2048=0 ;ShadingAveG_2048=102 ;ShadingAveB_2048=0 ;ShadingAveR_4096=0 ;ShadingAveG_4096=105 ;ShadingAveB_4096=0 ;TransmittanceR=- ;TransmittanceG=10% ;TransmittanceB=-

Green FITC.

However, i'm not sure what this meta data is actually saying. I can manually change the values in notepad++ here and change them to G or R but it doesn't seem to make any difference when I open it in NDPView. (Except, that some values don't open at all). It still shows up as blue. FITC as green and TRITC as red no matter what values I substitute in. NDPView must be using some other information.

So although I can say the transmittanceR/G/B tag correlates perfectly with the channel NDPView shows*, I don't know what the tags are doing, as they do not appear to affect NDPView in anyway that I can notice other than failing to open.

*Note: CubeName = triple for all these files. As I never actually use the nanozoomer, i'm unsure what this refers to.

@Svidro Can you please check your files and see if your files also correlate the channel with these tags?

If @Svidro's files match mine. We will at least have some confidence that this tag correlates. I wouldn't call it authoritative though.

May I suggest that,

First try and use wavelength information, if that is not there than check transmittance tag than assign channel.

//pseudocode

else if (Wavelength is Null  && TransmittanceR is Not Null && TransmittanceG is Null && 
TransmittanceB is NULL)  bandUsed[c] = 0; 

else if (Wavelength is Null  && TransmittanceR is Null && TransmittanceG is not Null && 
TransmittanceB is not NULL)  bandUsed[c] = 1; 

else if (Wavelength is Null  && TransmittanceR is Null && TransmittanceG is Null && 
TransmittanceB is not NULL)  bandUsed[c] = 2; 

// I think it's important to be explicit that only one transmittance tag is not null. 
Svidro commented 5 years ago

Nice! I had never tried opening the NDPI file in Notepad, but now I can see the entry that I would need to change to get the current fix to work:


;NDP Shading Data
;Version=0004
;ID=1
;Name=Dapi
;CubeName=Triple
;CubeIndex=1
;ExcitationName=Dapi
;ExcitationIndex=1
;EmissionName=Dapi
;EmissionIndex=1
;ExcitationWaveLength=
;EmissionWaveLength=450

I can also confirm that the shading averages are highest in the correct RGB channel to pull from. And transmittance as well.

JohnKPan commented 5 years ago

Nice! I had never tried opening the NDPI file in Notepad, but now I can see the entry that I would need to change to get the current fix to work:

Did that work @Svidro ? I tried plugging in a random number into EmissionWaveLength on mine and I got some error.

I can also confirm that the shading averages are highest in the correct RGB channel to pull from. And transmittance as well.

Wait, does this mean you have non-zero values in the off-channels for the ShadingAve tags?

Svidro commented 5 years ago

The off channels have values of zero, sorry, was just quickly getting that message off and wasn't terribly precise. I am pretty sure if I tried saving the number in notepad that I would be changing more than that one value. I am not sure what the proper method would be to change the header on the NDPI file without affecting the rest of the data in the file (encoding version or whatever). It was just neat to find the location.

sbesson commented 5 years ago

@Svidro @JohnKPan thanks to both of you for driving the investigation. My only minor concern is that this is happening on a closed issue. Thoughts about either moving the discussion to a separate new issue (i.e. consider the initial bug was fixed and this is a follow-up bug) or should we re-open this issue?

In all cases, the shading metadata seems to be a great starting point to select the channel component in the absence of wavelength. I have not reviewed all our NDPI samples but maybe it should always take precedence if such metadata always exists. Note that at the moment Bio-Formats completely ignores this part of the metadata as it is not stored as =-separated key value pairs like other metadata.

https://github.com/openmicroscopy/bioformats/blob/f47f5e3a611a76d52548b9e22b696498fcf0e1b1/components/formats-gpl/src/loci/formats/in/NDPIReader.java#L439-L457

Parsing this NDP Shading Data and adding it to the global metadata would be a goof first step only for NDPIS image sets but also individual NDPI files.

JohnKPan commented 5 years ago

This seems like it's part of the same issue to me. People are too lazy to manually put in the wavelength information, and therefore if Bioformats is to work reliably, it should find a way to conform to the reality that people are lazy. But, I don't know how the OME organization works, what would get attention the quickest?

Please note: This shading information only seems to appear in florescence files. My Brightfield files do not contain that section at all.

Svidro commented 5 years ago

It would be a bit difficult to predict the many ways people might break things until someone actually breaks them and reports it. I suspect the majority of Hamamatsu systems are set correctly and never have this problem, as we should not in the future. Too many file types and too many systems... hard to expect anyone to have it all figured out ahead of time. Next time someone might post that they deleted their channel weighting somehow!

On Thu, Apr 4, 2019, 2:28 PM JohnKPan notifications@github.com wrote:

This seems like it's part of the same issue to me. People are too lazy to manually put in the wavelength information, and therefore if Bioformats is to work reliably, it should find a way to conform to the reality that people are lazy. But, I don't know how the OME organization works, what would get attention the quickest?

Please note: This shading information only seems to appear in florescence files. My Brightfield files do not contain that section at all.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/openmicroscopy/bioformats/issues/3282#issuecomment-480070495, or mute the thread https://github.com/notifications/unsubscribe-auth/AWEq-YpaGTSI7Qlt_uiZ_FMsfhDumo0zks5vdm54gaJpZM4Y5idA .

dgault commented 3 years ago

Closing this issue as the linked PR providing a fix has been tested and merged. This fix will be available in the upcoming Bio-Formats 6.6.1 patch release .