sgibb / MALDIquantForeign

Import/Export routines for MALDIquant
https://strimmerlab.github.io/software/maldiquant/

Warning when exporting imzML and faulty imzML #20

Closed SvenSondhauss closed 7 years ago

SvenSondhauss commented 7 years ago

I want to export some IMS data as an imzML file with the exportImzMl function. During the process I get the following warning:

Warning message:
In .ibdOffsets(x, processed = processed, encodedLengthSize = encodedLengthSize) :
  integer overflow in 'cumsum'; use 'cumsum(as.numeric(.))'

The resulting imzML file has "NA" values for most of the "external offset" entries. Additionally, even with processed=FALSE the external offset value of the mzArray is different for every spectrum, while it should always be the same, if I understood it correctly. The data set that I want to convert is relatively large (7953 spectra, 36713 data points each, around 3 GB object size).

sgibb commented 7 years ago

@SvenSondhauss thanks for reporting this. It could be that your dataset is so large that the byte offsets no longer fit in an integer vector. Could you please try to fix the function .ibdOffsets in your R environment:

library("MALDIquantForeign")

fixInNamespace(".ibdOffsets", "MALDIquantForeign")

and replace the code with the following:

.ibdOffsets <- function(x, processed=TRUE, encodedLengthSize=8L) {
  ## start at 16 (16 bytes for UUID)
  n <- rep(unlist(lapply(x, length)), each=2L)
  encodedLength <- as.double(n * encodedLengthSize)

  if (processed) {
    offsets <- cumsum(as.double(c(16L, encodedLength[-length(n)])))
  } else {
    sel <- seq(from=2L, to=length(n), by=2L)
    offsets <- rep.int(16L, length(n))
    offsets[sel] <- 16L + cumsum(as.double(encodedLength[sel]))
  }

  matrix(c(offsets, n, encodedLength), nrow=length(n),
         dimnames=list(rep(c("mass", "intensity"), times=length(x)),
                       c("offset", "length", "encodedLength")))
}

Afterwards please run your export method as usual.
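
The overflow behind the warning can be reproduced in isolation. The following is only a minimal sketch, not part of the package: it takes the spectrum count and length from the report above and assumes 8 bytes per value (the default encodedLengthSize of the function shown above). The variable names are chosen for illustration.

```r
## Sizes from the report above; 8 bytes per value is an assumption
## based on the default encodedLengthSize.
nSpectra <- 7953L
nPoints <- 36713L
encodedLength <- rep.int(nPoints * 8L, nSpectra)  # integer vector

## An integer cumsum cannot exceed .Machine$integer.max (2147483647);
## once the running total passes it, the remaining entries become NA.
intOffsets <- suppressWarnings(cumsum(encodedLength))

## Converting to double first keeps every running total exact.
dblOffsets <- cumsum(as.double(encodedLength))

anyNA(intOffsets)
anyNA(dblOffsets)
max(dblOffsets) > .Machine$integer.max
```

This is why the replacement function converts to double before calling cumsum.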

SvenSondhauss commented 7 years ago

@sgibb Thank you for your quick reply. Replacing the code with yours solved the problem with the "NA" values. To test the file I tried to re-import it into R with importImzMl and got the following error:

Reading spectrum from ‘D:\R WD\idata2.imzML’ ...
Found mzML document (version: 1.1).
Error in .attributeToString(attributes = attributes, attributeName = attributeName, :
  Malformed mzXML: attribute ‘value’ is missing!

I'm not sure which value attribute it is referring to. Edit: I found the entries with the missing value attribute:

<cvParam cvRef="IMS" accession="IMS:1000044" name="max dimension x" unitCvRef="UO" unitAccession="UO:0000017" unitName="micrometer"/>
<cvParam cvRef="IMS" accession="IMS:1000045" name="max dimension y" unitCvRef="UO" unitAccession="UO:0000017" unitName="micrometer"/>
<cvParam cvRef="IMS" accession="IMS:1000046" name="pixel size x" unitCvRef="UO" unitAccession="UO:0000017" unitName="micrometer"/>
<cvParam cvRef="IMS" accession="IMS:1000047" name="pixel size y" unitCvRef="UO" unitAccession="UO:0000017" unitName="micrometer"/>

Changing them to the following lets me re-import the file without a problem.

<cvParam cvRef="IMS" accession="IMS:1000044" name="max dimension x" value="" unitCvRef="UO" unitAccession="UO:0000017" unitName="micrometer"/>
<cvParam cvRef="IMS" accession="IMS:1000045" name="max dimension y" value="" unitCvRef="UO" unitAccession="UO:0000017" unitName="micrometer"/>
<cvParam cvRef="IMS" accession="IMS:1000046" name="pixel size x" value="100" unitCvRef="UO" unitAccession="UO:0000017" unitName="micrometer"/>
<cvParam cvRef="IMS" accession="IMS:1000047" name="pixel size y" value="100" unitCvRef="UO" unitAccession="UO:0000017" unitName="micrometer"/>

I tried a smaller data file with the original .ibdOffsets function. The value attributes are missing there as well, even though I used pixelSize=100 in the export function.
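
Entries like these can be located without reading the whole file by eye. The sketch below is a rough line-based check in base R, not part of MALDIquantForeign: findCvParamsWithoutValue is a hypothetical helper, and it assumes each cvParam element sits on a single line, as in the excerpts above.

```r
## Hypothetical helper (not a MALDIquantForeign function): returns the
## lines that contain a cvParam element without a value attribute.
## Assumes one cvParam element per line.
findCvParamsWithoutValue <- function(lines) {
  isCvParam <- grepl("<cvParam\\b", lines, perl=TRUE)
  hasValue <- grepl("\\bvalue=", lines, perl=TRUE)
  lines[isCvParam & !hasValue]
}

## Two of the entries quoted above, one without and one with a value.
example <- c(
  '<cvParam cvRef="IMS" accession="IMS:1000044" name="max dimension x" unitCvRef="UO" unitAccession="UO:0000017" unitName="micrometer"/>',
  '<cvParam cvRef="IMS" accession="IMS:1000046" name="pixel size x" value="100" unitCvRef="UO" unitAccession="UO:0000017" unitName="micrometer"/>'
)

missing <- findCvParamsWithoutValue(example)
missing
```

For a real file, the same helper could be applied to the result of readLines() on the exported .imzML file. Note that not every cvParam is required to carry a value, so the result is a list of candidates to inspect, not a list of errors.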

sgibb commented 7 years ago

@SvenSondhauss thanks for investigating this. Unfortunately I won't have time to look into it before next week. But could you send me a small example file that you import, export and reimport with MALDIquantForeign for testing?

SvenSondhauss commented 7 years ago

@sgibb Sure thing. I'll send you a link. Thanks for looking into it.

sgibb commented 7 years ago

@SvenSondhauss thanks for the file and the investigation you already did. You were right. The problem is that your file contains no information about the pixelSize; it includes only the coordinates. That's why exportImzMl writes empty values into the imzML file. If the coordinates are given by the user, exportImzMl calculates the dimensions etc. on its own, but it does not do so if the coordinates are only part of the object.

You could use the following as a workaround:

exportImzMl(spectra, file="myspectra.imzML", coordinates=coordinates(spectra), pixelSize=c(100, 100))
## exportImzMl treats the coordinates as given by the user and calculates dim, size, etc.

I am going to fix that in the next few days and will upload a new version to CRAN.

SvenSondhauss commented 7 years ago

@sgibb Thank you very much for your help!

sgibb commented 7 years ago

The problem is fixed in MALDIquantForeign 0.11 (I just uploaded it to CRAN; it should be available soon).