kuwisdelu / Cardinal

Mass spectrometry imaging toolbox
http://www.cardinalmsi.org
Artistic License 2.0

Some question about Cardinal package #2

Closed YonghuiDong closed 6 years ago

YonghuiDong commented 6 years ago

Hi @kuwisdelu,

Thanks for this fantastic package.

I have some questions that came up while learning the Cardinal package:

(1) The feature question

It seems to me that the MS peaks from each pixel are aligned to a pre-defined set of features during import. Am I correct?

If my assumption is right, adjacent MS peaks will be forced to align to the same feature. This can be problematic for high-mass-resolution MS imaging data.

For instance, the simulated MSImageSet data below contains 3 peaks, at m/z 50, 50.4 and 50.5. But all 3 peaks are aligned to m/z 49.5.

pattern <- factor(c(0, 0, 2, 2, 0, 0, 0, 0, 0, 0, 2, 2, 0,
                      0, 0,0,0,0,0,1,2,2,0,0,0,0,0,2,1,1, 2,
                      2,0,0,0,0,0,1,2,2,2,2,0,0,0,0,1,2, 2,
                      2,2,2,0,0,0,0,2,2,2,2,2,2,2,0,0,0, 2,
                      2,0,0,0,0,0,0,2,2,0,0,0,0,0),
                    levels=c(0,1,2), labels=c("blue", "black", "red"))

msset <- generateImage(pattern, coord=expand.grid(x=1:9, y=1:9),
                      range=c(0, 100), centers=c(50, 50.4, 50.5),
                      resolution=100, step=3.3, as="MSImageSet")

plot(msset, pixel=1)

[Figure 1: mass spectrum from plot(msset, pixel=1)]

Only one peak is shown in the mass spectrum. As you can see in the following figure, when I plot ion images of the three peaks, they are identical.

image(msset, mz=50)
image(msset, mz=50.4)
image(msset, mz=50.5)

[Figure 2: ion images at m/z 50, 50.4 and 50.5]

(2) The resolution parameter

When I increase the resolution parameter, the number of features does not change.

msset10 <- generateImage(pattern, coord=expand.grid(x=1:9, y=1:9),
                      range=c(0, 100), centers=c(50, 50.4, 50.5),
                      resolution=10, step=3.3, as="MSImageSet")

msset100 <- generateImage(pattern, coord=expand.grid(x=1:9, y=1:9),
                      range=c(0, 100), centers=c(50, 50.4, 50.5),
                      resolution=100, step=3.3, as="MSImageSet")
msset1000 <- generateImage(pattern, coord=expand.grid(x=1:9, y=1:9),
                      range=c(0, 100), centers=c(50, 50.4, 50.5),
                      resolution=1000, step=3.3, as="MSImageSet")
length(features(msset10))
length(features(msset100))
length(features(msset1000))

Maybe the m/z interval of each feature could be narrowed so that more features are defined as the resolution increases? Adjacent MS peaks could then be separated by being assigned to different features.

(3) Read imzML problem

The above problem has so far only been observed in simulated data. When I tried to test my "real dataset", I was unable to load my imzML data using either the readImzML() or the readMSIData() function. It gives me the following error message:

Error in readImzML("data/example") : 
  REAL() can only be applied to a 'numeric', not a 'list'

I can open my dataset with MALDIquantForeign and with the commercial software SCiLS Lab. I have also tested all the example files from the imzML website, and Cardinal loads them without problems.

So I guess my dataset may not be 'standard', as it was converted with the instrument vendor software. My smallest dataset is 700 MB, so I have uploaded it to Dropbox and put the link here. Could you please have a look when you have time?

https://www.dropbox.com/sh/18ns6px4c9e2fir/AACyRElaIR1PlFHS0LgRXglKa?dl=0

Thanks a lot.

Dong

kuwisdelu commented 6 years ago

Hi Dong,

Because it doesn't directly concern the internal programming aspects of the package, this question would be better suited to the Cardinal help group:

https://groups.google.com/forum/?hl=en#!forum/CardinalMSI

To answer your questions:

(1) No processing has been applied to the simulated data, so the m/z values are not aligned to anything. They are generated from the range and step arguments: the dataset spans the m/z range given by range and is sampled at intervals of step. The situation you describe is as if you have three real peaks, but m/z is only measured at 49.5 ± 3.3 (that is, at 46.2, 49.5 and 52.8), so naturally you only see a single peak, because the three cannot be resolved.
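
As a rough illustration of the point above, the sampled m/z axis can be reproduced in base R. This is only a sketch: it assumes the simulated axis is a regular grid built like seq(range[1], range[2], by=step), which is consistent with the 49.5 value reported in the question but is not taken from Cardinal's source code.

```r
# Assumption: the simulated m/z axis is a regular grid from range[1] to
# range[2] in increments of step (not taken from Cardinal's source code).
mz <- seq(0, 100, by = 3.3)
centers <- c(50, 50.4, 50.5)

# For each requested m/z, find the closest sampled m/z value.
nearest <- vapply(centers, function(m) mz[which.min(abs(mz - m))], numeric(1))

print(length(mz))
print(data.frame(requested = centers, nearest = nearest))
```

Under that assumption the grid has 31 points, and all three requested centers fall closest to the same sampled value, 49.5. That would explain why the three ion images look identical.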

Features are not aligned to anything when read into Cardinal as raw data, only when you call peakAlign() during preprocessing. For 'continuous' data, the m/z values will be exactly as they are in the file. For 'processed' data, they will be binned according to the 'mass.accuracy' parameter.

(2) The resolution parameter does not control the number of features or the intervals at which they're measured. It essentially controls the simulated resolving power, i.e., the "sharpness" or separateness of the peaks. To separate the peaks, you would also need to increase the number of features by choosing a smaller step size.
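
The interplay between peak sharpness and step size can be sketched with a toy model in base R. Everything here is hypothetical: the Gaussian peak shape, the sd argument (a stand-in for sharpness, not Cardinal's resolution argument) and the helper names simulate_spectrum and count_peaks are illustration only and do not reproduce how generateImage() works internally.

```r
# Hypothetical model: a spectrum is a sum of Gaussian peaks at the centers.
# `sd` stands in for peak sharpness; it is NOT Cardinal's `resolution`.
simulate_spectrum <- function(mz, centers, sd) {
  rowSums(sapply(centers, function(mu) dnorm(mz, mean = mu, sd = sd)))
}

# Count strict local maxima (points higher than both neighbours).
count_peaks <- function(y) {
  n <- length(y)
  mid <- y[2:(n - 1)]
  sum(mid > y[1:(n - 2)] & mid > y[3:n])
}

centers <- c(50, 50.4, 50.5)
coarse <- seq(0, 100, by = 3.3)   # step used in the question
fine   <- seq(0, 100, by = 0.01)  # much smaller step

coarse_broad <- count_peaks(simulate_spectrum(coarse, centers, sd = 0.5))
fine_broad   <- count_peaks(simulate_spectrum(fine, centers, sd = 0.5))
fine_sharp   <- count_peaks(simulate_spectrum(fine, centers, sd = 0.01))

print(c(coarse_broad = coarse_broad, fine_broad = fine_broad,
        fine_sharp = fine_sharp))
```

In this toy model, the three peaks only show up as three separate maxima when the peaks are sharp and the grid is fine enough to sample each of them. A smaller step alone, or sharper peaks alone, is not sufficient.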

I should note that the simulated data functions are meant primarily for quick examples and are not very realistic when compared to the behavior of real data and real instruments.

(3) There appears to be an error in the converter you are using. The imzML file declares a <spectrumList> count of 8023, but contains only 7636 <spectrum> children. Cardinal expected 8023 spectra, so it ran into an error. I assume the other software does not read this particular piece of metadata, and simply reads until it reaches the end of the <spectrumList>.

A fix for now would be to edit the imzML file and change the line:

<spectrumList count="8023" defaultDataProcessingRef="XcaliburProcessing">

to

<spectrumList count="7636" defaultDataProcessingRef="XcaliburProcessing">

This allowed me to read in the data correctly.
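
A mismatch of this kind can be checked before editing the file. The following base R sketch compares the declared count with the number of <spectrum> opening tags. The helper name count_spectra is hypothetical, the check is a plain text scan rather than a real XML parse, and it assumes the count attribute sits on the same line as the <spectrumList opening tag, as in the line quoted above.

```r
# Hypothetical helper: compare the declared spectrumList count with the
# number of <spectrum ...> opening tags found in the file's lines.
count_spectra <- function(lines) {
  # First line containing the <spectrumList ...> opening tag.
  list_line <- grep("<spectrumList", lines, fixed = TRUE, value = TRUE)[1]
  m <- regmatches(list_line, regexec('count="([0-9]+)"', list_line))[[1]]
  declared <- as.integer(m[2])

  # "<spectrum" followed by a space or ">" matches <spectrum ...> but not
  # <spectrumList ...> or the closing </spectrum> tag.
  hits <- gregexpr("<spectrum[ >]", lines)
  actual <- sum(vapply(hits, function(h) sum(h > 0), integer(1)))

  c(declared = declared, actual = actual)
}

# Toy input that mimics the problem: 3 spectra declared, 2 present.
toy <- c(
  '<spectrumList count="3" defaultDataProcessingRef="XcaliburProcessing">',
  '  <spectrum id="Scan=1" index="0"></spectrum>',
  '  <spectrum id="Scan=2" index="1"></spectrum>',
  '</spectrumList>'
)
res <- count_spectra(toy)
print(res)
```

On a real file, the input would come from something like readLines() on the .imzML file. If the two numbers differ, the smaller actual value is the one to put in the count attribute.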

YonghuiDong commented 6 years ago

@kuwisdelu Thanks a lot for the explanation.