lgatto / MSnbase

Base Classes and Functions for Mass Spectrometry and Proteomics
http://lgatto.github.io/MSnbase/

reordering spectra in MSnExp #68

Closed pavel-shliaha closed 8 years ago

pavel-shliaha commented 8 years ago

Currently, when spectra are loaded from mzML, they are not ordered according to retention time, e.g.:

library("MSnbase")
spectra <- readMSData("H2A_EThcD_1e6_Reagent.mzML")
barplot(rtime(spectra))

[Image: rt_problem, barplot of rtime(spectra)]

I tried to reorder the spectra but this does not seem to help:

spectra <- spectra[order(as.double(rtime(spectra)))]

Any suggestions?

lgatto commented 8 years ago

This is because the spectra are stored as an unordered collection (in an environment). The names reflect the order in which they are read at the low level, upon object construction. What I would suggest is to first store the desired order:

o <- order(rtime(spectra))

and then use o for downstream ordering, for example

barplot(rtime(spectra)[o])
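A minimal base-R sketch of the "compute the order once, reuse it downstream" idea. It uses a hypothetical named vector rt in place of rtime(spectra), so it runs without an mzML file. The values and names are invented for illustration and are listed in the lexicographic order that ls() would return.

```r
## Hypothetical retention times (in seconds), named like MSnExp feature
## names and listed in lexicographic order, as ls() would return them.
rt <- c(X1.1 = 0.5, X10.1 = 5.0, X2.1 = 1.0, X3.1 = 1.5)

o <- order(rt)          # compute the desired order once ...
rt_ordered <- rt[o]     # ... and reuse it for every downstream step
names(rt_ordered)
## "X1.1" "X2.1" "X3.1" "X10.1"
## barplot(rt_ordered) would now show increasing retention times
```

The same o can index any other per-spectrum vector extracted from the object, so all downstream results stay aligned with one another.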

I can think of ways to implement this so that the order is recorded. Could you describe your downstream use case in a bit more detail?

pavel-shliaha commented 8 years ago

Thanks for the quick reply. I am currently trying top-down proteomics, and for this I am doing a systematic assessment of different fragmentation conditions. I am performing direct infusion of a protein (no chromatography). I have 5 main fragmentation parameters (amount of analyte, amount of ETD reagent, ETD reaction time, type of supplemental activation, and collision energy for supplemental activation); these 5 parameters create thousands of combinations, which are tested one by one. In the end I get a file containing thousands of fragmentation spectra acquired under different fragmentation conditions.

When I analyse the results I need the order of the spectra preserved, since the order also corresponds to the fragmentation conditions. Moreover, I would like to be able to:

1) combine different MSnExp objects
2) remove spectra I don't want

Basically, it would be great if I could work with MSnExp objects as with lists. I can of course work around it (as I do now), but I would prefer to write cleaner code.

lgatto commented 8 years ago

You are referring to MSnSet above, but that should be MSnExp. Could you clarify?

Regarding your second point:

2) remove spectra I don't want

this can be done by subsetting the MSnExp with [:

library("MSnbase")
## example mzXML file shipped with MSnbase
file <- dir(system.file(package = "MSnbase", dir = "extdata"),
            full.names = TRUE,
            pattern = "mzXML$")
aa <- readMSData(file)
aa2 <- aa[1:3]   # keep only the first three spectra
featureNames(aa)
## [1] "X1.1" "X2.1" "X3.1" "X4.1" "X5.1"
featureNames(aa2)
## [1] "X1.1" "X2.1" "X3.1"

pavel-shliaha commented 8 years ago

Sorry, MSnExp of course.

lgatto commented 8 years ago

There is no combine method for MSnExp. It shouldn't be too difficult to add, I think. The problem with MSnExp objects is that they tend to be big; is this not an issue for you?

pavel-shliaha commented 8 years ago

No, not really. I work with thousands of spectra; it would perhaps be a problem if I worked with tens of thousands. But of course I would really prefer it if MSnExp objects behaved like lists (I understand it might be complicated to program, but no harm in asking).

lgatto commented 8 years ago

I will write a combine method for MSnExp instances.
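The thread does not show how MSnbase's combine method is implemented. The following is only a hypothetical base-R sketch of the underlying problem: merging two environment-backed collections of spectra while keeping their keys unique. The helper combine_envs and its prefixes argument are invented for illustration.

```r
## Hypothetical helper, not MSnbase's combine(): merge two environments
## into a new one, prefixing each key with its source so that identical
## names (e.g. "X1.1" in both inputs) cannot overwrite each other.
combine_envs <- function(e1, e2, prefixes = c("A", "B")) {
  out <- new.env()
  envs <- list(e1, e2)
  for (i in seq_along(envs)) {
    for (k in ls(envs[[i]])) {
      assign(paste(prefixes[i], k, sep = "."),
             get(k, envir = envs[[i]]),
             envir = out)
    }
  }
  out
}

a <- new.env()
assign("X1.1", 1, envir = a)
assign("X2.1", 2, envir = a)
b <- new.env()
assign("X1.1", 10, envir = b)

ab <- combine_envs(a, b)
ls(ab)
## [1] "A.X1.1" "A.X2.1" "B.X1.1"
```

Note that the result is still an environment, so its keys come back in lexicographic order rather than in insertion order. The ordering issue remains separate from combining.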

Will think about the ordering issue.

sgibb commented 8 years ago

The "disorder" arises from the alphabetical (character-by-character) sorting of the names, e.g.:

e <- new.env()
assign("X1", 1, envir=e)
assign("X2", 2, envir=e)
assign("X10", 10, envir=e)
ls(e)
# [1] "X1"  "X10" "X2" 
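The same lexicographic order affects MSnExp feature names such as "X10.1". As an alternative to renaming, the numeric index can be recovered from the names and used for ordering. This is a sketch only. It assumes names of the form X<index>.<suffix>, as in the featureNames output earlier in this thread, and the regular expression is an illustrative choice, not part of MSnbase.

```r
## Names in the lexicographic order returned by ls()
nms <- c("X1.1", "X10.1", "X2.1", "X3.1")

## Assumes names of the form X<index>.<suffix>; extract <index> as an integer
idx <- as.integer(sub("^X([0-9]+)\\..*$", "\\1", nms))
nms_sorted <- nms[order(idx)]
## nms_sorted is "X1.1" "X2.1" "X3.1" "X10.1"
```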

If we pad the lower numbers with a leading zero, the problem should be fixed:

e <- new.env()
assign(sprintf("X%02.0f", 1), 1, envir=e)
assign(sprintf("X%02.0f", 2), 2, envir=e)
assign(sprintf("X%02.0f", 10), 10, envir=e)
ls(e)
# [1] "X01" "X02" "X10"

I would suggest the solution in #69 (this should at least fix the barplot above).
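A fixed two-digit pad such as "X%02.0f" only keeps the order correct up to 99 entries. "X100" would sort before "X11", and the use case in this thread involves thousands of spectra. Below is a minimal sketch that derives the pad width from the number of spectra. The helper padded_names is hypothetical and not necessarily what #69 does.

```r
## Hypothetical helper: zero-padded keys whose width follows the number
## of spectra n, so lexicographic and numeric order agree for any n.
padded_names <- function(n, prefix = "X") {
  width <- nchar(as.character(n))
  paste0(prefix, formatC(seq_len(n), width = width, flag = "0"))
}

keys <- padded_names(1500)
e <- new.env()
for (i in c(1, 2, 10, 1500)) assign(keys[i], i, envir = e)
ls(e)
## [1] "X0001" "X0002" "X0010" "X1500"
```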

sgibb commented 8 years ago

@lgatto If you don't have enough time to implement the combine method, I could do it. I assume it would be very similar to the combine,MSnSet,MSnSet-method.

lgatto commented 8 years ago

Closing this now - the 2 feature requests are recorded in issues #71 and #70.