lgatto / synapter

Label-free data analysis pipeline for optimal identification and quantitation
https://lgatto.github.io/synapter/

converting an object from Synapter to MSnSet #106

Closed: sgibb closed this issue 8 years ago

sgibb commented 8 years ago

Currently (just in synapter 2.0) as() adds the sample name to the sampleNames and fvarLabels of the new MSnSet automatically, e.g.:

x <- as(readRDS(file), "MSnSet")
fvarLabels(x)
#  [1] "peptide.seq.S130423_05" "protein.Accession.S130423_05"
# ...

This is nice if you want to combine the different MSnSets.
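To illustrate why the suffix helps when combining, here is a minimal base-R sketch that uses plain data frames as stand-ins for the feature data of two runs. The second sample name (S130423_06), the toy values and the helper addSuffix are made up for illustration; this is not how as() or updateFvarLabels are implemented.

```r
## Toy feature tables standing in for the feature data of two runs
## (made-up values, not real synapter output).
runA <- data.frame(protein.Accession = c("P1", "P2"),
                   row.names = c("pepA", "pepB"),
                   stringsAsFactors = FALSE)
runB <- data.frame(protein.Accession = c("P1", "P3"),
                   row.names = c("pepA", "pepC"),
                   stringsAsFactors = FALSE)

## Hypothetical helper: append the sample name to every column name.
addSuffix <- function(df, sample) {
    colnames(df) <- paste(colnames(df), sample, sep = ".")
    df
}

a <- addSuffix(runA, "S130423_05")
b <- addSuffix(runB, "S130423_06")  # assumed second sample name

## Merge on the row names (the features); the columns of each run
## stay distinguishable because of the suffix.
merged <- merge(a, b, by = "row.names", all = TRUE)
colnames(merged)
```

Without the suffix both tables would carry a column called protein.Accession, and it would no longer be obvious which run a value came from.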

Unfortunately, it complicates modifying each individual MSnSet. While x <- topN(x, groupBy = fData(x)$protein.Accession, n = 3) works (because $ partially matches the column name), nPeps <- nQuants(x, fcol = "protein.Accession") fails, since the column is now named protein.Accession.S130423_05.
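The difference between the two calls can be reproduced in base R with a plain data frame standing in for fData(x). This is only a sketch with toy values; it assumes that fcol is looked up by its exact name, which is consistent with the failure described here but is not taken from the nQuants source.

```r
## Toy feature table with a suffixed column name (made-up values).
fd <- data.frame(protein.Accession.S130423_05 = c("P1", "P1", "P2"),
                 stringsAsFactors = FALSE)

## `$` falls back to partial matching, so the unsuffixed name still works.
viaDollar <- fd$protein.Accession

## `[[` matches exactly by default, so the unsuffixed name finds nothing.
viaExact <- fd[["protein.Accession"]]

## An exact lookup only works with the full, suffixed name.
viaFull <- fd[["protein.Accession.S130423_05"]]
```

Here viaDollar and viaFull hold the same column, while viaExact is NULL.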

I am wondering what would be the best way to avoid this. We could remove the addition to fvarLabels in as and call updateFvarLabels manually. Another solution would be to add pmatch to the fcol argument of nQuants.
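A rough sketch of the pmatch idea, in base R only: the helper resolveFcol is hypothetical and not part of MSnbase or synapter. It just shows how an abbreviated fcol could be resolved to the full label, and that an ambiguous abbreviation would have to be rejected.

```r
## Hypothetical helper: resolve a possibly abbreviated column name.
## Returns the full label, or stops if there is no unique match.
resolveFcol <- function(fcol, labels) {
    i <- pmatch(fcol, labels)  # NA for no match or an ambiguous match
    if (is.na(i)) {
        stop("'", fcol, "' does not match exactly one of the labels")
    }
    labels[i]
}

labels <- c("peptide.seq.S130423_05", "protein.Accession.S130423_05")

## A unique prefix resolves to the full, suffixed label.
resolved <- resolveFcol("protein.Accession", labels)

## "p" is a prefix of both labels, so the lookup is rejected.
ambiguous <- tryCatch(resolveFcol("p", labels),
                      error = function(e) NA_character_)
```

The trade-off is that typos or overly short abbreviations turn into errors rather than silently picking a column.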

Does anybody have a good idea?

pavel-shliaha commented 8 years ago

As far as I understand, your question is the following: in https://github.com/lgatto/synapter/issues/99 functionality was added so that the sample name is now kindly added to the names in the MSnSets, ready for combining them. I am very happy with this functionality and would prefer it not to be changed.

I am not sure what the problem is. Is it that the fvarLabel "protein.Accession" changes during combination? If so, this should also have been the case when we called updateFvarLabels manually. Why did the problem manifest itself only now that updateFvarLabels has become automatic?

sgibb commented 8 years ago

You are right, the functionality was introduced in #99. The problem is that the MSnSet has column names such as "protein.Accession.SAMPLENAME", which makes them hard to handle before combination (I was not aware of this problem, because I had never used topN or nQuants on the MSnSets before combining them). E.g. if you want to call nQuants in a loop you have to grep for the column that contains "protein.Accession", because the name changed from protein.Accession to protein.Accession.SAMPLENAME (that is ugly and error-prone). I have partly reverted the changes from #99: the sampleName is now set automatically, but the column names are not changed. The current workflow in synapter 2.0 is:

sets <- lapply(files, function(file) {
    x <- as(readRDS(file), "MSnSet")
    ## do something with the MSnSet
    x <- updateFvarLabels(x, sampleNames(x)[1])
    x
})
combined <- Reduce(combine, sets)

which is simpler (at least one line shorter) than:

msna <- as(a, "MSnSet")
sampleNames(msna) <- "BC_F24_CW"
msna <- updateFvarLabels(msna, "BC_F24_CW")

I would prefer that combine automatically adds the sampleNames to the columns. @lgatto: Would this be possible?

As far as I can see combine tries to merge columns and rows. So it would not be easy to predict whether the featureNames should be added to the rows or the sampleNames to the columns. Do you have an idea how to solve this in MSnbase (where combine is defined) or synapter?

sgibb commented 8 years ago

After https://github.com/lgatto/MSnbase/pull/72 was merged, we can use nQuants(x, groupBy = fData(x)$protein.Accession). The $ operator supports partial matching, which means we could reintroduce #99. While that would simplify the synapter workflow for multiple runs using a fragment library, it would make working with a single synapter run more difficult (because the user has to know that the columns are named COLUMNNAME.SAMPLENAME). IMHO, requiring users who work with multiple synapter runs to call x <- updateFvarLabels(x, sampleNames(x)[1]) is simple enough, and we can keep the current solution.

pavel-shliaha commented 8 years ago

I agree; as a user, I would prefer #99 to be reintroduced.

sgibb commented 8 years ago

#99 is again part of synapter 2.0.