exposing useful functions

pavel-shliaha commented 6 years ago

There are some functions that I would like to use in my everyday work and they are actually implemented, but hidden. I would like to ask you to expose them. The ones I would like straight away are 2:

1) ion to fragment matching with fragments 2) plotting spectra annotation.

Exposing these two will provide a very simple interface for spectra annotation. As an example of why i need this. I have two 50 aa peptides with diffrent PTM position (Lys4 and Lys 24 methylation). I have done ETD on them

capture

The + 14Da shift shown by the box is the result of difference in PTM localisation. Basically I need to annotate the two spectra to see the differences in the fragments, due to different PTM localisation. I would like to see the two spectra on top of each other (their mz must be linked) as in the figure with the annotations smth like "c20Rme" and c20 in the highlighted case

I am not entirely sure what is the most intuitive and effective way of exposing this functionality of doing this. I imagine you work with spectra objects, with ion to fragment matching snd plotting, so maybe if I wanted to annotate a spectrum, I could create a new spectrum object. Please let me know what you think

sgibb commented 6 years ago

The "ion to fragment matching with fragments" function is called .condition2data.frame and is indeed hidden. You could test it via topdownr:::.condition2data.frame(tds[, 1]):

library("topdownr")
tds <- readTopDownFiles("../topdownrdata/inst/extdata/20170629_myo/")

df <- topdownr:::.condition2data.frame(tds[,1])

knitr::kable(df[55:64,])

	mz	intensity	fragment	type
55	3459.714	121038.52		None
56	3629.804	305726.62	y34	C-terminal
57	3662.896	154252.28		None
58	3763.955	198377.98		None
59	3820.976	76868.09		None
60	3878.944	608393.25		None
61	3894.952	27341.83	y36	C-terminal
62	3965.978	271138.03	b36*	C-terminal
63	3981.994	53548.00	y37	C-terminal
64	4055.096	243017.27		None

I could easily export this. Do you have a good name for it? I would suggest condition2data.frame.

The plot function uses this data.frame. The fragment column is used for a geom_text.

I would like to see the two spectra on top of each other (their mz must be linked) as in the figure with the annotations smth like "c20Rme" and c20 in the highlighted case

What do you mean by linked? I could link (treat as equal) mz values by a user-defined ppm tolerance.

If you like I could provide a plotting function that could plot multiple conditions (e.g. plot(tds[, 1:2]) that annotates the fragments and highlight different fragments (would possibly highlight more than only the interesting objects), similar to the example you given above.

pavel-shliaha commented 6 years ago

Could you please explain step-by-step how this would work: I have 2 separate csv files (converted to txt to be loaded),

spectrum_1.txt spectrum_2.txt

I can load these files as dataframes:

S1 <- read.csv ("spectrum_1.txt", stringsAsFactors = FALSE, skip = 6) S1 <- S1[S1$Mass < 5600, ]

S2 <- read.csv ("spectrum_2.txt", stringsAsFactors = FALSE, skip = 6) S2 <- S1[S1$Mass < 5600, ]

what next? How do I get to the plot? (there should be a stage that specifies the fragment types and adducts)

For now please ignore miodifications, the sequence is "ARTKQTARKSTGGKAPRKQLATKAARKSAPATGGVKKPHRYRPGTVALRE"

By linked mean x axis are the same, so ions of the same mz are plotted on top of each other

PS: I can find a way to do this, so my question is how would you do it in the most intuitive way taking maximum advantage of the already available code

sgibb commented 6 years ago

OK, currently there is no single function that could easily exposed to the user. But I could create one if you wish (e.g. annotateMz(mz, peptidesequence, tolerance, redundantIonMatch, redundantFragmentMatch)).

To solve your task I would do something like (please note the use of internal topdownr functions via :::, the key functions are .calculateFragments and .matchFragments):

library("topdownr")
library("ggplot2")

S1 <- read.csv("spectrum_1.txt", stringsAsFactors=FALSE, skip=6)
S1 <- S1[S1$Mass < 5600, ]

S2 <- read.csv("spectrum_2.txt", stringsAsFactors=FALSE, skip=6)
S2 <- S1[S1$Mass < 5600, ]

pepseq <- "ARTKQTARKSTGGKAPRKQLATKAARKSAPATGGVKKPHRYRPGTVALRE"

## Calculate Fragments
fragments <- topdownr:::.calculateFragments(pepseq)

## Match Fragments
ppm <- 25e-6
k1 <- topdownr:::.matchFragments(S1$Mass, mz(fragments), tolerance=ppm)
k2 <- topdownr:::.matchFragments(S2$Mass, mz(fragments), tolerance=ppm)

S1$Fragment <- names(fragments)[k1]
S2$Fragment <- names(fragments)[k2]

S1$ID <- "Spectrum 1"
S2$ID <- "Spectrum 2"

Spectra <- rbind(S1, S2)

notNA <- !is.na(Spectra$Fragment)
Spectra$Label[notNA] <- paste0(round(Spectra$Mass[notNA], 2), Spectra$Fragment[notNA])

ggplot(Spectra, aes(x=Mass, y=Intensity)) +
    geom_segment(aes(xend=Mass, yend=0, color=ID)) +
    geom_text(aes(label=Label),
              angle=90,
              hjust=0.2,
              vjust=0.5,
              nudge_y=max(Spectra$Intensity)/100) +
    facet_grid(ID ~ ., scales="free_y") +
    theme(legend.position="none")

(somehow I am currently not able to upload an image to github but hopefully you can simply run the code above, the plot could be improved, of course)

pavel-shliaha commented 6 years ago

sorry, I might have not been clear, but the code should allow me to specify variable modifications at particular residues, which your code does not seem to be able to do

sgibb / topdownr

exposing useful functions #76