sgibb / topdownr

R-package for the analysis of Thermo Orbitrap Fusion Top-Down Proteomic Data.
https://sgibb.github.io/topdownr/
GNU General Public License v3.0
1 stars 0 forks source link

exposing useful functions #76

Open pavel-shliaha opened 6 years ago

pavel-shliaha commented 6 years ago

There are some functions that I would like to use in my everyday work and they are actually implemented, but hidden. I would like to ask you to expose them. The ones I would like straight away are 2:

1) ion to fragment matching with fragments 2) plotting spectra annotation.

Exposing these two will provide a very simple interface for spectra annotation. As an example of why i need this. I have two 50 aa peptides with diffrent PTM position (Lys4 and Lys 24 methylation). I have done ETD on them

capture

The + 14Da shift shown by the box is the result of difference in PTM localisation. Basically I need to annotate the two spectra to see the differences in the fragments, due to different PTM localisation. I would like to see the two spectra on top of each other (their mz must be linked) as in the figure with the annotations smth like "c20Rme" and c20 in the highlighted case

I am not entirely sure what is the most intuitive and effective way of exposing this functionality of doing this. I imagine you work with spectra objects, with ion to fragment matching snd plotting, so maybe if I wanted to annotate a spectrum, I could create a new spectrum object. Please let me know what you think

sgibb commented 6 years ago

The "ion to fragment matching with fragments" function is called .condition2data.frame and is indeed hidden. You could test it via topdownr:::.condition2data.frame(tds[, 1]):

library("topdownr")
tds <- readTopDownFiles("../topdownrdata/inst/extdata/20170629_myo/")

df <- topdownr:::.condition2data.frame(tds[,1])

knitr::kable(df[55:64,])
mz intensity fragment type
55 3459.714 121038.52 None
56 3629.804 305726.62 y34 C-terminal
57 3662.896 154252.28 None
58 3763.955 198377.98 None
59 3820.976 76868.09 None
60 3878.944 608393.25 None
61 3894.952 27341.83 y36 C-terminal
62 3965.978 271138.03 b36* C-terminal
63 3981.994 53548.00 y37 C-terminal
64 4055.096 243017.27 None

I could easily export this. Do you have a good name for it? I would suggest condition2data.frame.

The plot function uses this data.frame. The fragment column is used for a geom_text.

I would like to see the two spectra on top of each other (their mz must be linked) as in the figure with the annotations smth like "c20Rme" and c20 in the highlighted case

What do you mean by linked? I could link (treat as equal) mz values by a user-defined ppm tolerance.

If you like I could provide a plotting function that could plot multiple conditions (e.g. plot(tds[, 1:2]) that annotates the fragments and highlight different fragments (would possibly highlight more than only the interesting objects), similar to the example you given above.

pavel-shliaha commented 6 years ago

Could you please explain step-by-step how this would work: I have 2 separate csv files (converted to txt to be loaded),

spectrum_1.txt spectrum_2.txt

I can load these files as dataframes:

S1 <- read.csv ("spectrum_1.txt", stringsAsFactors = FALSE, skip = 6) S1 <- S1[S1$Mass < 5600, ]

S2 <- read.csv ("spectrum_2.txt", stringsAsFactors = FALSE, skip = 6) S2 <- S1[S1$Mass < 5600, ]

what next? How do I get to the plot? (there should be a stage that specifies the fragment types and adducts)

For now please ignore miodifications, the sequence is "ARTKQTARKSTGGKAPRKQLATKAARKSAPATGGVKKPHRYRPGTVALRE"

By linked mean x axis are the same, so ions of the same mz are plotted on top of each other

PS: I can find a way to do this, so my question is how would you do it in the most intuitive way taking maximum advantage of the already available code

sgibb commented 6 years ago

OK, currently there is no single function that could easily exposed to the user. But I could create one if you wish (e.g. annotateMz(mz, peptidesequence, tolerance, redundantIonMatch, redundantFragmentMatch)).

To solve your task I would do something like (please note the use of internal topdownr functions via :::, the key functions are .calculateFragments and .matchFragments):

library("topdownr")
library("ggplot2")

S1 <- read.csv("spectrum_1.txt", stringsAsFactors=FALSE, skip=6)
S1 <- S1[S1$Mass < 5600, ]

S2 <- read.csv("spectrum_2.txt", stringsAsFactors=FALSE, skip=6)
S2 <- S1[S1$Mass < 5600, ]

pepseq <- "ARTKQTARKSTGGKAPRKQLATKAARKSAPATGGVKKPHRYRPGTVALRE"

## Calculate Fragments
fragments <- topdownr:::.calculateFragments(pepseq)

## Match Fragments
ppm <- 25e-6
k1 <- topdownr:::.matchFragments(S1$Mass, mz(fragments), tolerance=ppm)
k2 <- topdownr:::.matchFragments(S2$Mass, mz(fragments), tolerance=ppm)

S1$Fragment <- names(fragments)[k1]
S2$Fragment <- names(fragments)[k2]

S1$ID <- "Spectrum 1"
S2$ID <- "Spectrum 2"

Spectra <- rbind(S1, S2)

notNA <- !is.na(Spectra$Fragment)
Spectra$Label[notNA] <- paste0(round(Spectra$Mass[notNA], 2), Spectra$Fragment[notNA])

ggplot(Spectra, aes(x=Mass, y=Intensity)) +
    geom_segment(aes(xend=Mass, yend=0, color=ID)) +
    geom_text(aes(label=Label),
              angle=90,
              hjust=0.2,
              vjust=0.5,
              nudge_y=max(Spectra$Intensity)/100) +
    facet_grid(ID ~ ., scales="free_y") +
    theme(legend.position="none")

(somehow I am currently not able to upload an image to github but hopefully you can simply run the code above, the plot could be improved, of course)

pavel-shliaha commented 6 years ago

sorry, I might have not been clear, but the code should allow me to specify variable modifications at particular residues, which your code does not seem to be able to do