lgatto / synapter

Label-free data analysis pipeline for optimal identification and quantitation
https://lgatto.github.io/synapter/

Error: "The Pep3D file does not correspond to the given Quantitation Final Peptide file!" #112

Closed yajin2016 closed 8 years ago

yajin2016 commented 8 years ago

Hi, I installed R 3.2.3 and the newest synapter, and ran the following lines based on the recommended settings:

library(tcltk)
synergise(master = FALSE, fdr = 0.01,
          fdrMethod = c("BH", "Bonferroni", "qval"), fpr = 0.01, peplen = 7,
          missedCleavages = 0, identppm = 20, quantppm = 20, uniquepep = TRUE,
          span = 0.05, grid.ppm.from = 2, grid.ppm.to = 20, grid.ppm.by = 2,
          grid.nsd.from = 0.5, grid.nsd.to = 5, grid.nsd.by = 0.5,
          grid.subset = 1, grid.n = 0, grid.param.sel = c("auto", "model",
          "total", "details"), mergedEMRTs = c("rescue", "copy", "transfer"),
          css = NULL, verbose = TRUE)

It let me choose the outputdir, identpeptide, quantpeptide, quantpep3d and fasta folder/files, then it ran and printed:

Reading identification final peptide file...
Reading quantitation final peptide file...
Reading quantitation Pep3D file...
Error in xx$loadData() :
  The Pep3D file ‘J:/1603 Synapter/1/MSE-S1_Pep3DAMRT.csv’ does not correspond to the given Quantitation Final Peptide file ‘J:/1603 Synapter/1/MSE-S1_IA_final_peptide.csv’!

I found that you discussed this error in 2014 at https://github.com/lgatto/synapter/issues/42 and mentioned that it possibly came from a PLGS 2.5.2 bug, but my data was processed with version 3.0.2. I tried another data set and it returned the same error. It also didn't give a warning like the one added after the fix for issue 42, "Warning in .self$filterMismatchingQuantIntensities() : Filtering 61 (of 21175 total) entries of the quantitation final peptide and Pep3D file because they differ in their intensity values." I wonder if it might be caused by other problems of my data?

Thanks in advance!

Jin

sgibb commented 8 years ago

I wonder if it might be caused by other problems of my data?

You are right. This error is caused by mismatching IDs between the Pep3D and the quantitation final peptide files. Do you have another PLGS version to test with?

pavel-shliaha commented 8 years ago

This is easy to test. Load your final_peptide and P3D files as data frames in R and merge them by the EMRT number. If the mz in final_peptide and P3D does not agree for the same EMRT, then you have a problem in your data. I can help you with this if you don't have much R experience. Just add me on Skype: pavel_shliaha
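
A minimal sketch of the check described above, on toy data frames rather than real PLGS output. The ID columns precursor.leID and spectrumID are the ones used elsewhere in this thread; the m/z column names (mz_fp, mz_p3) and the tolerance are hypothetical placeholders, so substitute whatever your files actually contain.

```r
## Toy stand-ins for the final peptide and Pep3D tables (not real PLGS output).
## mz_fp and mz_p3 are hypothetical column names; check your own headers.
fp <- data.frame(precursor.leID = c(1, 2, 3),
                 mz_fp = c(500.25, 610.30, 720.40))
p3 <- data.frame(spectrumID = c(1, 2, 3),
                 mz_p3 = c(500.25, 610.30, 725.00))

## merge by the EMRT number (the ID column is named differently in each table)
merged <- merge(fp, p3, by.x = "precursor.leID", by.y = "spectrumID")

## flag EMRTs whose m/z values disagree beyond an arbitrary tolerance
tol <- 0.01
disagree <- abs(merged$mz_fp - merged$mz_p3) > tol

## print the disagreeing rows
merged[disagree, ]
```

Any rows printed at the end are EMRTs that carry the same ID in both tables but different m/z values.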

yajin2016 commented 8 years ago

@sgibb Yes, I used 3.0.2 for these data, but I also have the older version 2.5.2.

@pavel-shliaha Thank you so much! I would like to do the test, and it would be great if you could teach me how. I have already added you as a Skype contact.

sgibb commented 8 years ago

@yajin2016 I don't know whether you already talked to @pavel-shliaha but you can try the following to investigate your problem:

## read final peptide data
fp <- read.csv("your_final_peptide_file.csv", stringsAsFactors=FALSE)
## read pep3D data
p3 <- read.csv("your_pep3d_file.csv", stringsAsFactors=FALSE)

## find matching IDs
idx <- match(fp$precursor.leID, p3$spectrumID)

## any mismatch? Should return FALSE (but in your case it would be TRUE)
anyNA(idx)

## find rows that are present in final peptide but not in pep3D
## (means identification data without quantitation data)
rowIdx <- which(is.na(idx))

## print rows
fp[rowIdx, ]

## export to csv
write.csv(fp[rowIdx, ], file = "mismatch.csv")

yajin2016 commented 8 years ago

@sgibb Many thanks. I contacted Pavel and he taught me step by step how to test the data. I then found that the error was caused by the encoding of the _IA_final_peptide.csv files that PLGS outputs.

At first I ran the following lines:

setwd("F:/F/Jin-16/Synapter/1")
MSE_final_peptide <- read.csv("MSE-S1_IA_final_peptide.csv", stringsAsFactors = FALSE)

Below is a snapshot of the resulting object MSE_final_peptide:

[Screenshot: MSE_final_peptide when no encoding is defined in read.csv]

I noticed that in rows 8 and 18 the precursor IDs are not even integers. The column separation had already gone wrong at the column "peptideMatchedProductsString": the last character of the string (in Shift-JIS it looks like -, on a Mac it looks like a small circle) was not recognized correctly, so the value that belongs in the next cell was kept at the end of the string. I had actually seen this problem before when opening those IA output .csv files directly in Excel, and in my experience it is not rare.
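
One way to spot such shifted rows without scrolling through the table is to test whether every ID is a whole number. This is only a minimal sketch on toy data: the values below are invented, and only the column name precursor.leID is taken from this thread.

```r
## Toy stand-in for a badly parsed final peptide table (invented values):
## rows 3 and 4 hold content that is not a whole-number ID.
fp <- data.frame(precursor.leID = c("101", "102", "0.8731", "y4;104", "105"),
                 stringsAsFactors = FALSE)

## convert to numeric; text that is not a number becomes NA
num <- suppressWarnings(as.numeric(fp$precursor.leID))

## a well-formed ID is a number without a fractional part
ok <- !is.na(num) & num == round(num)

## row numbers that need a closer look
bad_rows <- which(!ok)
bad_rows
```

The same test also works when the column was read as numeric, because as.numeric() then leaves the values unchanged.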

So I tried setting fileEncoding to "mac" in read.csv:
MSE_final_peptide <- read.csv("MSE-S1_IA_final_peptide.csv",stringsAsFactors = FALSE, fileEncoding = "mac")

And the object looked correct.

[Screenshot: MSE_final_peptide when the encoding is defined in read.csv]

I then ran the test of whether the two files correspond:

setwd("F:/F/Jin-16/Synapter/1")
iconvlist() ## found "mac" can work.
MSE_final_peptide <- read.csv("MSE-S1_IA_final_peptide.csv", stringsAsFactors = FALSE, fileEncoding = "mac")
HDMSE_final_peptide <- read.csv("HDMSE-S1_IA_final_peptide.csv", stringsAsFactors = FALSE, fileEncoding = "mac")
Pep3D <- read.csv("MSE-S1_Pep3DAMRT.csv", stringsAsFactors = FALSE)
sel <- Pep3D$Function == 1
Pep3D_Func1 <- Pep3D[sel, ]
sel2 <- !duplicated(Pep3D_Func1$spectrumID)
Pep3D_Func1_EMRT <- Pep3D_Func1[sel2, ]
MSE_final_peptide_ID_Intensity <- MSE_final_peptide[, c("precursor.leID", "precursor.inten")]
Pep3D_Func1_EMRT_ID_counts <- Pep3D_Func1_EMRT[, c("spectrumID", "Counts")]
mergeDF <- merge(MSE_final_peptide_ID_Intensity, Pep3D_Func1_EMRT_ID_counts, by.x = "precursor.leID", by.y = "spectrumID")

I think the test shows that the two files do correspond.
[Screenshot: R objects created by the test]
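
To turn a merged table like this into a number, one option is to count how many matched entries disagree in intensity, which is the property the warning quoted in the first post refers to. This is a minimal sketch on a toy mergeDF with invented values. It assumes that precursor.inten and Counts are directly comparable numbers, which this thread does not confirm.

```r
## Toy stand-in for mergeDF (invented values, not the real result)
mergeDF <- data.frame(precursor.leID = c(11, 12, 13, 14),
                      precursor.inten = c(1500, 2300, 980, 4100),
                      Counts = c(1500, 2300, 975, 4100))

## number of merged entries whose intensities differ between the two files
n_diff <- sum(mergeDF$precursor.inten != mergeDF$Counts)
n_diff

## fraction of merged entries that agree
mean(mergeDF$precursor.inten == mergeDF$Counts)
```

It may also be worth comparing nrow(mergeDF) with the number of rows in the final peptide table, since IDs missing from the Pep3D file drop out of the merge silently.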

I will now try the synapter analysis on these files. Sincere thanks to @pavel-shliaha, @sgibb and @lgatto!