lgatto / synapter

Label-free data analysis pipeline for optimal identification and quantitation

bug in new synapter while reading pep3D #96

Closed pavel-shliaha closed 8 years ago

pavel-shliaha commented 8 years ago

    Create MSnExp object
    Reading master identification peptide file...
    Reading quantitation Pep3D file...
    Error: You have 52 column names, but 15 columns

Please have a look (same files as I posted for the previous bug).


sgibb commented 8 years ago

Unfortunately I can't reproduce it.

I am using your file synapter_analysis_BH.R but I had to change the quantspectra file because Apex.xml has (of course) a different file format than Spectrum.xml. Afterwards the import works without any error.

    l <- list(identpeptide = "masterFile.RDS",
              quantpeptide = "BC_F24_CW_HDMSE_01_IA_final_peptide.csv",
              quantpep3d = "BC_F24_CW_HDMSE_01_Pep3DAMRT.csv",
              fasta = "TAIR10_comb_CC.fasta",
              quantspectra = "../BC_F24_CW_HDMSE_01_20150716161348/BC_F24_CW_HDMSE_01_Pep3D_Spectrum.xml")

    synapterAnalysis <- Synapter(l, master = TRUE)

Could you please reinstall the most recent synapter (devtools::install_github("lgatto/synapter@2.0")) and retry?

Besides the file format, what is the difference between BC_F24_CW_HDMSE_01_Pep3D_Spectrum.xml and BC_F24_CW_HDMSE_01_Apex3D.xml?

sgibb commented 8 years ago

After a discussion with @pavel-shliaha we figured out that he filtered rows with df[df$Function != 2,]. With the filtered file I could reproduce this behaviour. The original file (that doesn't cause this bug) and the filtered file differ in their column names:

    $ head -1 BC_F24_CW_HDMSE_01_Pep3DAMRT.csv BC_F24_CW_HDMSE_01_Pep3DAMRT.orig.csv
    ==> BC_F24_CW_HDMSE_01_Pep3DAMRT.csv <==

    ==> BC_F24_CW_HDMSE_01_Pep3DAMRT.orig.csv <==

Because we only want to read the columns listed in #87, we read the first line of each CSV file and look up the indices of the corresponding columns. The quotes cause a mismatch (because obviously Function != \"Function\"). The source of the problem is that the quote argument of write.table is TRUE by default, so the filtered file was written with quoted column names. write.csv(..., quote=FALSE) could fix it.
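The mismatch can be reproduced with a small toy example. This is only a sketch: the data frame and its column names (other than Function) are made up for illustration, and the header is split naively on commas rather than with synapter's actual import code.

    # Toy data standing in for a Pep3D file (values are illustrative).
    df <- data.frame(Function = c(1, 2, 1), Mass = c(500.1, 600.2, 700.3))

    quoted <- tempfile(fileext = ".csv")
    unquoted <- tempfile(fileext = ".csv")

    # Filter the rows, then write with the default quote = TRUE ...
    write.csv(df[df$Function != 2, ], quoted, row.names = FALSE)
    # ... and again with quoting disabled.
    write.csv(df[df$Function != 2, ], unquoted, row.names = FALSE, quote = FALSE)

    # Split the raw header line on commas, without CSV-aware parsing.
    header_quoted <- strsplit(readLines(quoted, n = 1), ",", fixed = TRUE)[[1]]
    header_unquoted <- strsplit(readLines(unquoted, n = 1), ",", fixed = TRUE)[[1]]

    match("Function", header_quoted)   # NA: the field is "\"Function\""
    match("Function", header_unquoted) # 1
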

Nevertheless, our import function now strips all quotes before matching the column names.
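A minimal sketch of that kind of quote-tolerant matching might look as follows. The helper find_columns and the column names Mass and RT are hypothetical and chosen for illustration; this is not the code that was committed to synapter.

    # Hypothetical helper: find the positions of the wanted columns in a raw
    # CSV header line, ignoring any double quotes around the names.
    find_columns <- function(header_line, wanted) {
      fields <- strsplit(header_line, ",", fixed = TRUE)[[1]]
      fields <- gsub("\"", "", fields, fixed = TRUE)
      idx <- match(wanted, fields)
      if (anyNA(idx)) {
        stop("Missing column(s): ", paste(wanted[is.na(idx)], collapse = ", "))
      }
      idx
    }

    # Quoted and unquoted headers now give the same indices.
    find_columns("\"Function\",\"Mass\",\"RT\"", c("Function", "RT"))
    find_columns("Function,Mass,RT", c("Function", "RT"))

Failing with an explicit list of missing columns, rather than returning NA indices, makes a malformed header easier to diagnose than a later column-count error.
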

pavel-shliaha commented 8 years ago
