lgatto / synapter

Label-free data analysis pipeline for optimal identification and quantitation
https://lgatto.github.io/synapter/

bug in new synapter while reading pep3D #96

Closed pavel-shliaha closed 8 years ago

pavel-shliaha commented 8 years ago

    Create MSnExp object
    Reading master identification peptide file...
    Reading quantitation Pep3D file...
    Error: You have 52 column names, but 15 columns

Please have a look (same files as I posted for the previous bug):

Z:\RAW\pvs22_QTOF_DATA_data3\data_for_synapter_2.0\bug\2015_07_19_problems_with_loading_spectrum\for_synapter

sgibb commented 8 years ago

Unfortunately I can't reproduce it.

I am using your file synapter_analysis_BH.R but I had to change the quantspectra file because Apex.xml has (of course) a different file format than Spectrum.xml. Afterwards the import works without any error.

    library("synapter")

    # Input files; quantspectra points to the Spectrum.xml file instead of Apex3D.xml.
    l <- list(identpeptide = "masterFile.RDS",
              quantpeptide = "BC_F24_CW_HDMSE_01_IA_final_peptide.csv",
              quantpep3d = "BC_F24_CW_HDMSE_01_Pep3DAMRT.csv",
              fasta = "TAIR10_comb_CC.fasta",
              quantspectra = "../BC_F24_CW_HDMSE_01_20150716161348/BC_F24_CW_HDMSE_01_Pep3D_Spectrum.xml")

    synapterAnalysis <- Synapter(l, master = TRUE)

Could you please reinstall the most recent synapter (devtools::install_github("lgatto/synapter@2.0")) and retry?

Besides the file format, what is the difference between BC_F24_CW_HDMSE_01_Pep3D_Spectrum.xml and BC_F24_CW_HDMSE_01_Apex3D.xml?

sgibb commented 8 years ago

After a discussion with @pavel-shliaha we figured out that he filtered rows with df[df$Function != 2,]. With the filtered file I could reproduce this behaviour. The original file (that doesn't cause this bug) and the filtered file differ in their column names:

    $ head -1 BC_F24_CW_HDMSE_01_Pep3DAMRT.csv BC_F24_CW_HDMSE_01_Pep3DAMRT.orig.csv
    ==> BC_F24_CW_HDMSE_01_Pep3DAMRT.csv <==
    "Function","spectrumID","isBinned","rt_min","rtErr","mwHPlus","errPPM","charge","Intensity","Counts","errCounts","pep_numIons","clusterID","clust_drift","clust_driftErr","charge_numIons","isFid","ion_ID","ion_z","ion_iso","charge_mwHPlus","ion_rt","ion_rtSd","ion_chFWHM","ion_area","ion_counts","ion_intenSD","ion_m_z","ion_m_zUncal","ion_m_zSD","ion_msFWHM","ion_satFlag","ion_drift","ion_driftFWHM","ion_driftSD","ion_deltaDrift","ion_drSNR","ion_deltaPPM","ion_deltaT","ion_msSNR","ion_chSNR","ion_clusterDeltaPPM","ion_iso_ID","ion_iso_deltaPPM","ion_iso_deltaT","ion_iso_deltaDrift","ion_nonZeroElementCount","ion_centerSumResponse","ion_innerElementCount","ion_ratio12C","Model","ratioToModel"

    ==> BC_F24_CW_HDMSE_01_Pep3DAMRT.orig.csv <==
    Function,spectrumID,isBinned,rt_min,rtErr,mwHPlus,errPPM,charge,Intensity,Counts,errCounts,pep_numIons,clusterID,clust_drift,clust_driftErr,charge_numIons,isFid,ion_ID,ion_z,ion_iso,charge_mwHPlus,ion_rt,ion_rtSd,ion_chFWHM,ion_area,ion_counts,ion_intenSD,ion_m_z,ion_m_zUncal,ion_m_zSD,ion_msFWHM,ion_satFlag,ion_drift,ion_driftFWHM,ion_driftSD,ion_deltaDrift,ion_drSNR,ion_deltaPPM,ion_deltaT,ion_msSNR,ion_chSNR,ion_clusterDeltaPPM,ion_iso_ID,ion_iso_deltaPPM,ion_iso_deltaT,ion_iso_deltaDrift,ion_nonZeroElementCount,ion_centerSumResponse,ion_innerElementCount,ion_ratio12C,Model,ratioToModel

Because we only want to read the columns listed in #87, we read the first line of each csv file and look up the indices of the required columns. The quotes cause a mismatch (because obviously Function != \"Function\"). The root cause is that the quote argument of write.table defaults to TRUE, so the filtered file was written with quoted column names. Writing it with write.csv(..., quote=FALSE) avoids the problem.
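
The mismatch can be reproduced on a tiny table. This is only a sketch: the data frame, the temporary files and the `wanted` vector are made up for illustration and have nothing to do with the real Pep3D files beyond sharing a few column names.

```r
# Hypothetical miniature of a Pep3D table; only the column names matter here.
df <- data.frame(Function = c(1, 2, 1),
                 spectrumID = 1:3,
                 rt_min = c(10.1, 10.2, 10.3))

# Same kind of row filter as used on the real file.
filtered <- df[df$Function != 2, ]

quoted <- tempfile(fileext = ".csv")
unquoted <- tempfile(fileext = ".csv")

# quote = TRUE is the default, so the header is written as "Function","spectrumID",...
write.csv(filtered, quoted, row.names = FALSE)
# quote = FALSE keeps the header as Function,spectrumID,...
write.csv(filtered, unquoted, row.names = FALSE, quote = FALSE)

# Read only the first line and split it into column names.
header_quoted <- strsplit(readLines(quoted, n = 1), ",", fixed = TRUE)[[1]]
header_unquoted <- strsplit(readLines(unquoted, n = 1), ",", fixed = TRUE)[[1]]

# Columns we want to look up (illustrative subset).
wanted <- c("Function", "rt_min")

idx_quoted <- match(wanted, header_quoted)      # no match, names still carry quotes
idx_unquoted <- match(wanted, header_unquoted)  # positions 1 and 3
```

With the quoted header every lookup comes back as NA, which is why the column selection fails for the filtered file but not for the original one.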

Nevertheless, our import function now strips all quotes before matching the column names.
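
Roughly, the idea looks like the sketch below. `header_index` is a hypothetical helper written for this comment, not the actual synapter import code, and it assumes a plain comma-separated header without commas inside column names.

```r
# Hypothetical helper (not synapter's real implementation):
# return the positions of the wanted columns, ignoring quotes in the header.
header_index <- function(file, wanted) {
  header <- readLines(file, n = 1)
  # Drop every double quote before splitting, so "Function" becomes Function.
  header <- gsub("\"", "", header, fixed = TRUE)
  cols <- strsplit(header, ",", fixed = TRUE)[[1]]
  idx <- match(wanted, cols)
  if (anyNA(idx)) {
    stop("missing columns: ", paste(wanted[is.na(idx)], collapse = ", "))
  }
  idx
}

# Illustrative file with a quoted header, as produced by write.csv's default.
f <- tempfile(fileext = ".csv")
writeLines(c("\"Function\",\"spectrumID\",\"rt_min\"", "1,1,10.1"), f)

idx <- header_index(f, c("Function", "rt_min"))
```

The same call works unchanged on an unquoted header, so files written with either quote setting can be imported.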

pavel-shliaha commented 8 years ago

thanks!