pFindStudio / pFind3


Export pFind.protein to txt #9

Open heuselm opened 5 years ago

heuselm commented 5 years ago

Hi, great tool! To make it even better, it would be nice to also support protein level result export to .txt/.tsv, as is implemented for the peptide level table.

At least, pFind.protein cannot readily be read by Excel or R (read.table, data.table::fread).

Hao-Chi commented 5 years ago

Hi,

Thank you for your suggestion. pFind.protein is a tab-delimited file in this version, but it may not be easy to read in Excel because of the tree structure between proteins and peptides. In the next version we will provide a new output format that contains only the protein information, and we hope you can give us more advice on the design of the format. Thank you.

Hao

heuselm commented 5 years ago

Ni-hao Hao,

Thanks a lot for your reply and for planning to implement this. Advice on the format: I think what biochemist users benefit from most is a tab-separated matrix table with one row per protein group and a separate column for each quantification value, similar to MaxQuant's proteinGroups.txt output:

Columns e.g.: parent_proteins | parent_proteins_metadata | protein_group_n | peptides_n | peptides_seq | sequence_coverage | MSrun1_speccounts | MSrun2_speccounts | MSrun3_speccounts | MSrun4_speccounts
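To make the suggested layout concrete, here is a minimal sketch of how such a matrix could be assembled from PSM-level records. The input shape (tuples of protein group, peptide sequence, and run name) and the function names are assumptions for illustration only; they are not pFind's actual data model or output. Only a subset of the proposed columns is produced.

```python
from collections import defaultdict


def spectral_count_matrix(psms, runs):
    """Build one row per protein group with one spectral-count column per run.

    psms: iterable of (protein_group, peptide_seq, run) tuples, one per PSM
          (hypothetical input shape, not a pFind format).
    runs: ordered list of run names that become the count columns.
    """
    counts = defaultdict(lambda: defaultdict(int))
    peptides = defaultdict(set)
    for protein, peptide, run in psms:
        counts[protein][run] += 1        # one PSM = one spectral count
        peptides[protein].add(peptide)

    header = ["parent_proteins", "peptides_n", "peptides_seq"]
    header += [f"{run}_speccounts" for run in runs]
    rows = [header]
    for protein in sorted(counts):
        seqs = sorted(peptides[protein])
        row = [protein, str(len(seqs)), ";".join(seqs)]
        # Runs in which the protein was not seen get a count of 0.
        row += [str(counts[protein][run]) for run in runs]
        rows.append(row)
    return rows


def to_tsv(rows):
    """Serialize the rows as tab-separated text."""
    return "\n".join("\t".join(row) for row in rows)
```

A table written this way has a single header line and no nested peptide rows, so it should load directly with read.table or data.table::fread.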

It’d definitely be cool to also include different quantification types, beyond the spectral counts used in the table above, e.g.

Further, I’d suggest demonstrating strict error control at the protein level, e.g. using Mayu: http://tools.proteomecenter.org/wiki/index.php?title=Software:Mayu

One more point to consider: the licensing requirement will limit the spread of the tool in the academic community. So does the fact that it’s not open source. Open-sourcing it would boost acceptance among computational proteomics experts; “normal” users would still appreciate the tool for its usability, especially the nice interface.

Best, mh

Hao-Chi commented 5 years ago

Hi Moritz,

Thank you so much for such a detailed reply! It is very helpful for us. I will ask the developers to implement it soon.

Best,

Hao

RiccardoZenezini commented 5 years ago

First of all, thank you Hao and the whole team for your great work on this program. It is really fast and easy to use. I agree with what Moritz said about the importance of other label-free quantification types (it would be quite interesting to see XIC values, as in MaxQuant). As for the structure of the results file, it is true that a tab-separated matrix would be easier for proteomics end users to work with, but it is also true that your current output contains a lot of interesting information. Maybe the fastest solution would be to create two types of output: your classic output, and one with only the quantification data, as Moritz suggested. In any case, it would help if the output did not report AC "sp|P02788|TRFL_HUMAN" and DE "Lactotransferrin OS=Homo sapiens OX=9606 GN=LTF PE=1 SV=6" as it does now, but instead had 5 columns like P02788 | TRFL_HUMAN | Lactotransferrin | Homo sapiens | LTF
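As an illustration of the five-column split requested above, here is a small sketch that parses the two strings quoted in this comment. It assumes UniProt-style headers with `OS=`, `OX=`, `GN=`, `PE=` and `SV=` tags; the function name `split_uniprot` is made up for this example and is not part of pFind.

```python
import re

# UniProt-style description tags; the capture group keeps the tag names
# in the output of re.split so they can be paired with their values.
_FIELD = re.compile(r"\s(OS|OX|GN|PE|SV)=")


def split_uniprot(ac, de):
    """Split AC/DE strings into accession, entry name, protein name,
    organism and gene name. Missing tags yield empty strings."""
    parts = ac.split("|")
    if len(parts) != 3:
        raise ValueError(f"unexpected AC format: {ac!r}")
    _, accession, entry_name = parts

    # pieces = [protein name, tag1, value1, tag2, value2, ...]
    pieces = _FIELD.split(de)
    protein_name = pieces[0].strip()
    fields = dict(zip(pieces[1::2], (v.strip() for v in pieces[2::2])))
    return [accession, entry_name, protein_name,
            fields.get("OS", ""), fields.get("GN", "")]


row = split_uniprot(
    "sp|P02788|TRFL_HUMAN",
    "Lactotransferrin OS=Homo sapiens OX=9606 GN=LTF PE=1 SV=6",
)
print(" | ".join(row))
```

Entries from other databases or with non-standard headers would need their own handling; the sketch raises an error rather than guessing when the AC string does not have three `|`-separated parts.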

I have some other ideas that could help proteomics users:

1) Create a pFind.peptide file in which each line contains the information for one identified peptide (sequence, parent protein, modification, intensity/PSMs per experiment, length, missed cleavages, and all other interesting numbers such as mass and mass shift). This way one could easily see whether a peptide (perhaps modified or with more missed cleavages) changes across runs.

2) The name of the raw file is often useless (because experiments are named with a code) or really long (because it includes the date, the operator's name, the instrument name, the LC and so on), so it would be helpful to be able to change it before the identification starts and then work with something like "CTRL_01".

3) In pFind.spectra it would help to have a column with only the name of the originating raw file, so that I can easily filter on it and look at a single file's average error, types of modification, and so on.

4) Many modifications are rare or not very interesting, so it would be useful to filter them out and see how the quantification results change (and then perhaps export new tables from pBuild).
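Points 2) and 3) could be approximated outside the tool with something like the sketch below. It rests on an assumption not confirmed in this thread: that each spectrum title starts with the raw file name, followed by a dot and scan-related fields. The example titles, the `labels` mapping and the function names are hypothetical.

```python
def raw_file_of(title):
    """Return the raw file name, assumed to be everything before the
    first dot of the spectrum title (an assumption, see above)."""
    return title.split(".", 1)[0]


def run_label(title, labels):
    """Map a spectrum title to a short run label such as 'CTRL_01'.

    labels: dict from raw file name to the short label chosen by the
            user; unknown raw files fall back to their original name.
    """
    raw = raw_file_of(title)
    return labels.get(raw, raw)


def filter_by_run(titles, labels, wanted):
    """Keep only the spectrum titles that belong to the run 'wanted'."""
    return [t for t in titles if run_label(t, labels) == wanted]


# Hypothetical titles and labels, for illustration only.
labels = {"20190301_QE1_opX_sample17": "CTRL_01"}
titles = [
    "20190301_QE1_opX_sample17.1234.1234.2.0.dta",
    "20190302_QE1_opX_sample18.88.88.3.0.dta",
]
print(filter_by_run(titles, labels, "CTRL_01"))
```

If raw file names themselves contain dots, splitting on the first dot would cut them short, so a real implementation would need to know the exact title layout.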

To conclude my long message (sorry for that): in my pBuild 3.0 I can't do much in the 'protein group' menu, since sorting, right-click and multiple selection do not work.

Thanks again, Riccardo