levitsky / pyteomics

Pyteomics is a collection of lightweight and handy tools for Python that help to handle various sorts of proteomics data. Pyteomics provides a growing set of modules to facilitate the most common tasks in proteomics data analysis.
http://pyteomics.readthedocs.io
Apache License 2.0

FDR calculation #87

Closed: KKKKK-tech closed this issue 1 year ago

KKKKK-tech commented 1 year ago

I'm trying to use pepxml.filter_df() to calculate FDR, but the is_decoy function doesn't seem to recognize the decoy prefix, even though it is exactly "DECOY". Why is this? Here is my code:

import pandas as pd
from pyteomics import pepxml

df = pd.read_table(r'D:\Python\comet\22CPTAC_LUAD_P_BI_20180726_BD_f13.comet.txt')
df = pepxml.filter_df(df, fdr=0.02)
df.to_csv('.\\df.txt', sep='\t', index=False)

It's very simple, so I don't know where the bug is.

levitsky commented 1 year ago

I think the problem is that pepxml.filter_df makes some assumptions that don't hold for your data. Its default is_decoy function looks at the "protein" column expecting to find a list of proteins, not a single string (this works if you apply filter_df() to a dataframe produced from a pepXML file, or just open a file directly with filter_df()).
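To illustrate why a list-oriented check misfires on a plain string, here is a minimal plain-Python sketch. The function default_style_is_decoy is hypothetical: it only mimics the behavior described above, assuming the check requires every protein in the list to carry the prefix, and it is not pyteomics's actual code. The protein ID is made up.

```python
def default_style_is_decoy(proteins, prefix="DECOY_"):
    # Hypothetical stand-in for a "list of proteins" decoy check:
    # a PSM counts as decoy only if every listed protein has the prefix.
    return all(p.startswith(prefix) for p in proteins)


# Works as intended when the column holds lists (as parsed from pepXML):
print(default_style_is_decoy(["DECOY_sp|P12345"]))  # True

# With a plain string, iteration goes character by character, so "D", "E",
# "C", ... are each tested against "DECOY_" and the check fails:
print(default_style_is_decoy("DECOY_sp|P12345"))  # False

# Testing the string itself, as in the workaround below, behaves as expected:
print("DECOY_sp|P12345".startswith("DECOY_"))  # True
```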

In this case, you will have to provide your own is_decoy function or series, e.g.:

df = pepxml.filter_df(df, is_decoy=df.protein.str.startswith("DECOY_"), fdr=0.02)

or

df['decoy'] = df.protein.str.startswith("DECOY_")
df = pepxml.filter_df(df, is_decoy='decoy', fdr=0.02)

Note that you can also just use auxiliary.filter instead of pepxml.filter_df here, but you will need to pass the key parameter.
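For context on what the key, is_decoy and fdr arguments control, here is a simplified plain-Python sketch of a basic target-decoy filter. It is not pyteomics's implementation: it assumes lower scores are better (as for an e-value), estimates FDR as decoys/targets, ignores tied scores, and uses a hypothetical function name (filter_at_fdr), hypothetical field names and made-up toy data.

```python
def filter_at_fdr(psms, fdr, key, is_decoy):
    """Return target PSMs that pass a target-decoy FDR threshold.

    psms: list of dicts; key: name of the score field (lower is better);
    is_decoy: name of a boolean field. Both field names are hypothetical.
    """
    # Rank PSMs from best (lowest score) to worst.
    ranked = sorted(psms, key=lambda p: p[key])
    decoys = targets = 0
    cutoff = 0  # how many top-ranked PSMs to keep
    for rank, psm in enumerate(ranked, start=1):
        if psm[is_decoy]:
            decoys += 1
        else:
            targets += 1
        # Simple estimate: FDR = decoys / targets among PSMs ranked so far.
        if targets and decoys / targets <= fdr:
            cutoff = rank
    # Keep everything up to the deepest passing rank, then drop the decoys.
    return [p for p in ranked[:cutoff] if not p[is_decoy]]


# Toy data with made-up e-values.
psms = [
    {"evalue": 0.001, "decoy": False},
    {"evalue": 0.004, "decoy": False},
    {"evalue": 0.003, "decoy": True},
    {"evalue": 0.002, "decoy": False},
]
kept = filter_at_fdr(psms, fdr=0.02, key="evalue", is_decoy="decoy")
print([p["evalue"] for p in kept])  # [0.001, 0.002]
```

Taking the deepest rank that still meets the threshold, rather than stopping at the first rank that exceeds it, matches the usual q-value convention; with fdr=0.5 the same data would keep the 0.004 target as well.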

KKKKK-tech commented 1 year ago

It worked! Thanks a lot!