KKKKK-tech closed this issue 1 year ago.
I think the problem here is that `pepxml.filter_df` makes some assumptions that don't hold for this table. Its default `is_decoy` function expects the `"protein"` column to hold a list of proteins, not a single string. That works when you apply `filter_df()` to a DataFrame produced from a pepXML file, or when you open the file directly with `filter_df()`.
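To see why a single string breaks this check, here is a simplified stand-in for the default decoy test described above. `looks_like_decoy` is a hypothetical helper, not the pyteomics source, and it assumes the default test requires every protein of a PSM to carry the `DECOY_` prefix.

```python
# Hypothetical, simplified stand-in for the default decoy check (not pyteomics code).
# Assumption: a PSM counts as decoy only if every protein it maps to has the prefix.
def looks_like_decoy(proteins, prefix="DECOY_"):
    return all(p.startswith(prefix) for p in proteins)

# pepXML-style value: a list of protein accessions, so each item is a full name.
print(looks_like_decoy(["DECOY_sp|P12345|ABC"]))  # True

# Text-table-style value: one plain string. Iterating over it yields single
# characters, and no single character starts with "DECOY_".
print(looks_like_decoy("DECOY_sp|P12345|ABC"))  # False
```

The same iteration that walks over protein names in a list silently walks over characters in a string, so no row is ever flagged as a decoy.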
In this case, you will have to provide your own `is_decoy` function or Series, e.g.:

```python
df = pepxml.filter_df(df, is_decoy=df.protein.str.startswith("DECOY_"), fdr=0.02)
```

or

```python
df['decoy'] = df.protein.str.startswith("DECOY_")
df = pepxml.filter_df(df, is_decoy='decoy', fdr=0.02)
```
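The key difference is that `Series.str.startswith` tests each row's whole string at once. A minimal check on a toy DataFrame (the accessions and the `e-value` column are made up for illustration) shows the resulting flags:

```python
import pandas as pd

# Toy stand-in for a search-engine text table; all values are hypothetical.
df = pd.DataFrame({
    "protein": ["sp|P1|A", "DECOY_sp|P2|B", "sp|P3|C"],
    "e-value": [1e-5, 0.2, 3e-4],
})

# One boolean per row, computed on the full protein string (not per character).
df["decoy"] = df["protein"].str.startswith("DECOY_")
print(df["decoy"].tolist())  # [False, True, False]
```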
Note that you can also just use `auxiliary.filter` instead of `pepxml.filter_df` here, but you will need to pass the `key` parameter (the score to rank PSMs by).
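To illustrate why a ranking `key` is needed at all, here is a plain-pandas sketch of target-decoy filtering. It is a simplified stand-in, not the pyteomics implementation: `tda_filter` is a hypothetical helper that assumes lower scores are better (as for an e-value), estimates FDR as decoys/targets, and drops decoys from the result.

```python
import pandas as pd

def tda_filter(df, key, decoy_col, fdr):
    """Hypothetical, simplified target-decoy filter (not pyteomics' code).

    Rank rows by `key` (lower is better), estimate FDR at each rank as
    cumulative decoys / cumulative targets, cut at the deepest rank whose
    estimate is <= `fdr`, and return the target rows above the cut.
    """
    ranked = df.sort_values(by=key, kind="mergesort").reset_index(drop=True)
    decoys = ranked[decoy_col].astype(int).cumsum()
    targets = (~ranked[decoy_col]).astype(int).cumsum()
    est = decoys / targets.clip(lower=1)  # avoid division by zero
    passing = est[est <= fdr]
    if passing.empty:
        return ranked.iloc[0:0]
    cutoff = passing.index[-1]  # RangeIndex, so label == position
    kept = ranked.iloc[: cutoff + 1]
    return kept[~kept[decoy_col]]

# Toy data (hypothetical): the fourth-best PSM is a decoy.
psms = pd.DataFrame({
    "e-value": [1e-5, 2e-5, 3e-5, 4e-5, 5e-5],
    "decoy": [False, False, False, True, False],
})
print(len(tda_filter(psms, key="e-value", decoy_col="decoy", fdr=0.25)))  # 4
```

Without a score column there is no ordering, and without an ordering the cumulative decoy/target ratio is meaningless, which is why the score column has to be named explicitly.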
It worked! Thanks a lot!
I'm trying to use `pepxml.filter_df()` to calculate FDR, but the `is_decoy` function does not seem to recognize the decoy prefix, even though it is exactly "DECOY". Why is this?

```python
import pandas as pd
from pyteomics import pepxml

df = pd.read_table(r'D:\Python\comet\22CPTAC_LUAD_P_BI_20180726_BD_f13.comet.txt')
df = pepxml.filter_df(df, fdr=0.02)
df.to_csv('.\\df.txt', sep='\t', index=False)
```

Here is my code. It's very simple, so I don't know where the bug is.