guo-xuan / Sipros-Ensemble

Sipros Ensemble - Metaproteomic Search
GNU General Public License v3.0

Protein FDR calculation #1

Open ohickl opened 5 years ago

ohickl commented 5 years ago

Hi, I am a bit confused by the values calculated for the final protein report. It looks like this with my data, for example:

The ~12500 proteins with the 60 decoys are reported afterwards. But how does it end up with 0.96% decoy FDR? If it only found 241 decoys among almost 40k proteins before filtering, it was already way below 1%, or am I missing something?

Also, I get the following error when trying to produce a pepXML file:

python2.7 /opt/sipros/Scripts/sipros_psm_tabulating.py -i /scratch/maxquant/OH/Sipros/method_test/markert_strap_brp_01/output -o /scratch/maxquant/OH/Sipros/method_test/markert_strap_brp_01/output -c /scratch/maxquant/OH/Sipros/method_test/markert_strap_brp_01/20190703_method_test.cfg -x
[Fri Jul 5 11:11:30 2019] Beginning Sipros Ensemble Tabulating (1.0.1 (Alpha))
[Step 1] Parse options and get config file: Running -> Done!
[Step 2] Generate PSM table: Running -> Done!
[Step 3] Merge Protein list: Running -> Done!
[Step 4] Generate Pepxml: Running ->
Traceback (most recent call last):
  File "/opt/sipros/Scripts/sipros_psm_tabulating.py", line 662, in <module>
    sys.exit(main())
  File "/opt/sipros/Scripts/sipros_psm_tabulating.py", line 647, in main
    writePepxml(base_out + '.tab', config_dict, modification_dict, element_modification_list_dict, output_folder)
  File "/opt/sipros/Scripts/sipros_psm_tabulating.py", line 406, in writePepxml
    psm_obj.score_process()
  File "/opt/sipros/Scripts/sipros_psm_tabulating.py", line 348, in score_process
    diff = (pep.scorelist[idx1]/l1[0].scorelist[idx1]) - 1
ZeroDivisionError: float division by zero
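
The failing line divides one score by the top-ranked score, so the error suggests that the top score was exactly zero for at least one PSM. A minimal sketch of a guard is shown here. `relative_score_diff` is a hypothetical helper, not a function in the Sipros scripts, and whether an undefined ratio should be skipped or handled differently depends on the scoring logic, which this thread does not describe.

```python
def relative_score_diff(score, top_score):
    """Return score / top_score - 1, or None when top_score is zero.

    Hypothetical helper for illustration; it mirrors the expression in the
    traceback but is not part of sipros_psm_tabulating.py.
    """
    if top_score == 0:
        # The ratio is undefined; let the caller decide how to treat it.
        return None
    return float(score) / top_score - 1


print(relative_score_diff(2.0, 4.0))
print(relative_score_diff(1.0, 0.0))
```
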

Also, are you still actively working on Sipros Ensemble?

Love Sipros Ensemble and the results so far!

Cheers

Oskar

guo-xuan commented 5 years ago

Hi Oskar,

Thank you for your questions.

The reason for 0.96% as the FDR is that we use half of the decoy PSMs for training a machine learning model. So the estimated number of decoy proteins should be doubled, i.e., FDR = 60*2/12462 ≈ 0.96%.

There are a few other parameters for protein filtering, such as the minimum number of required unique peptides. Some of these 37016 proteins may be supported only by shared peptides, so they get grouped together and are counted just once.
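
The arithmetic above can be checked with a short sketch. The function name and the `train_fraction` parameter are illustrative assumptions, not part of Sipros Ensemble. The sketch only assumes, as stated in the reply, that half of the decoys are held out for training, so the observed decoy count is doubled.

```python
def decoy_protein_fdr(n_decoy, n_total, train_fraction=0.5):
    """Estimate protein FDR when a fraction of decoys was used for training.

    Assumption: observed decoys are scaled by 1 / (1 - train_fraction),
    which is a factor of 2 when half of the decoys are held out.
    """
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    scale = 1.0 / (1.0 - train_fraction)
    return n_decoy * scale / n_total


fdr = decoy_protein_fdr(60, 12462)
print("{:.2%}".format(fdr))  # prints 0.96%
```
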

I hope this helps you and I am happy to answer if you have any further questions.

Bests, Xuan


ohickl commented 5 years ago

Hi Xuan,

Got it. Thanks! Do you plan on implementing protein-level FDR filtering? I think I read something about it in the readme or the publication. I tried it by setting FDR_Filtering = Protein in the config file, but it still seems to filter on 1% peptide FDR. I would like to do that, because I tend to get a protein-level FDR above 1% when requiring at least one unique peptide. The effect is especially strong when searching large databases (e.g. the one I tried contained about 18*10^6 target sequences). Thanks for your time!

Oskar

guo-xuan commented 5 years ago

Hi Oskar,

I am a little confused. Do you want 1% FDR at protein level or peptide level?

Xuan


ohickl commented 5 years ago

Hi Xuan,

Sorry about that. I would like to filter at the protein level.

guo-xuan commented 5 years ago

Hi Oskar,

Sorry for the late reply. I am very busy these days. I don't have a publicly available protein FDR control script. If 1% protein FDR is desired, what I would do is try a set of peptide FDRs to see which one gives exactly 1% protein FDR, or the closest to it. I have a Python script for this purpose, but it is not user-friendly. I attached that script to this email anyway. Note that the comments in this Python script may not be helpful. I may be able to upgrade this script, but I don't know when I will have time to do that.
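
The scanning idea described above can be sketched as follows. This is not the script mentioned in the reply, which was never attached to the thread. The input format (pairs of protein ID and peptide q-value), the `Rev_` decoy prefix, and both function names are assumptions made for illustration, and the decoy doubling from the earlier reply is left out to keep the example small.

```python
def protein_fdr(peptides, peptide_fdr_cutoff, decoy_prefix="Rev_"):
    """Protein-level FDR among proteins with a peptide passing the cutoff.

    peptides: iterable of (protein_id, peptide_q_value) pairs (toy format).
    decoy_prefix: assumed marker for decoy protein IDs.
    """
    passed = {prot for prot, q in peptides if q <= peptide_fdr_cutoff}
    if not passed:
        return 0.0
    decoys = sum(1 for prot in passed if prot.startswith(decoy_prefix))
    return float(decoys) / len(passed)


def closest_cutoff(peptides, cutoffs, target=0.01):
    """Return the (cutoff, protein_fdr) pair whose FDR is closest to target."""
    results = [(c, protein_fdr(peptides, c)) for c in cutoffs]
    return min(results, key=lambda r: abs(r[1] - target))


# Toy data: three target proteins and one decoy.
toy = [("P1", 0.001), ("P2", 0.002), ("Rev_P3", 0.02), ("P4", 0.03)]
print(closest_cutoff(toy, [0.005, 0.05], target=0.01))
```

Picking the closest value can return a cutoff whose protein FDR is slightly above the target. A stricter variant would keep only cutoffs at or below the target and take the largest of those.
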

Bests, Xuan


ohickl commented 4 years ago

Hey Xuan,

Sorry for the late reply. I am still interested in your Python script. Could you send it to me at oskar.hickl@uni.lu? Your last reply went to GitHub and there was no file attached. Is there any news regarding the development of Sipros Ensemble? I'd love to see it continued!

Cheers Oskar