mhk7 / alignOntology

Code for aligning two ontologies. Originally used in "A gene ontology inferred from molecular networks" doi:10.1038/nbt.2463
BSD 2-Clause "Simplified" License

FDR values > 1 #4

Closed antonkratz closed 5 years ago

antonkratz commented 5 years ago

Using calculateFDRs, in the resulting file FDRs I sometimes get FDR values (4th column) which are larger than one. Is this a bug? My understanding and expectation is that a FDR value should always be between 0 and 1. Could I have a comment on this? Thanks!

mhk7 commented 5 years ago

Hi Anton,

Sorry about the slow response. This is not a bug, just a result of the way the FDR is estimated, which can sometimes give values a bit above 1 (these can be treated as ~1). The FDR is an estimate of what fraction of the terms aligned between the reference ontology and the constructed ontology at a particular score were aligned by random chance. It is calculated by creating several ontologies with identical structure to the constructed ontology but with genes randomly assigned to terms. These random ontologies are then aligned with the reference ontology to get alignment scores. Then, for each alignment between the reference and constructed ontologies (with an alignment score S), the program counts the number of term alignments with a score of at least S (let's call this Ts, for True alignments with score >= S). It also counts the average number of term alignments with a score of at least S across the randomly created ontologies (let's call this Rs). The FDR is then Rs/Ts: at a given score S we would expect to see Rs alignments by random chance, but we actually see Ts, so a fraction Rs/Ts of these alignments is likely to have arisen by chance.
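The estimate described above can be sketched in a few lines of Python. This is only an illustration of the Rs/Ts calculation, not the code that calculateFDRs runs; the function name `estimate_fdrs` and its inputs (a list of true alignment scores and one list of scores per random ontology) are hypothetical.

```python
from bisect import bisect_left


def estimate_fdrs(true_scores, random_score_sets):
    """Return (S, Rs / Ts) for every true alignment score S.

    true_scores: scores of the reference-vs-constructed alignments.
    random_score_sets: one list of scores per random ontology
    (hypothetical input layout, assumed for this sketch).
    """
    if not random_score_sets:
        raise ValueError("need at least one random ontology")

    sorted_true = sorted(true_scores)
    sorted_random = [sorted(scores) for scores in random_score_sets]

    def count_at_least(sorted_scores, s):
        # Number of entries >= s in an ascending list.
        return len(sorted_scores) - bisect_left(sorted_scores, s)

    results = []
    for s in true_scores:
        # Ts is never zero, because the alignment with score s counts itself.
        ts = count_at_least(sorted_true, s)
        # Rs is the count averaged over the random ontologies.
        rs = sum(count_at_least(r, s) for r in sorted_random) / len(sorted_random)
        results.append((s, rs / ts))
    return results


# Made-up scores: at the lowest score the random ontologies produce
# more alignments (4 on average) than the true alignment (3),
# so the estimate for that score is 4/3, which is above 1.
example = estimate_fdrs(
    [0.9, 0.5, 0.1],
    [[0.2, 0.15, 0.12, 0.11], [0.3, 0.1, 0.1, 0.1]],
)
```

Nothing in the ratio bounds it by 1, since Rs and Ts are counted independently.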

Since this is an estimate with randomness involved, there are sometimes situations at low alignment scores where there are actually a few more random alignments than true alignments with scores of at least a given score S. This gives an FDR > 1 by this estimate. Obviously that doesn't really make sense, as you have noted, but I never bothered to round it back down to 1, since in most cases any FDR anywhere close to that high indicates a probably unimportant and easily discarded alignment. I usually used FDR cutoffs of 0.05 or 0.1 in most situations.
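For downstream use, capping and filtering could look roughly like the following Python sketch. It assumes each row is a sequence of string fields with the FDR in the 4th field, as in the question; the other fields are treated as opaque, and the helper names, the inclusive comparison, and the default cutoff are choices made for this illustration, not part of calculateFDRs.

```python
def cap_fdr(fdr):
    """Round an estimated FDR above 1 back down to 1."""
    return min(fdr, 1.0)


def filter_by_fdr(rows, cutoff=0.05, fdr_index=3):
    """Keep rows whose capped FDR is at or below the cutoff.

    rows: sequences of string fields; fdr_index=3 means the 4th column.
    The inclusive comparison (<=) is an assumption of this sketch.
    """
    kept = []
    for row in rows:
        if cap_fdr(float(row[fdr_index])) <= cutoff:
            kept.append(row)
    return kept


# Placeholder rows: only the 4th field (the FDR) is meaningful here.
rows = [
    ["t1", "t2", "0.8", "0.01"],
    ["t3", "t4", "0.2", "1.33"],
    ["t5", "t6", "0.5", "0.05"],
]
significant = filter_by_fdr(rows, cutoff=0.05)
```

With a cutoff of 0.05 or 0.1 the cap never changes which rows are kept; it only matters if the FDR values themselves are reported or plotted.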

Anyway, hope that is helpful. Let me know if you have any other questions or if that explanation isn't clear.

Michael

antonkratz commented 5 years ago

Thank you Mike for this very clear and thorough explanation! I will close this issue.