ranking-agent / aragorn-ranker

Exposes TRAPI functions to add literature co-occurrence edges, convert publications to edge weights, and provide scores for answers.
MIT License

Different scores? #123

Open cbizon opened 1 year ago

cbizon commented 1 year ago

Two queries are run in robokop. One is (Ozone)-(gene)-(asthma) and the other is (asthma)-(gene)-(Ozone). The same answers are returned, but the scores are slightly different between the two. Attached are two messages. The first result in each (NCBIGene:7412) shows the different scores.

ROBOKOP_message_asthma-gene-ozone_trapi1.4dev.json.txt ROBOKOP_message_ozone-gene-asthma_trapi1.4dev.json.txt

As far as I can tell, the two results are the same in terms of number of edges bound, and the parameters of the omnicorp support edges.

This suggests to me a bug in ranker somewhere, but the differences are small enough that perhaps it is something numerical?

I also notice that every weight I saw has a value of 1. Is this accurate? Or are these weights no longer used in ranking?

kennethmorton commented 11 months ago

Interesting case!

Looking at the first result in both sets, they are basically the same, but not exactly. I wrote some code to do a quick and dirty look at the content of the edges between the different curies in the result. I confirmed that if you remove directionality and only consider a symmetric weight matrix, the same edges are all present. The disagreements are between the subjects and objects on otherwise directionless edges. I believe this is due to Omnicorp and how it must make some arbitrary selection of subjects and objects.
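Roughly, the directionless check can be sketched like this (the helper name and the curies here are hypothetical, not the actual script):

```python
from collections import Counter

def undirected_edge_counts(edges):
    """Count edges keyed without direction: an edge and its
    subject/object-swapped twin collapse onto the same key."""
    return Counter(frozenset((e["subject"], e["object"])) for e in edges)

# The same support edge, recorded with opposite directions in the two messages:
edges_a = [{"subject": "NCBIGene:7412", "object": "MONDO:0004979"}]
edges_b = [{"subject": "MONDO:0004979", "object": "NCBIGene:7412"}]

# Ignoring direction, the edge sets agree
assert undirected_edge_counts(edges_a) == undirected_edge_counts(edges_b)
```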

This is fine, except in how it impacts ranker and weight calculation. Roughly, for each subject/object pair, each source can only contribute a single weight for each property type. If there are multiple edges from the same source that have the same property type (e.g. CTD publications), the maximum property value is taken. Once the edges are collapsed for each unique subject/object/source/property, there can be subtle differences if the subjects and objects are flipped.
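A minimal sketch of the failure mode (edge values and source/property names are illustrative):

```python
def collapse_weights(edges):
    """Keep the max value per (subject, object, source, property) key.
    With a *directional* key, flipping subject/object on an otherwise
    directionless edge splits one logical key into two."""
    best = {}
    for e in edges:
        key = (e["subject"], e["object"], e["source"], e["property"])
        best[key] = max(best.get(key, 0.0), e["value"])
    return best

edges = [
    {"subject": "A", "object": "B", "source": "omnicorp",
     "property": "publications", "value": 3.0},
    # Same node pair, direction arbitrarily flipped, as Omnicorp can emit:
    {"subject": "B", "object": "A", "source": "omnicorp",
     "property": "publications", "value": 5.0},
]

# Both edges survive under separate directional keys, instead of
# collapsing to the single max of 5.0.
assert len(collapse_weights(edges)) == 2
```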

Once we have a weight matrix, we make it symmetric when we calculate the graph laplacian. If instead we make the matrix symmetric while checking for subject/object/source/property collisions, it should clear up the discrepancy.
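To make the order-of-operations point concrete, here is a toy sketch. I'm assuming an averaging symmetrization at Laplacian time for illustration; the exact symmetrization the ranker uses doesn't change the conclusion:

```python
import numpy as np

idx = {"A": 0, "B": 1}
# Same undirected pair reported twice with opposite directions:
edges = [("A", "B", 3.0), ("B", "A", 5.0)]

# Current behavior: directional max-collapse, symmetrize only at
# Laplacian time (averaging assumed here for illustration).
w = np.zeros((2, 2))
for s, o, v in edges:
    w[idx[s], idx[o]] = max(w[idx[s], idx[o]], v)
sym_late = (w + w.T) / 2.0          # off-diagonal becomes 4.0

# Proposed fix: collapse on a symmetric key up front.
w2 = np.zeros((2, 2))
for s, o, v in edges:
    i, j = sorted((idx[s], idx[o]))
    w2[i, j] = w2[j, i] = max(w2[i, j], v)   # off-diagonal is 5.0

# The two orderings of a query can land on either value, hence the
# small score differences.
assert sym_late[0, 1] != w2[0, 1]
```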

I believe this is fixed in #124 but I'd like a few more eyes on it. @uhbrar @maximusunc

maximusunc commented 11 months ago

Please disregard if this is too much of an edge case, but I'd like to toss in a small wrench. In ICEES-KG, we have many edges with the same subject/object/source/property but that come from different datasets and different years. It sounds like the current ranker would not handle this case.

kennethmorton commented 11 months ago

I think that's an interesting point. We should consider what other aspects of TRAPI we should be using to identify unique edges.
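One possible direction, purely a sketch: fold additional edge attributes into the identity key used by the collapse. The attribute names below (`dataset`, `year`) are hypothetical stand-ins for whatever TRAPI attributes ICEES-KG actually emits:

```python
def edge_key(e):
    """Hypothetical extended identity key: include attributes such as
    dataset and year so ICEES-KG edges from different cohorts are not
    merged away during the max-collapse."""
    return (e["subject"], e["object"], e["source"], e["property"],
            e.get("dataset"), e.get("year"))

e1 = {"subject": "A", "object": "B", "source": "icees-kg",
      "property": "p_value", "dataset": "asthma-cohort", "year": 2019}
e2 = dict(e1, year=2020)

# Edges differing only by year now stay distinct.
assert edge_key(e1) != edge_key(e2)
```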