kbaseattic / relation_engine_spec

Specifications and config for the KBase Relation Engine
https://kbase.us
MIT License
0 stars 7 forks source link

update go queries #91

Closed zhlu9890 closed 5 years ago

jayrbolton commented 5 years ago

And were you able to verify that these actually are the incorrect results agains the source data? What is the correct result?

On Mon, Sep 16, 2019 at 9:46 AM Zhenyuan Lu notifications@github.com wrote:

@zhlu9890 commented on this pull request.

In stored_queries/GO/GO_get_ancestors.yaml https://github.com/kbase/relation_engine_spec/pull/91#discussion_r324777008 :

query: |

  • FOR v, e IN 1..1000000 OUTBOUND CONCAT("GO_test_term/", @key) GO_test_edges_isa
  • SORT v._key ASC
  • FOR t in GO_terms
  • FILTER t.id == @id
  • FILTER t.created <= @ts AND t.expired >= @ts
  • limit 1
  • FOR v, e IN 1..100 OUTBOUND t GO_edges

I tried

PRUNE e.created > @ts OR e.expired < @ts

It returned 0 result. It is the same for

PRUNE e.created <= @ts OR e.expired >= @ts

When I was using

FILTER e.created <= @ts AND e.expired >= @ts

It returned 16 results.

— You are receiving this because your review was requested. Reply to this email directly, view it on GitHub https://github.com/kbase/relation_engine_spec/pull/91?email_source=notifications&email_token=ACCV47F3K2IPSO7LIM2TS5LQJ6Z4ZA5CNFSM4IWR3XFKYY3PNVWWK3TUL52HS4DFWFIHK3DMKJSXC5LFON2FEZLWNFSXPKTDN5WW2ZLOORPWSZGOCE3CHRA#discussion_r324777008, or mute the thread https://github.com/notifications/unsubscribe-auth/ACCV47EL2OOGJNIZGEONKJ3QJ6Z4ZANCNFSM4IWR3XFA .

zhlu9890 commented 5 years ago

I looked the 16 results from FILTER e.created <= @ts AND e.expired >= @ts and those seems to be correct, however when using PRUNE e.created > @ts OR e.expired < @ts the 16 results were not returned. Here is the query, could you try to see if I miss anything,

WITH GO_terms
  FOR t in GO_terms
    FILTER t.id == @id
    FILTER t.created <= @ts AND t.expired >= @ts
    limit 1
    FOR v, e IN 1..100 OUTBOUND t GO_edges
      PRUNE e.created > @ts OR e.expired < @ts
      FILTER e.type == "is_a"
      SORT v.id ASC
      LIMIT @offset, @limit
      RETURN {term: v, edge: e}
jayrbolton commented 5 years ago

@zhlu9890 We still need to verify what are the actual correct results. If you go to http://geneontology.org/docs/download-ontology/, download the "go.obo" file, find the entry for your term ID, and see if you can trace what the correct results are for the query you are testing.

zhlu9890 commented 5 years ago

I downloaded the go.obo file and traced GO:0000002. Here is the my finding:

GO:0000002 is_a GO:0007005
GO:0007005 is_a GO:0006996
GO:0006996 is_a GO:0016043
GO:0016043 is_a GO:0009987
GO:0016043 is_a GO:0071840
GO:0009987 is_a GO:0008150
GO:0071840 is_a GO:0008150

So the 16 results from FILTER e.created <= @ts AND e.expired >= @ts are correct, it had all the edges with some duplicates. I am still not sure why PRUNE e.created > @ts OR e.expired < @ts is not working.

jayrbolton commented 5 years ago

As I mentioned on slack, it looks like the query returns the correct results, and the data currently loaded in the GO_terms and GO_edges collections are incorrect. We can work with Gavin to get this sorted out. In the mean time, I will check to see if this PR is ready to merge.

jayrbolton commented 5 years ago

Again as mentioned as slack, the query was actually not right, and we need to add a e != null condition in the prune expression, as described here: https://github.com/arangodb/arangodb/issues/9290

jayrbolton commented 5 years ago

@zhlu9890 Let me know when this PR is ready for a re-review and merge.

zhlu9890 commented 5 years ago

Updated queries and adopted Eric's FILTER to use path.edges[*].attribute ALL instead of PRUNE. The queries with depth 1 just use simple FILTER.

jayrbolton commented 5 years ago

We are in business :100: :champagne: