ontoportal-lirmm / goo

Graph Oriented Objects (GOO) for Ruby. A RDF/SPARQL based ORM.
http://ncbo.github.io/goo/

Feature: Add Virtuoso, AllegroGraph and GraphDB integration to GOO #48

Closed syphax-bouazzouni closed 7 months ago

syphax-bouazzouni commented 11 months ago

Prerequisites

The challenge was that the same SPARQL query is not interpreted the same way from one triple store to another, i.e. the same query can return different results depending on the store.

GraphDB behaves like 4store, so it integrates easily with the current code and the way we generate our queries (see an example of a generated query in the section "Generated Queries examples").

Virtuoso and AllegroGraph support a better way of handling queries, so we updated our code to generate simpler queries for them (see an example of a generated query in the section "Generated Queries examples").

Benchmarks

The table below shows the time it takes to run all the tests on each triple store, with different slice configurations. `slices` is a configuration variable that defines how we slice the data we fetch: e.g. if we query 100 elements and the slice is 20, we make 5 requests fetching 20 elements each.

| Store | Time with slices=20 | Time with slices=500 (production) | Time with slices=[1,5,10,20] accumulated |
| --- | --- | --- | --- |
| 4store | 49.691710s | 43.782962s | 7.68 min |
| GraphDb | 57.658306s | 45.543078s | 2.83 min 🏅 |
| Virtuoso | 24.629962s 🏅 | 29.768298s 🏅 | 2.90 min |
| AllegroGraph | 31.441410s | 34.950169s | 2.96 min |
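
The slicing described above (100 elements with a slice of 20 gives 5 requests) can be sketched in plain Ruby. This is only an illustration of the arithmetic, not GOO's implementation: `fetch_in_slices` is a hypothetical helper, and the block stands in for a single SPARQL request.

```ruby
# Hypothetical helper, not part of GOO: splits the ids into slices and
# calls the block once per slice (the block stands in for one SPARQL request).
def fetch_in_slices(ids, slice_size)
  requests = 0
  results = []
  ids.each_slice(slice_size) do |slice|
    requests += 1
    results.concat(yield(slice))
  end
  [results, requests]
end

ids = (1..100).to_a
results, requests = fetch_in_slices(ids, 20) do |slice|
  # Fake "response": one result per requested id.
  slice.map { |id| "item-#{id}" }
end

puts requests        # 5 requests of 20 elements each
puts results.length  # 100 elements fetched in total
```

A smaller slice means more round trips to the store, while a larger slice means bigger queries, which is presumably why the benchmark compares several slice sizes.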

The benchmarks above use our (LIRMM) version of the code, not the NCBO one, and we use a different approach to building queries since https://github.com/ncbo/goo/pull/124. Below are the benchmarks using the NCBO code for 4store and AllegroGraph.

| Store | Time with slices=20 | Time with slices=500 (production) | Time with slices=[1,5,10,20] accumulated |
| --- | --- | --- | --- |
| 4store | 45.513319s | 36.200109s | 3.7 min 🏅 |
| AllegroGraph | 29.639939s | 34.194236s | 5.4 min |

To summarize, here is the final ranking:

| Store/code | Time with slices=500 (production) |
| --- | --- |
| Virtuoso - LIRMM code | 29.768298s 🏅 |
| AllegroGraph - NCBO code | 34.194236s |
| AllegroGraph - LIRMM code | 34.950169s |
| 4store - NCBO code | 36.200109s |
| 4store - LIRMM code | 43.782962s |
| GraphDb - LIRMM code | 45.543078s |

Generated Queries examples

See https://github.com/ontoportal-lirmm/goo/pull/48#issuecomment-1763725927

Changes

syphax-bouazzouni commented 11 months ago

In this section, I run the same query against the different code bases and show the generated SPARQL queries.

The query is the following:

University.where.include(:name, programs: [:name]).all

NCBO code

# Get the universities, including their name and programs (inverse of the university property)
SELECT DISTINCT ?id ?name ?programs 
FROM <http://goo.org/default/University> 
FROM <http://goo.org/default/Program> 
WHERE { 
      ?id a <http://goo.org/default/University> . 
     OPTIONAL { ?id <http://goo.org/default/name> ?name .  } 
     OPTIONAL { ?programs <http://goo.org/default/university> ?id .  } 
}

# Get the names of the 9 programs
SELECT DISTINCT ?id ?name 
FROM <http://goo.org/default/Program> 
WHERE { 
   ?id a <http://goo.org/default/Program> . 
   OPTIONAL { ?id <http://goo.org/default/name> ?name .  } 
   FILTER(?id = <http://example.org/program/UPM/CompSci> || ?id = <http://example.org/program/UPM/BioInformatics> || ?id = <http://example.org/program/UPM/Medicine> || ?id = <http://example.org/program/Stanford/Medicine> || ?id = <http://example.org/program/Stanford/BioInformatics> || ?id = <http://example.org/program/Stanford/CompSci> || ?id = <http://example.org/program/Southampton/CompSci> || ?id = <http://example.org/program/Southampton/Medicine> || ?id = <http://example.org/program/Southampton/BioInformatics>) 
}
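
The second query restricts `?id` to the programs found by the first one, using a chain of equality tests. A minimal sketch of how such a filter string could be assembled is shown below; `id_filter` is a hypothetical helper written for illustration and is not taken from the GOO code base.

```ruby
# Hypothetical helper, not part of GOO: builds a FILTER that restricts a
# variable to a fixed list of IRIs, in the same shape as the query above.
def id_filter(iris, variable = "?id")
  conditions = iris.map { |iri| "#{variable} = <#{iri}>" }
  "FILTER(#{conditions.join(' || ')})"
end

iris = [
  "http://example.org/program/UPM/CompSci",
  "http://example.org/program/UPM/Medicine"
]

filter = id_filter(iris)
puts filter
```

With nine programs the filter simply grows to nine `||` branches, as in the generated query.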

Notes:

LIRMM code for 4store and GraphDb

# Get the universities, including their name and programs (inverse of the university property)
SELECT DISTINCT ?id ?attributeProperty ?attributeObject 
FROM <http://goo.org/default/University> 
FROM <http://goo.org/default/Program> 
WHERE { 
     ?id a <http://goo.org/default/University> . 
    OPTIONAL {
       { ?id <http://goo.org/default/name> ?attributeObject . BIND( "name" as ?attributeProperty) } 
        UNION  
        { ?attributeObject <http://goo.org/default/university> ?id . BIND( "programs" as ?attributeProperty) }
    }
}

# Get the names of the 9 programs
SELECT DISTINCT ?id ?attributeProperty ?attributeObject 
FROM <http://goo.org/default/Program> 
WHERE { 
    ?id a <http://goo.org/default/Program> . 
    OPTIONAL { 
       { ?id <http://goo.org/default/name> ?attributeObject . BIND( "name" as ?attributeProperty) } 
    } 
    FILTER(?id = <http://example.org/program/UPM/CompSci> || ?id = <http://example.org/program/UPM/BioInformatics> || ?id = <http://example.org/program/UPM/Medicine> || ?id = <http://example.org/program/Stanford/Medicine> || ?id = <http://example.org/program/Stanford/BioInformatics> || ?id = <http://example.org/program/Stanford/CompSci> || ?id = <http://example.org/program/Southampton/CompSci> || ?id = <http://example.org/program/Southampton/Medicine> || ?id = <http://example.org/program/Southampton/BioInformatics>)
 }
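
These queries return one row per attribute value, with the columns `?id`, `?attributeProperty` and `?attributeObject`. One way to turn such rows back into per-resource attributes is to group them by `?id`. The sketch below only illustrates that idea and is not GOO's solutions mapper: `group_solutions` is a hypothetical helper, and the university IRI and name in the sample rows are made up.

```ruby
# Sample rows mimicking SPARQL solutions. The university IRI and its name
# are invented for this example; the program IRIs come from the queries above.
stanford = "http://example.org/university/Stanford"
rows = [
  { id: stanford, attributeProperty: "name", attributeObject: "Stanford" },
  { id: stanford, attributeProperty: "programs",
    attributeObject: "http://example.org/program/Stanford/CompSci" },
  { id: stanford, attributeProperty: "programs",
    attributeObject: "http://example.org/program/Stanford/Medicine" }
]

# Hypothetical helper: groups rows by resource id, then by attribute name.
def group_solutions(rows)
  rows.each_with_object({}) do |row, grouped|
    attributes = (grouped[row[:id]] ||= Hash.new { |hash, key| hash[key] = [] })
    attributes[row[:attributeProperty]] << row[:attributeObject]
  end
end

grouped = group_solutions(rows)
puts grouped[stanford]["name"].inspect
puts grouped[stanford]["programs"].length
```

Because every attribute comes back through the same three columns, the mapping step does not need to know the list of requested attributes in advance.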

Notes

LIRMM code for Virtuoso and AllegroGraph

# Get the universities including their name and programs (inverse of  university property)
SELECT DISTINCT ?id ?attributeProperty ?attributeObject 
FROM <http://goo.org/default/University> 
FROM <http://goo.org/default/Program> 
WHERE { 
      ?id a <http://goo.org/default/University> . 
      OPTIONAL { 
          { ?id ?attributeProperty ?attributeObject . FILTER(?attributeProperty = <http://goo.org/default/name>)  } 
          UNION  
          { ?attributeObject ?attributeProperty ?id . FILTER(?attributeProperty = <http://goo.org/default/university>)  } 
      }
 }

# Get the names of the 9 programs
SELECT DISTINCT ?id ?attributeProperty ?attributeObject 
FROM <http://goo.org/default/Program> 
WHERE { 
    ?id a <http://goo.org/default/Program> . 
   OPTIONAL { 
       { ?id ?attributeProperty ?attributeObject . FILTER(?attributeProperty = <http://goo.org/default/name>)  } 
    } 
    FILTER(?id = <http://example.org/program/Stanford/BioInformatics> || ?id = <http://example.org/program/Stanford/CompSci> || ?id = <http://example.org/program/Stanford/Medicine> || ?id = <http://example.org/program/Southampton/BioInformatics> || ?id = <http://example.org/program/Southampton/CompSci> || ?id = <http://example.org/program/Southampton/Medicine> || ?id = <http://example.org/program/UPM/BioInformatics> || ?id = <http://example.org/program/UPM/CompSci> || ?id = <http://example.org/program/UPM/Medicine>) }
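
The two LIRMM variants differ only in how each branch of the `OPTIONAL { ... UNION ... }` block is written: the 4store/GraphDb variant fixes the predicate and binds a string label to `?attributeProperty`, while the Virtuoso/AllegroGraph variant leaves the predicate as a variable and filters on its IRI. The sketch below reproduces both shapes. It is a hypothetical illustration, not GOO's query builder; `ATTRIBUTES`, `bind_pattern`, `filter_pattern` and `optional_block` are names invented here.

```ruby
# Hypothetical attribute description, not GOO's data model.
# inverse: true means the resource is the object of the triple.
ATTRIBUTES = {
  "name"     => { predicate: "http://goo.org/default/name",       inverse: false },
  "programs" => { predicate: "http://goo.org/default/university", inverse: true }
}

# 4store/GraphDb shape: fixed predicate, label bound as a string.
def bind_pattern(name, predicate:, inverse:)
  triple = inverse ? "?attributeObject <#{predicate}> ?id" : "?id <#{predicate}> ?attributeObject"
  "{ #{triple} . BIND(\"#{name}\" AS ?attributeProperty) }"
end

# Virtuoso/AllegroGraph shape: variable predicate, filtered by IRI.
def filter_pattern(predicate:, inverse:)
  triple = inverse ? "?attributeObject ?attributeProperty ?id" : "?id ?attributeProperty ?attributeObject"
  "{ #{triple} . FILTER(?attributeProperty = <#{predicate}>) }"
end

def optional_block(patterns)
  "OPTIONAL { #{patterns.join(' UNION ')} }"
end

bind_style = optional_block(ATTRIBUTES.map { |name, opts| bind_pattern(name, **opts) })
filter_style = optional_block(ATTRIBUTES.map { |_name, opts| filter_pattern(**opts) })

puts bind_style
puts filter_style
```

Note that in the first shape `?attributeProperty` carries the attribute name, whereas in the second it carries the predicate IRI, so whatever consumes the results has to handle both cases.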

Notes

jonquet commented 11 months ago

Wow!

codecov[bot] commented 7 months ago

Codecov Report

Attention: 29 lines in your changes are missing coverage. Please review.

Comparison is base (6187c20) 85.91% compared to head (09c944f) 85.86%.

| Files | Patch % | Lines |
| --- | --- | --- |
| lib/goo/sparql/processor.rb | 74.39% | 21 Missing ⚠️ |
| lib/goo.rb | 75.00% | 3 Missing ⚠️ |
| lib/goo/base/resource.rb | 83.33% | 2 Missing ⚠️ |
| lib/goo/sparql/solutions_mapper.rb | 95.00% | 2 Missing ⚠️ |
| lib/goo/sparql/client.rb | 92.85% | 1 Missing ⚠️ |

Additional details and impacted files

```diff
@@               Coverage Diff               @@
##           development      #48      +/-   ##
===============================================
- Coverage        85.91%   85.86%   -0.06%
===============================================
  Files               37       38       +1
  Lines             2669     2702      +33
===============================================
+ Hits              2293     2320      +27
- Misses             376      382       +6
```
