semagrow / docker-semagrow-cassandra

This Docker image extends docker-semagrow with connector-cassandra.

Functionality Verification #1

Closed. gioargyr closed this issue 8 years ago.

gioargyr commented 8 years ago

Hi, I have included the semagrow/semagrow-cassandra Docker image in the docker-compose.yml (https://github.com/big-data-europe/pilot-sc7-change-detector/blob/master/docker-compose.yml) of the SC7 pilot. It will federate Cassandra (already running) and Strabon, and it is configured through the metadata.ttl in pilot-sc7-change-detector/config/semagrow/. After starting it, how can I verify that it works?

antru6 commented 8 years ago

Hi, to check whether Semagrow federates Strabon, you could issue an ASK query for a triple you know exists in the Strabon dataset. To verify that Semagrow federates Cassandra, you could issue the SPARQL mapping of a simple CQL query. For example, if your Cassandra has a table "events" in the keyspace "bde" whose primary key is the column "event_id", you could issue a query of the form:

SELECT * WHERE {
  ?s <http://cassandra.semagrow.eu/bde/events#event_id> ?id
} LIMIT 1
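The same probe can be generated for any Cassandra table. Below is a minimal Python sketch, assuming the predicate URI pattern http://cassandra.semagrow.eu/<keyspace>/<table>#<column> from the example above carries over to other tables (an assumption generalized from a single example); the helper name cassandra_probe_query is hypothetical.

```python
def cassandra_probe_query(keyspace: str, table: str, column: str, limit: int = 1) -> str:
    """Build a SPARQL query asking Semagrow for rows of a Cassandra table.

    Assumes (hypothetically) the predicate pattern
    http://cassandra.semagrow.eu/<keyspace>/<table>#<column>,
    generalized from the single "bde"/"events" example in this thread.
    """
    predicate = f"http://cassandra.semagrow.eu/{keyspace}/{table}#{column}"
    return (
        "SELECT * WHERE {\n"
        f"  ?s <{predicate}> ?id\n"
        f"}} LIMIT {limit}"
    )


if __name__ == "__main__":
    # Reproduces the "events" query from the comment above.
    print(cassandra_probe_query("bde", "events", "event_id"))
```

If Semagrow reaches Cassandra, this query should return one binding for ?s and ?id; an empty result points at the connector or its configuration.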
gioargyr commented 8 years ago

Thanks for the answer, but it seems we are more concerned with the Semagrow-Strabon part. Could you be as specific as you were with Cassandra? Could you give me an example of an ASK query and instructions on how to execute it, given that the semagrow and strabon containers are running and I already know a triple in the Strabon dataset?

antru6 commented 8 years ago

There are many ways to check this, since both Semagrow and Strabon are SPARQL endpoints. For example, let's say your Strabon store contains the triple <http://ex.a/1> <http://ex.a/2> <http://ex.a/3> . You could issue this query through Semagrow

SELECT * WHERE { ?s <http://ex.a/2> <http://ex.a/3> . }

and then verify that the URI <http://ex.a/1> is contained in the result set.
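Both checks (an ASK for the known triple, or a SELECT with the subject left open) can be scripted against Semagrow's SPARQL endpoint. This is a minimal standard-library Python sketch following the SPARQL 1.1 Protocol (query sent as the `query` parameter of an HTTP GET, results requested as application/sparql-results+json). The endpoint URL, including its /SemaGrow/sparql path, is a hypothetical placeholder, and the function names are illustrative only.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical endpoint: replace with the host, port and path of your Semagrow container.
SEMAGROW_ENDPOINT = "http://localhost:8080/SemaGrow/sparql"


def ask_query(s: str, p: str, o: str) -> str:
    """ASK query for one known triple, all terms given as full URIs."""
    return f"ASK {{ <{s}> <{p}> <{o}> }}"


def select_subject_query(p: str, o: str) -> str:
    """SELECT query leaving the subject unbound, as in the comment above."""
    return f"SELECT * WHERE {{ ?s <{p}> <{o}> . }}"


def request_url(endpoint: str, query: str) -> str:
    """SPARQL 1.1 Protocol: pass the query as the 'query' GET parameter."""
    return endpoint + "?" + urllib.parse.urlencode({"query": query})


def run(endpoint: str, query: str) -> dict:
    """Execute a query over HTTP and decode the SPARQL JSON results document."""
    req = urllib.request.Request(
        request_url(endpoint, query),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def ask_result(results: dict) -> bool:
    """An ASK result document carries a single 'boolean' member."""
    return bool(results["boolean"])


def subject_found(results: dict, uri: str) -> bool:
    """True if ?s is bound to `uri` in any row of a SELECT result."""
    return any(
        row.get("s", {}).get("value") == uri
        for row in results["results"]["bindings"]
    )


if __name__ == "__main__":
    s, p, o = "http://ex.a/1", "http://ex.a/2", "http://ex.a/3"
    # Only print the request URLs; call run(...) once the endpoint is reachable.
    print(request_url(SEMAGROW_ENDPOINT, ask_query(s, p, o)))
    print(request_url(SEMAGROW_ENDPOINT, select_subject_query(p, o)))
```

With the containers up, ask_result(run(SEMAGROW_ENDPOINT, ask_query(s, p, o))) should return True if Semagrow reaches Strabon. Sending the same ASK directly to Strabon's endpoint gives a baseline: if Strabon answers True and Semagrow answers False, the problem lies in the federation setup rather than in the data.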

gioargyr commented 8 years ago

Hi! I did what you suggested but got no results. I also ran SELECT * WHERE {?s ?p ?o} and got multiple results; some of my Strabon data were there, but not all. In the attached screenshots you can see where I entered the queries in each case and the results returned. Please tell me if I am doing anything wrong, or if the connection between Strabon and Semagrow is broken. Attachments: semagrow01, semagrow02, strabon

gioargyr commented 8 years ago

Reopening this issue, which was closed by mistake.

antru6 commented 8 years ago

Please attach your metadata.ttl file and paste the output you get after pressing the Decompose button for the first query.

gioargyr commented 8 years ago

Here's the result after pressing Decompose for the first query. FYI: I renamed the .ttl file to .txt because GitHub doesn't let me upload it. Attachments: metadata.txt, semagrow-decompose

QueryRoot
   Projection
      ProjectionElemList
         ProjectionElem "s"
      Plan(cost=[250.0,0])
         SourceQuery (source = eu.semagrow.core.impl.sparql.SPARQLSite@7272f14)
            Plan(cost=[3.0,0])
               StatementPattern
                  Var (name=s)
                  Var (name=-const-http://www.w3.org/1999/02/22-rdf-syntax-ns#type-uri, value=http://www.w3.org/1999/02/22-rdf-syntax-ns#type, anonymous)
                  Var (name=-const-http://www.opengis.net/ont/geosparql#Geometry-uri, value=http://www.opengis.net/ont/geosparql#Geometry, anonymous)

gmouchakis commented 8 years ago

@gioargyr I'm having trouble reproducing that behaviour. To test it, I set up Semagrow using the provided metadata.ttl and issued the provided test query:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT * WHERE {
  ?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.opengis.net/ont/geosparql#Geometry>
}

Semagrow returned the three results listed below:

http://big-data-europe.eu/security/man-made-changes/Geometry/id/5
http://big-data-europe.eu/security/man-made-changes/Geometry/id/6
http://big-data-europe.eu/security/man-made-changes/Geometry/id/1

In the results of your ?s ?p ?o query I can see some standard Virtuoso triples, but the metadata.ttl file you provided does not include a Virtuoso endpoint in the federation. Please verify that you are using the correct Semagrow endpoint and that, when you started the container, you mounted a directory containing a file named metadata.ttl (not metadata.txt) to /etc/default/semagrow.
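The mount check can be done on the node that runs the container before restarting Semagrow. This is a minimal Python sketch under stated assumptions: it only checks what this thread discusses (the file must be named metadata.ttl, not metadata.txt) and, optionally, whether the deployed file is byte-identical to a reference copy you intend to use. The function name and the reference-copy comparison are illustrative additions, not part of Semagrow.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import List, Optional


def sha256_of(path: Path) -> str:
    """Hex SHA-256 digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def check_semagrow_config(config_dir: str, expected: Optional[str] = None) -> List[str]:
    """List problems with a directory meant to be mounted to /etc/default/semagrow.

    An empty list means the checks passed. If `expected` names a reference copy
    of the metadata file, the deployed metadata.ttl must be byte-identical to it.
    """
    root = Path(config_dir)
    if not root.is_dir():
        return [f"{root} is not a directory"]
    ttl = root / "metadata.ttl"
    if not ttl.is_file():
        problems = ["metadata.ttl is missing"]
        if (root / "metadata.txt").is_file():
            problems.append("found metadata.txt; Semagrow expects metadata.ttl")
        return problems
    if expected is not None and sha256_of(ttl) != sha256_of(Path(expected)):
        return ["metadata.ttl differs from the expected copy"]
    return []


if __name__ == "__main__":
    # Demo on a throwaway directory that holds only a renamed metadata.txt.
    with tempfile.TemporaryDirectory() as d:
        (Path(d) / "metadata.txt").write_text("# placeholder\n", encoding="utf-8")
        print(check_semagrow_config(d))
```

Comparing against a reference copy matters in a cluster: the directory is resolved on whichever node the container is scheduled on, so a file that looks correct on one node says nothing about the copy on another.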

If you are using the correct endpoint, please provide its IP (cluster-internal or public) so I can test your deployment.

gioargyr commented 8 years ago

As far as I can see, metadata.ttl is loaded correctly, but I don't understand the point about not seeing a Virtuoso endpoint in the federation. As for Semagrow, I am using the semagrow/semagrow-cassandra Docker image, if that is what you are asking. You can find the compose file I am currently running in the /home/iitadmin/pilot-sc7-change-detector directory on master. All the services are running right now, and (my) Semagrow is at 10.0.10.12:8192.

gmouchakis commented 8 years ago

@gioargyr In the compose file, Semagrow is constrained to run on slave2 and to mount the directory pilot-sc7-change-detector/config/semagrow. When I go into that directory on slave2, I find a metadata.ttl file that is different from the metadata.txt file you attached (contents below). Please replace that file with the metadata.ttl file you want to use and restart Semagrow.

iitadmin@slave2:~/pilot-sc7-change-detector/config/semagrow$ cat metadata.ttl 
@prefix void: <http://rdfs.org/ns/void#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> . 
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

_:DatasetRoot rdf:type void:Dataset .

_:Dataset1
     rdf:type void:Dataset ; 
     void:sparqlEndpoint <http://dbpedia.org/sparql> ; 
     void:triples 10000 ; 
     void:distinctSubjects 200 ; 
     void:distinctObjects  1000 ; 
     void:properties 5 ;
     void:propertyPartition [ 
          void:property <http://localhost/my> ; 
          void:triples 100 ; 
          void:distinctSubjects 10 ; 
          void:distinctObjects 10 ] ; 
     void:propertyPartition [ 
          void:property <http://rdf.iit.demokritos.gr/2014/my#pred> ; 
          void:triples 100 ; 
          void:distinctSubjects 5 ] ;
     void:subset _:DatasetRoot .
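To see at a glance which endpoints a metadata file actually federates, a few lines of Python are enough. This is a rough sketch that assumes endpoints are declared in the prefixed form `void:sparqlEndpoint <IRI>`, as in the file above; the function name is hypothetical, and anything beyond a quick check calls for a real Turtle parser.

```python
import re
from typing import List

# Matches "void:sparqlEndpoint <IRI>" in prefixed form. A quick heuristic for
# eyeballing a metadata.ttl, not a Turtle parser: full-IRI predicates,
# comments and other Turtle syntax variants are not handled.
ENDPOINT_RE = re.compile(r"void:sparqlEndpoint\s+<([^>]+)>")


def federated_endpoints(turtle_text: str) -> List[str]:
    """Return the SPARQL endpoint IRIs declared in a Semagrow metadata file."""
    return ENDPOINT_RE.findall(turtle_text)


if __name__ == "__main__":
    sample = """
_:Dataset1
     rdf:type void:Dataset ;
     void:sparqlEndpoint <http://dbpedia.org/sparql> ;
     void:triples 10000 .
"""
    # The copy found on slave2 declares only the DBpedia endpoint.
    print(federated_endpoints(sample))
```

Run against the file above, it lists only http://dbpedia.org/sparql, so this copy does not point Semagrow at Strabon at all.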
gioargyr commented 8 years ago

Hi, in the end the .ttl was the problem. Over the past few days I ran a lot of tests of SemaGrow -> Strabon (+PostGIS), and I ended up creating the same directory on every node in the cluster, with the correct sub-directories and files in it. Now SemaGrow works perfectly everywhere. Thanks a lot for your help.