semsol / arc2

ARC RDF Classes for PHP

Any reason SELECTing on a predicate wouldn't be working? #133

Closed craigdietrich closed 1 year ago

craigdietrich commented 4 years ago

This seems so trivial I'm nervous to ask! But I can't seem to get selecting via a predicate to work.

// Returns results
$q = 'PREFIX dcterms: <http://purl.org/dc/terms/>
  SELECT *
  WHERE {
    ?s ?p "34.145369444444,-118.40721111111" .
  }';
// Returns no results and no errors
$q = 'PREFIX dcterms: <http://purl.org/dc/terms/>
  SELECT *
  WHERE {
    ?s <http://purl.org/dc/terms/spatial> ?o .
  }';
// ... nor ...
$q = 'PREFIX dcterms: <http://purl.org/dc/terms/>
  SELECT *
  WHERE {
    ?s dcterms:spatial ?o .
  }';
// Also tried this just to be sure, returns no results or errors
$q = 'PREFIX dcterms: <http://purl.org/dc/terms/>
  SELECT *
  WHERE {
    ?s dcterms:spatial "34.145369444444,-118.40721111111" .
  }';

Thank you!

k00ni commented 4 years ago

Hi @craigdietrich, can you provide us with a working example that reproduces the problem? A minimal graph plus the code you use to query the database would be helpful.

craigdietrich commented 4 years ago

Thanks for your quick reply, @k00ni! I'll put together an example and get back to you soon.

craigdietrich commented 4 years ago

Following up on this, I think I figured out the problem. All of our ARC2 databases were created a number of years ago, even though we've updated the libraries along the way (currently at the most recent release).

It looks like in some of the databases the id2val table has thousands of rows: each time a triple is entered, a new entry appears to be added for the predicate even if that predicate already exists.

In other databases, the id2val table has, say, 100 rows: each time a triple is added, the existing predicate entry is reused.

For the former, I'm not able to query on the predicate because the temp table query asks for one specific predicate ID, even though there might be hundreds of IDs for a given predicate.

For the latter, it works a-okay because there's only one ID per predicate.

I suppose at this stage we can work around this (only one client hoping to take advantage of the work we're doing needs to query on predicates, and I think their DB is of the latter kind). Still, I'm curious whether anyone has thoughts on why the former databases had that runaway predicate creation.
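
For anyone wanting to check whether a database is affected, here is a minimal sketch of the duplicate check described above. It assumes the rows of the id2val table have already been fetched into a PHP array with `id` and `val` keys (those column names are an assumption about the schema, and the sample rows are made up). `findDuplicateValues` is a hypothetical helper, not part of ARC2.

```php
<?php

/**
 * Group id2val-style rows by value and return only the values
 * that are stored under more than one ID.
 *
 * @param array $rows Rows shaped like ['id' => int, 'val' => string] (assumed shape).
 * @return array Map of value => sorted list of distinct IDs.
 */
function findDuplicateValues(array $rows): array
{
    // Collect every ID seen for each value.
    $idsByValue = [];
    foreach ($rows as $row) {
        $idsByValue[$row['val']][] = (int) $row['id'];
    }

    // Keep only the values that map to more than one distinct ID.
    $duplicates = [];
    foreach ($idsByValue as $value => $ids) {
        $ids = array_values(array_unique($ids));
        if (count($ids) > 1) {
            sort($ids);
            $duplicates[$value] = $ids;
        }
    }

    return $duplicates;
}

// Made-up sample rows, for illustration only.
$rows = [
    ['id' => 1, 'val' => 'http://purl.org/dc/terms/spatial'],
    ['id' => 2, 'val' => 'http://purl.org/dc/terms/title'],
    ['id' => 7, 'val' => 'http://purl.org/dc/terms/spatial'],
];

print_r(findDuplicateValues($rows));
```

In a healthy database the result should be empty. In an affected one, a predicate such as dcterms:spatial would show up with many IDs, which would explain why a lookup by a single predicate ID misses most of the triples.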

k00ni commented 4 years ago

I observed similar behavior when I worked on version 2.4 but can't help you with an explanation.

Maybe the following can help here:

  1. export all triples,
  2. set up the database again using our latest database schema (preferably InnoDB instead of MyISAM) and
  3. import all triples again.

That might not fix the possibly buggy behavior, but at least the DB schema would be stable and ruled out as the source of the problem.

craigdietrich commented 4 years ago

@k00ni I was thinking the same thing!

I'm more and more convinced that this issue is related to my other issue, https://github.com/semsol/arc2/issues/135, because a false positive when checking whether the val_hash exists would affect id2val, if id2val relies on val_hash to check whether a predicate already exists.

Is there an easy way to export all triples and import them again (i.e., a built-in way) that you know of? Otherwise, I suppose I'll write a little tool to do it.

Thanks!

craigdietrich commented 4 years ago

Alrighty, this is definitely related to #135. I found what I think is another error that was keeping existing predicates from being pulled from id2val.

I'll get both those changes set up in the update fork you created so it can be vetted.

(I still need to figure out a mini tool to export / import the triples, but, I think this should probably be easy enough.)
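
A rough sketch of what the export half of such a mini tool could look like. It assumes the triples are available as SELECT result rows with the keys `s`, `s type`, `p`, `o`, `o type`, `o lang` and `o datatype` (that row shape is an assumption and should be checked against the installed release), and it serializes them to N-Triples. The function names are hypothetical, the sample row uses a made-up subject URI, and named graphs are ignored.

```php
<?php

/** Escape a literal value for use inside an N-Triples string. */
function escapeLiteral(string $value): string
{
    return strtr($value, [
        '\\' => '\\\\',
        '"'  => '\\"',
        "\n" => '\\n',
        "\r" => '\\r',
        "\t" => '\\t',
    ]);
}

/** Format one term; $type is expected to be 'uri', 'bnode' or 'literal'. */
function formatTerm(string $value, string $type, string $lang = '', string $datatype = ''): string
{
    if ($type === 'uri') {
        return '<' . $value . '>';
    }
    if ($type === 'bnode') {
        // Assumes blank node values already carry the "_:" prefix.
        return $value;
    }
    $literal = '"' . escapeLiteral($value) . '"';
    if ($lang !== '') {
        return $literal . '@' . $lang;
    }
    if ($datatype !== '') {
        return $literal . '^^<' . $datatype . '>';
    }
    return $literal;
}

/** Serialize result rows (assumed shape, see above) as N-Triples. */
function rowsToNTriples(array $rows): string
{
    $lines = [];
    foreach ($rows as $row) {
        $lines[] = sprintf(
            '%s <%s> %s .',
            formatTerm($row['s'], $row['s type'] ?? 'uri'),
            $row['p'],
            formatTerm($row['o'], $row['o type'] ?? 'literal', $row['o lang'] ?? '', $row['o datatype'] ?? '')
        );
    }
    return $lines === [] ? '' : implode("\n", $lines) . "\n";
}

// Made-up sample row, for illustration only.
$rows = [
    [
        's' => 'http://example.org/place/1', 's type' => 'uri',
        'p' => 'http://purl.org/dc/terms/spatial',
        'o' => '34.145369444444,-118.40721111111', 'o type' => 'literal',
    ],
];

echo rowsToNTriples($rows);
```

The rows would presumably come from a `SELECT ?s ?p ?o WHERE { ?s ?p ?o }` query against the old store, and the resulting file could then be loaded into the freshly set-up store. Both of those steps are left out here because they depend on the store configuration.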