semsol / arc2

ARC RDF Classes for PHP

Problem querying PhySH subject headings using SPARQL: not all rows are returned #149

Open: opened by nloyola 1 year ago

nloyola commented 1 year ago

ARC2 does not return all results when I query the PhySH RDF file for disciplines: only 12 of the 18 disciplines are returned.

The PhySH website lists 18 concepts under Discipline. PhySH provides the RDF file for download from its GitHub repository:

https://github.com/physh-org/PhySH

I have made a copy of the RDF file available over HTTP at:

http://nloyola.asuscomm.com:8000/physh.rdf

I'm using the following script to query the disciplines (the -l option loads the RDF file into the store, and -d dumps the store):

<?php
require 'vendor/autoload.php';

$options = getopt("ld");

$config = array(
    /* db */
    'db_host' => 'localhost',
    'db_name' => 'physh_rdf',
    'db_user' => 'user',
    'db_pwd' => 'secret',

    /* store name (= table prefix) */
    'store_name' => 'physh_store',
);

$store = ARC2::getStore($config);

if (!$store->isSetUp()) {
    $store->setUp();
}

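if (array_key_exists('l', $options)) {
    // Load the RDF/XML file into the store and report any errors ARC2 records.
    queryCheckError($store, $store->query('LOAD <http://nloyola.asuscomm.com:8000/physh.rdf>'));
}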

if (array_key_exists('d', $options)) {
    $store->dump();
    exit(0);
}

// Print any errors ARC2 recorded and abort with a non-zero exit status.
function queryCheckError($store, $result) {
    if ($store->getErrors()) {
        print_r($store->getErrors());
        exit(1);
    }
    return $result;
}

// Select every subject linked to physh_rdf:Discipline, together with its dcterms:title.
function getDisciplines($store) {
    $q = '
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX physh: <https://doi.org/10.29172/>
PREFIX physh_rdf: <https://physh.org/rdf/2018/01/01/core#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

SELECT *
WHERE {
   ?s ?p physh_rdf:Discipline .
   ?s dcterms:title ?title .
   #?s physh_rdf:prefLabel ?label .
   #?s dcterms:description ?description .
}
';

    return queryCheckError($store, $store->query($q));
}

function showResult($result) {
    $rows = $result['result']['rows'];
    $numRows = count($rows);
    print("rows: {$numRows}\n");

    print(json_encode($rows, JSON_PRETTY_PRINT) . "\n");

    //print_r($result);
    // foreach ($rows as $k => $v) {
    //     print($k . ": " . json_encode($v, JSON_PRETTY_PRINT) . "\n");
    // }
}

$result = getDisciplines($store);
showResult($result);

When I run this script, it returns only 12 rows.

Running the same SPARQL query with the Rasqal RDF Query Library returns 19 rows: the 18 disciplines plus one extra row, which I believe corresponds to the root entry.
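One way to narrow down where the rows go missing (ARC2's RDF/XML parser versus the MariaDB-backed store) is to evaluate the same graph pattern in memory, outside the store. The sketch below assumes ARC2's parser triple format (arrays with s, p, o, o_type, o_lang, o_datatype keys); the helper name countDisciplineRows is hypothetical and not part of ARC2.

```php
<?php
// Hypothetical diagnostic helper (not part of ARC2): evaluates the pattern
//   ?s ?p physh_rdf:Discipline . ?s dcterms:title ?title .
// over ARC2-style triple arrays entirely in memory, so no database is involved.
function countDisciplineRows(array $triples): int
{
    $discipline = 'https://physh.org/rdf/2018/01/01/core#Discipline';
    $dctTitle = 'http://purl.org/dc/terms/title';

    $seen = [];            // RDF graphs are sets: skip duplicate triples
    $disciplineHits = [];  // subject => number of distinct ?p matches
    $titleCounts = [];     // subject => number of distinct ?title values
    foreach ($triples as $t) {
        $key = serialize([
            $t['s'], $t['p'], $t['o'],
            $t['o_type'] ?? '', $t['o_lang'] ?? '', $t['o_datatype'] ?? '',
        ]);
        if (isset($seen[$key])) {
            continue;
        }
        $seen[$key] = true;

        // Only a URI object can match physh_rdf:Discipline, not a literal.
        if ($t['o'] === $discipline && ($t['o_type'] ?? 'uri') === 'uri') {
            $disciplineHits[$t['s']] = ($disciplineHits[$t['s']] ?? 0) + 1;
        }
        if ($t['p'] === $dctTitle) {
            $titleCounts[$t['s']] = ($titleCounts[$t['s']] ?? 0) + 1;
        }
    }

    // SELECT * yields one row per (?p, ?title) combination for each ?s.
    $rows = 0;
    foreach ($disciplineHits as $s => $n) {
        $rows += $n * ($titleCounts[$s] ?? 0);
    }
    return $rows;
}
```

With ARC2 loaded, the triples can come straight from its RDF/XML parser: create one with ARC2::getRDFParser(), call $parser->parse('http://nloyola.asuscomm.com:8000/physh.rdf'), check $parser->getErrors(), and pass $parser->getTriples() to the helper. A count of 19 would suggest the rows are lost in the store or its query engine; a count of 12 would point at the parser instead.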

Environment: MariaDB as the database server, PHP 8.1.12, Debian 11.

Any help with this issue is greatly appreciated.