vliz-be-opsci / py-trav-harv

Python module that allows an end user to perform link traversal on a triple store.

TravHarv subjects must be made when task is started, not when config builder is called #48

Open cedricdcc opened 2 months ago

cedricdcc commented 2 months ago

With the following config:

snooze-till-graph-age-minutes: 0
prefix:
  rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
  dcat: <http://www.w3.org/ns/dcat#>
  schema: <https://schema.org/>
  org: <http://www.w3.org/ns/org#>
  dct: <http://purl.org/dc/terms/>
  mi: <http://www.marineinfo.org/ns/ontology#>
assert:
  - subjects:
      literal:
        - http://dev.marineinfo.org/id/collection/947 # WoRMS ackn - direct
    paths:
      - "<http://www.w3.org/ns/dcat#resource> "
  - subjects:
      SPARQL: >
        SELECT DISTINCT ?s
        WHERE {
              [] <http://www.w3.org/ns/dcat#resource> ?s .
              }
    paths:
      - "<https://schema.org/author>"
  - subjects:
      SPARQL: >
        PREFIX schema: <https://schema.org/>
        SELECT DISTINCT ?s
        WHERE {
          ?ok <https://schema.org/author> ?authorid .
          ?authorid <https://schema.org/identifier> ?s .
        }
    paths:
      - "<https://schema.org/affiliation>"
      - "<https://schema.org/givenName>"
      - "<https://schema.org/familyName>"
  - subjects:
      SPARQL: >
        SELECT DISTINCT ?affid
        WHERE {
            ?s <https://schema.org/affiliation> ?affid .
        }
    paths:
      - "<https://schema.org/name>"

TravHarv does not dereference the publications of a given dataset on the first run. However, on the next run it does.

The same issue was detected for the LWUA, where @laurianvm had to rerun the sembench container before the publications were dereferenced.

cedricdcc commented 2 months ago

When testing in kgap, it was found that the subjects for all tasks are made when config_builder is called, not when each task is started. This causes many tasks, in particular those whose SPARQL subject queries depend on triples harvested by an earlier task, to have no subjects to dereference.
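
To make the difference concrete, here is a minimal sketch of the two behaviours. It does not use the actual py-trav-harv code: `ToyStore`, `Task`, `build_eager`, `build_lazy`, `run`, the `WEB` lookup table and the `example.org` URIs are all hypothetical names invented for illustration, and dereferencing is reduced to a dictionary lookup. The only point it tries to show is when the subject query gets evaluated.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple

Triple = Tuple[str, str, str]

DCAT_RESOURCE = "http://www.w3.org/ns/dcat#resource"
COLLECTION = "http://dev.marineinfo.org/id/collection/947"
PUBLICATION = "http://example.org/publication/1"  # hypothetical URI

# Hypothetical stand-in for the web: triples obtained by dereferencing a URI.
WEB: Dict[str, List[Triple]] = {
    COLLECTION: [(COLLECTION, DCAT_RESOURCE, PUBLICATION)],
    PUBLICATION: [
        (PUBLICATION, "https://schema.org/author", "http://example.org/person/1")
    ],
}


class ToyStore:
    """Toy in-memory triple store (assumption: not the real target store)."""

    def __init__(self) -> None:
        self.triples: Set[Triple] = set()

    def objects_of(self, predicate: str) -> List[str]:
        # Mimics: SELECT DISTINCT ?s WHERE { [] <predicate> ?s . }
        return sorted({o for (_, p, o) in self.triples if p == predicate})


@dataclass
class Task:
    # Callable that returns the subjects to dereference for this task.
    subjects: Callable[[], List[str]]


def build_eager(store: ToyStore) -> List[Task]:
    # The subject query runs here, at build time, before any task has run.
    harvested = store.objects_of(DCAT_RESOURCE)
    return [Task(lambda: [COLLECTION]), Task(lambda: harvested)]


def build_lazy(store: ToyStore) -> List[Task]:
    # The subject query is only wrapped here; it runs when the task starts.
    return [
        Task(lambda: [COLLECTION]),
        Task(lambda: store.objects_of(DCAT_RESOURCE)),
    ]


def run(tasks: List[Task], store: ToyStore) -> List[str]:
    visited: List[str] = []
    for task in tasks:
        for subject in task.subjects():
            visited.append(subject)
            # "Dereference" the subject and add what was found to the store.
            store.triples.update(WEB.get(subject, []))
    return visited


eager_store = ToyStore()
eager_first = run(build_eager(eager_store), eager_store)
eager_second = run(build_eager(eager_store), eager_store)

lazy_store = ToyStore()
lazy_first = run(build_lazy(lazy_store), lazy_store)

print("eager, first run: ", eager_first)
print("eager, second run:", eager_second)
print("lazy, first run:  ", lazy_first)
```

In this sketch the eager variant only reaches the publication on the second run, because the second task's subject list was frozen while the store was still empty. The lazy variant reaches it on the first run, since the query is evaluated after the first task has filled the store.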