trellis-ldp / trellis

Trellis is a platform for building scalable Linked Data applications
https://www.trellisldp.org
Apache License 2.0

Error storing audit dataset in remote triplestore #559

Closed lgleim closed 5 years ago

lgleim commented 5 years ago

I am encountering errors when trying to use Trellis (current master, as in the develop Docker image) with a remote SPARQL endpoint. The record itself is updated correctly, but creating the corresponding audit log entry fails: the following error is logged and HTTP status code 500 is returned.

ERROR [2019-10-22 08:32:46,495] org.trellisldp.http.TrellisHttpResource: Error:
! org.apache.jena.atlas.web.HttpException: 400 - Bad Request
! at org.apache.jena.riot.web.HttpOp.exec(HttpOp.java:1093)
! at org.apache.jena.riot.web.HttpOp.execHttpPost(HttpOp.java:721)
! at org.apache.jena.riot.web.HttpOp.execHttpPost(HttpOp.java:665)
! at org.apache.jena.rdfconnection.RDFConnectionRemote.lambda$doPutPostDataset$9(RDFConnectionRemote.java:438)
! at org.apache.jena.rdfconnection.RDFConnectionRemote.exec(RDFConnectionRemote.java:518)
! at org.apache.jena.rdfconnection.RDFConnectionRemote.doPutPostDataset(RDFConnectionRemote.java:433)
! at org.apache.jena.rdfconnection.RDFConnectionRemote.loadDataset(RDFConnectionRemote.java:391)
! at org.trellisldp.triplestore.TriplestoreResourceService.lambda$null$20(TriplestoreResourceService.java:405)
! at org.apache.jena.system.Txn.exec(Txn.java:77)
! at org.apache.jena.system.Txn.executeWrite(Txn.java:125)
! at org.trellisldp.triplestore.TriplestoreResourceService.lambda$add$21(TriplestoreResourceService.java:405)
! ... 6 common frames omitted
! Causing: org.trellisldp.api.RuntimeTrellisException: Error storing audit dataset for <trellis:data/test>
! at org.trellisldp.triplestore.TriplestoreResourceService.lambda$add$21(TriplestoreResourceService.java:407)
! at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1626)
! ... 5 common frames omitted
! Causing: java.util.concurrent.CompletionException: org.trellisldp.api.RuntimeTrellisException: Error storing audit dataset for <trellis:data/test>
! at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:273)
! at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:280)
! at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1629)
! at java.util.concurrent.CompletableFuture$AsyncRun.exec(CompletableFuture.java:1618)
! at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
! at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
! at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
! at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)

My config.yml is as follows:

server:
  applicationConnectors:
    - type: http
      port: 8080
  requestLog:
    appenders:
      - type: file
        currentLogFilename: /opt/trellis/log/access.log
        archive: true
        archivedLogFilenamePattern: /opt/trellis/log/access-%i.log
        archivedFileCount: 5
        maxFileSize: 100K

logging:
  level: WARN
  appenders:
    - type: file
      currentLogFilename: /opt/trellis/log/trellis.log
      archive: true
      archivedLogFilenamePattern: /opt/trellis/log/trellis-%i.log
      archivedFileCount: 5
      maxFileSize: 100K
  loggers:
    org.trellisldp: INFO
    io.dropwizard: INFO

# This may refer to a remote Triplestore, e.g. https://example.org/sparql
#resources: /opt/trellis/data/rdf
resources: http://db:9999/blazegraph/namespace/kb/sparql

binaries: /opt/trellis/data/binaries

mementos: /opt/trellis/data/mementos

namespaces: /opt/trellis/data/namespaces.json

# This may refer to a static base URL for resources. If left empty, the
# base URL will reflect the Host header in the request.
baseUrl:

# This configuration will enable a WebSub "hub" header.
hubUrl:

auth:
    adminUsers: []
    webac:
        enabled: true
    jwt:
        enabled: false
        key: changeme
    basic:
        enabled: true
        usersFile: /opt/trellis/etc/users.auth

cors:
    enabled: true
    allowOrigin:
        - "*"
    maxAge: 180

cache:
    maxAge: 86400
    mustRevalidate: true

notifications:
    enabled: false
    type: JMS
    topicName: "trellis"
    connectionString: "tcp://localhost:61616"

# JSON-LD configuration
jsonld:
    cacheSize: 10
    cacheExpireHours: 24
    contextWhitelist: []
    contextDomainWhitelist: []

The corresponding remote triplestore is backed by Blazegraph 2.1.5.

The setup should be reproducible using the following docker-compose.yml, with config.yml placed at trellis/config/config.yml:

version: "3.1"
services:
  trellis:
    image: trellisldp/trellis:develop
    environment:
      JAVA_OPTS: "-Xms250m -Xmx1024m"
    ports:
      - 80:8080
    depends_on:
      - db
    volumes:
      #  file-based resources (e.g. binaries and mementos), application logs and configuration files
      - ./trellis/data:/opt/trellis/data
      - ./trellis/log:/opt/trellis/log
      # Please see note below about the ./etc directory
      - ./trellis/config:/opt/trellis/etc
  db:
    # Blazegraph Triplestore
    # Endpoint at http://db:9999/blazegraph/namespace/kb/sparql
    # Docs at https://github.com/nawerprod/blazegraph
    image: nawer/blazegraph:2.1.5
    environment:
      JAVA_XMS: 512m
      JAVA_XMX: 1g
    volumes:
      - ./blazegraph/data:/var/lib/blazegraph
    ports:
      - "9999:9999"

lgleim commented 5 years ago

I created a repo to reproduce the setup:

git clone https://git.rwth-aachen.de/lars.gleim/trellisldp.git
cd trellisldp
docker-compose up

ajs6f commented 5 years ago

That stacktrace tells us that Trellis received an error from Blazegraph. Can you show us what Blazegraph recorded for that request? Presumably it can log a reason for returning a 400 Bad Request response and that would be the crucial information.

acoburn commented 5 years ago

My suspicion is that Blazegraph does not support the named graph format used for the audit log. If that is the case, I would expect updates to ACL resources to fail as well. And if that is the case, there is also a straightforward workaround on the code side.
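
One way to check this suspicion independently of Trellis is to send a single named-graph statement to the endpoint and read the response body, which is also where the reason for a 400 would normally appear. The following is a minimal sketch, not what Trellis or Jena does internally. It assumes the endpoint URL from the config.yml above, an N-Quads payload sent as `application/n-quads` (the store may expect a different media type, such as `text/x-nquads`), and placeholder `example.org` IRIs.

```python
import urllib.error
import urllib.request

# Assumption: endpoint URL copied from the config.yml in this issue.
ENDPOINT = "http://db:9999/blazegraph/namespace/kb/sparql"

# One statement placed in a named graph. All IRIs are made-up placeholders.
PROBE_NQUADS = (
    '<http://example.org/s> <http://example.org/p> "probe" '
    "<http://example.org/probe-graph> .\n"
)


def build_probe_request(endpoint):
    """Build a POST request carrying a single named-graph quad."""
    return urllib.request.Request(
        endpoint,
        data=PROBE_NQUADS.encode("utf-8"),
        # Assumption: the store accepts this media type for N-Quads.
        headers={"Content-Type": "application/n-quads"},
        method="POST",
    )


def probe(endpoint):
    """Send the probe and return (HTTP status, response body as text)."""
    try:
        with urllib.request.urlopen(build_probe_request(endpoint), timeout=10) as resp:
            return resp.status, resp.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        # Error responses (e.g. 400) still carry a body explaining the failure.
        return err.code, err.read().decode("utf-8", "replace")
```

Calling `probe(ENDPOINT)` from a container on the same Docker network returns the status and whatever explanation the store sends back. If the request succeeds, the probe statement is actually written to the store, so it is best run against a scratch namespace.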

lgleim commented 5 years ago

Thanks for the pointer!

After more experimentation, I can confirm that the setup works both with Blazegraph namespaces in quads mode and with Fuseki triplestores.

It might be helpful to add this information to the documentation in the wiki, though!
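
For readers who want to try the quads-mode setup mentioned above, the sketch below builds a request that asks Blazegraph to create a namespace with quads enabled. It rests on several assumptions that should be checked against the Blazegraph documentation for the version in use: that namespaces can be created by POSTing Java-properties text to `<base>/namespace`, that the property names shown are the relevant ones, and that the base URL matches the docker-compose file above. The namespace name `trellis` is a hypothetical choice.

```python
import urllib.request

# Assumption: Blazegraph base URL as exposed by the docker-compose file above.
BLAZEGRAPH = "http://db:9999/blazegraph"


def quads_namespace_properties(name):
    """Return Java-properties text describing a quads-mode namespace."""
    props = {
        "com.bigdata.rdf.sail.namespace": name,
        # Store a fourth (graph) position with every statement.
        "com.bigdata.rdf.store.AbstractTripleStore.quads": "true",
        # Assumption: inference is unavailable in quads mode, so it is disabled.
        "com.bigdata.rdf.store.AbstractTripleStore.axiomsClass":
            "com.bigdata.rdf.axioms.NoAxioms",
        "com.bigdata.rdf.sail.truthMaintenance": "false",
    }
    return "".join(f"{key}={value}\n" for key, value in props.items())


def build_create_request(base_url, name):
    """Build the POST request that would create the namespace."""
    return urllib.request.Request(
        f"{base_url}/namespace",
        data=quads_namespace_properties(name).encode("utf-8"),
        headers={"Content-Type": "text/plain"},
        method="POST",
    )
```

Passing `build_create_request(BLAZEGRAPH, "trellis")` to `urllib.request.urlopen` would submit the request. Under these assumptions, the `resources:` entry in config.yml would then point at `http://db:9999/blazegraph/namespace/trellis/sparql` instead of the `kb` namespace.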

acoburn commented 5 years ago

@lgleim thanks for checking into this. I added a note to the wiki under Configuring Trellis Applications. If you have suggestions for making it more prominent, I would be glad to hear them.