Closed: mhabsaoui closed this issue 6 years ago
If you want to do a live backup (i.e. while Fuseki is still running) I think you should use the SPARQL HTTP Graph Store API, i.e. perform a GET on the dataset or a specific graph.
@osma
> If you want to do a live backup (i.e. while Fuseki is still running)
=> Yes, if possible, so that the Skosmos site stays accessible to users... But if we have to pause/stop the Fuseki server to perform a backup/restore, that is no problem.
> you should use the SPARQL HTTP Graph Store API, i.e. perform a GET on the dataset or a specific graph.
=> I already tried the SOH scripts in $FUSEKI_HOME/bin:
s-post http://localhost:3030/$/backup/LoterreVocabularies
to mimic the backup button of the admin interface, but it fails with "Required: dataset URI, graph URI (or 'default') and file" :/
s-get <URI> <Graph>
to get the result as plain text (in .ttl format, I assume). But that does not produce a backup archive (e.g. a *.nq.gz file), and it does not seem convenient for a bulk load into TDB :/
@stain @blankdots I also tried creating a new bash script 'tdbbackup' (the counterpart of your 'tdbloader' script) that runs the following command inside the container, but I got the same exception:
java -cp $FUSEKI_HOME/fuseki-server.jar tdb.tdbdump --compress --loc=$FUSEKI_BASE/databases/LoterreVocabularies
Any idea on testing this on your side?
So I really think the tdbbackup / tdbdump and tdbloader CLI commands are appropriate for what we want; we are only stuck on the "TDBException locked by the process with PID 1" raised by the JVM. And if I try to stop the Fuseki server (the java process), we can no longer do anything. :/
Thanks for your feedback...
You should be able to do an HTTP GET on the dataset URI to get N-Quads for the whole dataset. I.e. if your SPARQL HTTP Graph Store endpoint is http://localhost:3030/ds/data, try:
$ wget --header "Accept: application/n-quads" http://localhost:3030/ds/data
Sure you can use the TDB command line utilities as well, but then you need to shut down Fuseki while using them. Two processes cannot access the same TDB directory at the same time. I don't understand why shutting down Fuseki didn't allow you to use the TDB command line utilities, maybe it didn't shut down cleanly and left behind a stale lock file.
PS. It's probably best to ask questions like this on the Jena users mailing list instead of here. AFAICT there is nothing docker-specific in this issue.
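The live backup suggested in this comment (an HTTP GET on the Graph Store endpoint with an N-Quads `Accept` header) could be wrapped in a small script. This is only a sketch: the function names `graph_store_url` and `backup_dataset` are hypothetical, and it assumes the Graph Store endpoint lives at `<base>/<dataset>/data` as in the example URL above.

```shell
#!/bin/sh
# Hypothetical helper: build the Graph Store endpoint URL for a dataset.
# Assumes the endpoint is <base>/<dataset>/data, as in the example above.
graph_store_url() {
    # $1 = Fuseki base URL, $2 = dataset name
    printf '%s/%s/data\n' "${1%/}" "$2"
}

# Hypothetical helper: download all quads of a dataset into a file
# while Fuseki keeps running.
backup_dataset() {
    # $1 = Fuseki base URL, $2 = dataset name, $3 = output file
    curl --fail --silent --show-error \
         -H 'Accept: application/n-quads' \
         -o "$3" "$(graph_store_url "$1" "$2")"
}
```

For example, `backup_dataset http://localhost:3030 ds ds-backup.nq` would write the dump to `ds-backup.nq`, which can then be compressed with `gzip` if desired.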
> You should be able to do a HTTP GET on the dataset URI to get n-quads for the whole dataset
I get a file named 'data' with no extension. I tried to PUT/POST/upload it, but it seems it is not accepted, or it needs a graph URI...
> I don't understand why shutting down Fuseki didn't allow you to use the TDB command line utilities, maybe it didn't shut down cleanly and left behind a stale lock file.
How would you stop/pause the already running Fuseki server (file "/jena-fuseki/fuseki-server")? Regarding the processes running inside the Docker container: if I kill the process with PID 1, it ends the container's life cycle and nothing more can be done...
And when I try to pass the wanted CLI command to the already running java process, it says it requires already defined flags...
I see, if the Fuseki process is coupled to the container then indeed it can be hard to kill without losing the whole container. That rules out using the TDB command line tools AFAICT.
I think you should be able to perform a HTTP PUT on the same URL where you got the n-quads dump, and it will replace the old data with the provided quads. Not with s-put but with a generic HTTP client such as curl or wget.
Whereas the dedicated CLI command is ready for that, as you can see
> I think you should be able to perform a HTTP PUT on the same URL where you got the n-quads dump, and it will replace the old data with the provided quads. Not with s-put but with a generic HTTP client such as curl or wget.
OK, I'll give it a try ;)
As I said above, two processes cannot access the same TDB directory at the same time. If you cannot shut down Fuseki without taking down the whole container (including the TDB files), then the command line utilities are useless.
> I think you should be able to perform a HTTP PUT on the same URL where you got the n-quads dump, and it will replace the old data with the provided quads. Not with s-put but with a generic HTTP client such as curl or wget.
> OK, I'll give it a try ;)
=> Nicely done. As you can see, the HTTP PUT did the trick. After that, the vocabularies were in the right dataset.
curl -i -X PUT -H "Content-Type: application/n-quads" -d @"$PWD/data" http://localhost:3030/LoterreVocabularies/
HTTP/1.1 100 Continue
HTTP/1.1 200 OK
Date: Fri, 02 Feb 2018 13:58:27 GMT
Fuseki-Request-ID: 75
Content-Type: application/json;charset=utf-8
Transfer-Encoding: chunked
{ "count" : 33032 , "tripleCount" : 0 , "quadCount" : 33032 }
I am just wondering about the graph binding for the vocabularies; I will check it and report back.
@osma I can confirm that it works nicely: the whole dataset that was backed up is restored, with all vocabularies belonging to their right graphs.
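The restore step confirmed here could be scripted along the following lines. This is a sketch with hypothetical function names. It uses curl's `--data-binary` instead of `-d`, because `-d @file` strips newlines from the file before sending it, whereas `--data-binary` sends the dump unchanged. The `quad_count` helper is only a rough sanity check to compare against the `quadCount` in the response; it assumes one quad per line, which is the usual layout of an N-Quads dump.

```shell
#!/bin/sh
# Hypothetical helper: count lines that are neither blank nor comments,
# as a rough estimate of the number of quads in an N-Quads file.
quad_count() {
    # $1 = N-Quads file
    grep -c -v -e '^[[:space:]]*$' -e '^[[:space:]]*#' "$1"
}

# Hypothetical helper: replace the content of a dataset with the quads
# from a dump file, using an HTTP PUT as in the thread.
restore_dataset() {
    # $1 = dataset URL (the URL used for the PUT above), $2 = N-Quads file
    curl --fail --silent --show-error -X PUT \
         -H 'Content-Type: application/n-quads' \
         --data-binary "@$2" "$1"
}
```

A call such as `restore_dataset http://localhost:3030/LoterreVocabularies/ data` mirrors the command shown above; note that a PUT replaces the existing data, as described earlier in the thread.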
I am still wondering, though, how the tdbdump or tdbbackup / tdbloader CLI commands are supposed to be used with a running jena-fuseki server => I need to ask on the Jena users mailing list
https://issues.apache.org/jira/projects/JENA/issues/JENA-1419?filter=allopenissues
Thanks.
As feedback: it turns out that the tdbdump or tdbbackup / tdbloader CLI commands have to be used on a stopped jena-fuseki server!
Indeed, these tools are meant to convert all the files found in the $FUSEKI_BASE/databases/ds/data directory into a single (optionally gzip-compressed) file containing the triplestore data in N-Quads format.
I have tested them (with the Fuseki Docker container down) on the persisted TDB directory. It works well ;-)
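An offline dump like the one described here could be guarded by a check for the lock file mentioned earlier in the thread. The sketch below is an assumption-laden illustration, not the author's actual script: the function names are hypothetical, `FUSEKI_HOME` is assumed to be set as in the Fuseki image, the `tdb.lock` file is assumed to sit directly in the TDB directory, and compression is done with `gzip` rather than a tdbdump flag.

```shell
#!/bin/sh
# Hypothetical helper: succeed when a tdb.lock file exists in the TDB directory.
tdb_lock_present() {
    # $1 = TDB directory
    [ -f "$1/tdb.lock" ]
}

# Hypothetical wrapper: dump a TDB directory to gzip-compressed N-Quads.
# Only meant to run while no Fuseki process has the directory open.
offline_dump() {
    # $1 = TDB directory, $2 = output file (e.g. backup.nq.gz)
    if tdb_lock_present "$1"; then
        echo "tdb.lock found in $1: Fuseki may still be running, or the lock is stale" >&2
        return 1
    fi
    # tdbdump writes N-Quads to standard output; the exit status of this
    # pipeline is that of gzip, so a java failure is not detected here.
    java -cp "$FUSEKI_HOME/fuseki-server.jar" tdb.tdbdump --loc="$1" | gzip > "$2"
}
```

This would typically be run with the Fuseki container stopped, for example from a separate one-off container that mounts the same data volume. Refusing to run when `tdb.lock` exists is a conservative choice; a stale lock left by an unclean shutdown would need to be investigated by hand.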
Thanks for the feedback, @mhabsaoui - indeed you will need to shut down the jena-fuseki server and access the data volume separately to do tdbbackup; alternatively do the backup using the UI "Backup" button or the REST API.
You can in theory run tdbloader while it is live, but only to a new database which you later "create" in the UI, as explained in the Docker readme.
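The REST API route mentioned in this comment could look roughly like the sketch below. It targets the same `/$/backup/<dataset>` URL that was tried with `s-post` earlier in the thread, but with a plain HTTP POST from curl. The function names are hypothetical, and the need for admin credentials is an assumption that depends on how the server is configured.

```shell
#!/bin/sh
# Hypothetical helper: build the admin backup URL for a dataset.
# Single quotes keep the literal "$" in the path from being expanded.
admin_backup_url() {
    # $1 = Fuseki base URL, $2 = dataset name
    printf '%s/$/backup/%s\n' "${1%/}" "$2"
}

# Hypothetical helper: ask the running server to start a backup of a dataset.
trigger_backup() {
    # $1 = Fuseki base URL, $2 = dataset name, $3 = credentials as "user:password"
    curl --fail --silent --show-error -X POST -u "$3" \
         "$(admin_backup_url "$1" "$2")"
}
```

For example, `trigger_backup http://localhost:3030 LoterreVocabularies admin:secret` (with placeholder credentials) would request a backup while the server stays up.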
> As a feedback, I can tell that in fact it seems that the tdbdump OR tdbbackup / tdbloader CLI commands have to be used on a stopped jena-fuseki server ! ... I have tested them (with the Fuseki docker container down) on the persisted TDB directory. It works well ;-)
Can you (@mhabsaoui) please give a command line example? It is not clear to me how to (1) stop the jena-fuseki server (2) without stopping the Docker container. I tried to terminate, inside the Docker container, the command responsible for running the Fuseki server, but then the whole container stopped and kicked me out.
Or should one implement a fuseki.service first and then stop that service inside the container?
Thank you
I found that testing for the tdb.lock file actually solved my issues:
fuseki:
  restart: unless-stopped
  ports:
    - "3030:3030"
  stop_grace_period: 10s
  healthcheck:
    test:
      - "CMD-SHELL"
      - "wget -qO /dev/null http://localhost:3030/$$/ping"
      - "test ! -f /system/tdb.lock"
    interval: 10s
    timeout: 5s
    retries: 3
Hi,
First, thanks for your contribution to the jena-fuseki Dockerfile.
I am trying to understand how to use the TDB dump/loader command line utilities to create backups and restore them properly.
Backups:
bash-4.3# java -cp $FUSEKI_HOME/fuseki-server.jar tdb.tdbdump --compress --loc=$FUSEKI_BASE/databases/LoterreVocabularies
=> Is my command correct (I've seen a twin command in the javadoc named 'tdbbackup', same result)? => As you can see, I got the JVM lock exception already discussed in the FAQs. But there is no further explanation of how to avoid it and run tdbdump properly, and the docs say little about it... So any suggestion (I already tried with the running 'fuseki-server' script)?
Restores
I think the right command for restoring gzipped N-Quads dumps with tdbloader is:
bash-4.3# java -cp $FUSEKI_HOME/fuseki-server.jar tdb.tdbloader --loc=$FUSEKI_BASE/databases/LoterreVocabularies $FUSEKI_BASE/backups/LoterreVocabularies_BKP.nq.gz
=> I got the same TDBException about
Thanks for your help.