snikproject / ontology

Public SNIK Ontology. An ontology of information management in hospitals.
https://snikproject.github.io/ontology/

Extreme backup size increase #448

Closed: KonradHoeffner closed this issue 3 years ago

KonradHoeffner commented 3 years ago

Something happened between July 1 and August 1, 2020: the monthly backup grew from 42M to 1.1G.

-rw-r--r-- 1 root root  37M Mär  1  2020 monthly_20200301_backup_virtuoso-data.tar.bz
-rw-r--r-- 1 root root  37M Apr  1  2020 monthly_20200401_backup_virtuoso-data.tar.bz
-rw-r--r-- 1 root root  37M Mai  1  2020 monthly_20200501_backup_virtuoso-data.tar.bz
-rw-r--r-- 1 root root  39M Jun  1  2020 monthly_20200601_backup_virtuoso-data.tar.bz
-rw-r--r-- 1 root root  42M Jul  1  2020 monthly_20200701_backup_virtuoso-data.tar.bz
-rw-r--r-- 1 root root 1,1G Aug  1  2020 monthly_20200801_backup_virtuoso-data.tar.bz
-rw-r--r-- 1 root root 1,1G Sep  1  2020 monthly_20200901_backup_virtuoso-data.tar.bz
-rw-r--r-- 1 root root 1,1G Okt  1 01:58 monthly_20201001_backup_virtuoso-data.tar.bz

However, we don't have nearly that many triples, as a count per graph on the public endpoint shows:

SELECT (COUNT(*) AS ?Triples) ?graph
WHERE
  { GRAPH ?graph
      { ?s ?p ?o }
  }
GROUP BY ?graph


Triples graph
13069 http://www.snik.eu/ontology/he-unconsolidated
336 http://www.snik.eu/ontology/meta
9860 http://www.snik.eu/ontology/he
1824 http://www.snik.eu/ontology/ciox
249 http://www.snik.eu/ontology/limes-exact
261 http://www.snik.eu/ontology/match
3284 http://hitontology.eu/ontology
1753 http://www.snik.eu/ontology/it4it
26620 http://www.snik.eu/ontology/bb
29397 http://www.snik.eu/ontology/ob
1154 http://www.snik.eu/ontology/persian
13247 http://www.snik.eu/ontology/derived
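To repeat this count from a script rather than the endpoint's web form, a sketch using only the Python standard library could look like this. It assumes Virtuoso's default local endpoint URL http://localhost:8890/sparql and a server that returns SPARQL JSON results; the constant and function names are hypothetical. The query includes the GROUP BY ?graph clause that standard SPARQL requires when a non-aggregated variable is projected next to COUNT.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: Virtuoso's default local SPARQL endpoint; adjust for the real server.
ENDPOINT = "http://localhost:8890/sparql"

# Triple count per named graph, largest graphs first.
QUERY = """
SELECT (COUNT(*) AS ?Triples) ?graph
WHERE { GRAPH ?graph { ?s ?p ?o } }
GROUP BY ?graph
ORDER BY DESC(?Triples)
"""


def build_request_url(endpoint: str, query: str) -> str:
    """Encode the query as a SPARQL protocol GET request URL."""
    return endpoint + "?" + urlencode({"query": query})


def parse_counts(body: str) -> dict[str, int]:
    """Map each graph IRI to its triple count from a SPARQL JSON results document."""
    bindings = json.loads(body)["results"]["bindings"]
    return {b["graph"]["value"]: int(b["Triples"]["value"]) for b in bindings}


def fetch_counts(endpoint: str = ENDPOINT) -> dict[str, int]:
    """Run QUERY against the endpoint and return the per-graph counts."""
    request = Request(
        build_request_url(endpoint, QUERY),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urlopen(request, timeout=60) as response:
        return parse_counts(response.read().decode("utf-8"))
```

Against the public endpoint this only returns publicly visible graphs, which is why the large graph found in the next comment does not appear in this table.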

KonradHoeffner commented 3 years ago

Found the probable source by running the same query in the Virtuoso Conductor, which also lists graphs that aren't publicly available. The culprit seems to be the graph http://dbpedia.org with 12,571,356 triples:

Triples graph
450 http://www.w3.org/2002/07/owl
3 http://www.w3.org/ns/ldp#
13069 http://www.snik.eu/ontology/he-unconsolidated
2479 http://www.openlinksw.com/schemas/virtrdf#
863 http://localhost:8890/DAV/
12571356 http://dbpedia.org
160 http://www.w3.org/2002/07/owl#
336 http://www.snik.eu/ontology/meta
9860 http://www.snik.eu/ontology/he
1824 http://www.snik.eu/ontology/ciox
249 http://www.snik.eu/ontology/limes-exact
261 http://www.snik.eu/ontology/match
254 http://www.w3.org/2004/02/skos/core#
161 http://ns.ontowiki.net/SysOnt/
102 http://www.w3.org/1999/02/22-rdf-syntax-ns
87 http://www.w3.org/2000/01/rdf-schema
3284 http://hitontology.eu/ontology
1753 http://www.snik.eu/ontology/it4it
26620 http://www.snik.eu/ontology/bb
866 http://purl.org/dc/terms/MediaType
866 http://purl.org/dc/terms/AgentClass
29397 http://www.snik.eu/ontology/ob
201 http://localhost/OntoWiki/Config/
4466 http://www.snik.eu/ontology/it
1154 http://www.snik.eu/ontology/persian
13247 http://www.snik.eu/ontology/derived

No. of rows in result: 26
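The comparison that exposed the culprit (graphs visible only in the Conductor, and the one graph that dominates the store) can be sketched as follows. It assumes both result tables have already been loaded into dicts mapping graph IRI to triple count; the function names are hypothetical, and only a few rows are copied for brevity.

```python
def hidden_graphs(public: dict[str, int], full: dict[str, int]) -> dict[str, int]:
    """Graphs listed in the full (Conductor) result but absent from the public endpoint."""
    return {graph: count for graph, count in full.items() if graph not in public}


def largest_share(counts: dict[str, int]) -> tuple[str, float]:
    """Return the graph with the most triples and its fraction of all triples."""
    graph = max(counts, key=counts.get)
    return graph, counts[graph] / sum(counts.values())


# A few rows copied from the two tables above.
public = {
    "http://www.snik.eu/ontology/ob": 29397,
    "http://www.snik.eu/ontology/bb": 26620,
}
full = {
    **public,
    "http://dbpedia.org": 12571356,
    "http://www.snik.eu/ontology/it": 4466,
}
print(hidden_graphs(public, full))
print(largest_share(full))
```

Applied to all 26 rows, http://dbpedia.org accounts for more than 99% of the roughly 12.7 million triples in the store.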

Deleted the graph.

KonradHoeffner commented 3 years ago

The file is still 3.6 GB in size, uncompressed. Restarting.

KonradHoeffner commented 3 years ago

Restarting didn't help. I will wait a few days to see whether it clears the cache, or whatever is reserving the memory, on its own.

SebStaeubert commented 3 years ago

I stopped the daily backup of the SNIK toolset until further notice. The monthly backup will continue to be performed and must be monitored in terms of size.
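Since the monthly backups now have to be watched for size, a check along these lines could flag the kind of jump shown at the top of this issue. This is a minimal sketch, not part of the existing backup job: the 2x threshold and the function names are assumptions, and sizes are parsed from ls -lh style strings, including the comma decimal separator of the German-locale listing.

```python
import re

# Binary multipliers for the suffixes printed by `ls -lh`.
UNITS = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_size(text: str) -> float:
    """Convert an ls -lh style size such as '37M' or '1,1G' to bytes.

    Both '.' and ',' are accepted as decimal separator, since the listing
    in this issue comes from a German locale.
    """
    match = re.fullmatch(r"(\d+(?:[.,]\d+)?)([KMGT]?)", text.strip())
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    number = float(match.group(1).replace(",", "."))
    return number * UNITS[match.group(2)]


def flag_jumps(backups: list[tuple[str, str]], max_ratio: float = 2.0) -> list[tuple[str, float]]:
    """Return (label, growth ratio) for every backup larger than max_ratio times the previous one."""
    flagged = []
    for (_, previous), (label, current) in zip(backups, backups[1:]):
        ratio = parse_size(current) / parse_size(previous)
        if ratio > max_ratio:
            flagged.append((label, ratio))
    return flagged


# Sizes taken from the monthly listing at the top of this issue.
monthly = [
    ("20200601", "39M"),
    ("20200701", "42M"),
    ("20200801", "1,1G"),
    ("20200901", "1,1G"),
]
print(flag_jumps(monthly))
```

With these sizes only the August 2020 backup is flagged, at roughly 27 times the size of the July one.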

KonradHoeffner commented 3 years ago

I got an answer at https://stackoverflow.com/questions/66692432/how-to-shrink-virtuoso-db-file-size-after-deleting-a-large-graph/66700096#66700096: in the Conductor, go to Database -> Interactive SQL and enter

    DB.DBA.vacuum ();
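The same statement can presumably also be run without the Conductor through Virtuoso's isql command-line client, for example from a cron job. This is a hedged sketch: it assumes isql is on the PATH (some distributions name it isql-vt or isql-v), that the server listens on Virtuoso's default SQL port 1111, and that the dba account is used; the function names are hypothetical.

```python
import shutil
import subprocess


def vacuum_command(password: str, port: int = 1111, user: str = "dba", isql: str = "isql") -> list[str]:
    """Build an isql call that runs the vacuum statement from the Stack Overflow answer."""
    # Assumed isql usage: positional port, user and password, then exec=<statement>.
    return [isql, str(port), user, password, "exec=DB.DBA.vacuum ();"]


def run_vacuum(password: str, **options) -> None:
    """Run the vacuum; raises if the isql binary is missing or the call fails."""
    command = vacuum_command(password, **options)
    if shutil.which(command[0]) is None:
        raise FileNotFoundError(f"{command[0]} not found on PATH")
    subprocess.run(command, check=True)
```

Passing the password as an argument keeps it out of the code, though it will still be visible in the process list while isql runs.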

It has now increased to 4.7 GB, but I expect it to drop.

KonradHoeffner commented 3 years ago

The only way I was able to fix this was:

KonradHoeffner commented 3 years ago

@SebStaeubert: You can enable the daily backups again.

KonradHoeffner commented 3 years ago

However, OntoWiki no longer works correctly.

SebStaeubert commented 3 years ago

> @SebStaeubert: You can enable the daily backups again.

Backup is enabled again.

KonradHoeffner commented 3 years ago

Now the whole server is no longer reachable over HTTP (SSH works, but the web pages don't). However, I suspect this has other causes, as other internal web pages also don't work at the moment.

KonradHoeffner commented 3 years ago

> > @SebStaeubert: You can enable the daily backups again.
>
> Backup is enabled again.

I noticed that the dumps subdirectory of the data directory is backed up as well, which nests backups inside backups. Can we exclude this directory from this type of backup?
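If the archive were built in Python, excluding the dumps directory could look like the following. This is only a sketch, since the issue does not show how the actual backup job creates its .tar.bz files; it assumes the data directory is archived under its own name, and the function names are hypothetical. If the job uses GNU tar instead, its --exclude option serves the same purpose.

```python
import tarfile
from pathlib import Path
from typing import Callable, Optional


def exclude_subdir(root: str, excluded: str = "dumps") -> Callable[[tarfile.TarInfo], Optional[tarfile.TarInfo]]:
    """Return a tarfile filter that drops <root>/<excluded> and everything below it."""
    prefix = f"{root}/{excluded}"

    def _filter(info: tarfile.TarInfo) -> Optional[tarfile.TarInfo]:
        if info.name == prefix or info.name.startswith(prefix + "/"):
            # Returning None skips the member; for a directory, tarfile.add
            # then does not descend into it either.
            return None
        return info

    return _filter


def backup_without_dumps(data_dir: str, archive: str, excluded: str = "dumps") -> None:
    """Write data_dir to a bzip2-compressed tarball, leaving out the excluded subdirectory."""
    root = Path(data_dir).name
    with tarfile.open(archive, "w:bz2") as tar:
        tar.add(data_dir, arcname=root, filter=exclude_subdir(root, excluded))
```

Matching on the full prefix plus "/" keeps sibling entries whose names merely start with "dumps" in the archive.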

KonradHoeffner commented 3 years ago

The service disruption only occurred on my machine; it was caused by my network address being mistakenly blacklisted. This has since been fixed, and the backup is now smaller, so this issue is resolved.