openlink / virtuoso-opensource

Virtuoso is a high-performance and scalable Multi-Model RDBMS, Data Integration Middleware, Linked Data Deployment, and HTTP Application Server Platform
https://vos.openlinksw.com

I am loading Freebase 2M with rdf_loader_run: why have 80,000,000 triples been loaded and it is still loading? #903

Open: zzks opened this issue 4 years ago

zzks commented 4 years ago

I think there are 2,000,000 triples in Freebase 2M. I load it with:

ld_dir('/home/ubuntu/dataset2', 'freebase-FB2M.txt', 'http://www.freebase.com');
rdf_loader_run();

After 5 hours it is still running, and checking with SELECT COUNT(*) { ?s ?p ?o } returns callret-0 = 84887297. Are there really that many triples in freebase-FB2M.txt?

HughWilliams commented 4 years ago

Has RDF Performance Tuning been performed on your Virtuoso instance, as detailed in the linked guide?
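Much of that tuning comes down to sizing the NumberOfBuffers and MaxDirtyBuffers settings in the [Parameters] section of virtuoso.ini to the machine's RAM. The Python sketch below is only a rough approximation under a loudly stated assumption: about two-thirds of RAM given to 8 KB buffer pages, with MaxDirtyBuffers at about three-quarters of NumberOfBuffers. The function name buffer_settings is hypothetical, and the per-RAM values in the tuning guide and in the comments of the shipped virtuoso.ini should take precedence over these numbers.

```python
# Rough sizing sketch for the virtuoso.ini [Parameters] buffer settings.
# ASSUMPTION (rule of thumb, not taken from this thread): ~2/3 of RAM
# is used for 8 KB pages, and MaxDirtyBuffers is ~3/4 of NumberOfBuffers.
import sys

PAGE_BYTES = 8192  # assumed size of one buffered database page


def buffer_settings(ram_bytes: int) -> tuple[int, int]:
    """Return (NumberOfBuffers, MaxDirtyBuffers) under the assumed rule of thumb."""
    number_of_buffers = ram_bytes * 2 // (3 * PAGE_BYTES)
    max_dirty_buffers = number_of_buffers * 3 // 4
    return number_of_buffers, max_dirty_buffers


if __name__ == "__main__":
    gib = 1024 ** 3
    # Optional argument: RAM in GiB; otherwise show a few common sizes.
    sizes = [int(sys.argv[1])] if len(sys.argv) > 1 else [8, 16, 32]
    for ram_gib in sizes:
        nb, md = buffer_settings(ram_gib * gib)
        print(f"{ram_gib} GiB RAM: NumberOfBuffers = {nb}, MaxDirtyBuffers = {md}")
```

Treat the output as a starting point for comparison with the guide, not as a recommended configuration.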

What does running the command "status();" from the Virtuoso "isql" command-line tool report about the current state of the load?

The SPARQL count you performed shows about 84887297 triples loaded (add FROM <http://www.freebase.com> to the query to get the exact amount for that graph), which is not a lot of triples to have loaded in a 5-hour period; hence the interest in knowing whether the instance has been performance tuned.
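Both points in this reply can be made concrete with a small Python sketch. The helper names graph_count_query and load_rate are hypothetical. The query uses the standard SPARQL 1.1 form (COUNT(*) AS ?n) rather than the bare COUNT(*) used above, and the rate assumes the 5 hours is the full elapsed load time.

```python
# Hypothetical helpers illustrating the graph-restricted count and the load rate.


def graph_count_query(graph_iri: str) -> str:
    """Return a SPARQL query that counts triples only in the named graph."""
    # Doubled braces produce literal { } inside the f-string.
    return f"SELECT (COUNT(*) AS ?n) FROM <{graph_iri}> WHERE {{ ?s ?p ?o }}"


def load_rate(triples: int, hours: float) -> float:
    """Average number of triples loaded per second over the elapsed time."""
    return triples / (hours * 3600)


if __name__ == "__main__":
    print(graph_count_query("http://www.freebase.com"))
    # 84887297 triples over 5 hours works out to roughly 4716 triples per second.
    print(f"{load_rate(84_887_297, 5):.0f} triples/s")
```

A rate of a few thousand triples per second is the kind of figure that suggests checking the tuning settings first.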

What is the size of the freebase-FB2M.txt file? You seem to think it contains 2 million triples based on its name, but 84 million have been loaded already. Where did you obtain the dataset file? I cannot find it online. The last Freebase dataset I am aware of contained about 1.9 billion triples.
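A quick way to answer the size question, and to sanity-check the expected triple count before loading, is to measure the file and count its statement lines. This Python sketch assumes N-Triples-style input (one triple per line, with blank lines and lines starting with # ignored). The function ntriples_stats is hypothetical, and if freebase-FB2M.txt uses a different layout (for example, several objects on one line), the line count will not equal the triple count.

```python
# Sketch: report file size and statement-line count for an N-Triples-style file.
# ASSUMPTION: one triple per line; blank lines and '#' comment lines are skipped.
import os
import sys


def ntriples_stats(path: str) -> tuple[int, int]:
    """Return (file size in bytes, number of non-blank, non-comment lines)."""
    size = os.path.getsize(path)
    count = 0
    with open(path, "r", encoding="utf-8", errors="replace") as f:
        for line in f:
            stripped = line.strip()
            if stripped and not stripped.startswith("#"):
                count += 1
    return size, count


if __name__ == "__main__":
    if len(sys.argv) > 1:
        size, count = ntriples_stats(sys.argv[1])
        print(f"{size} bytes, {count} statement lines")
    else:
        print("usage: python ntriples_stats.py FILE")
```

For example, running it against /home/ubuntu/dataset2/freebase-FB2M.txt would show whether the file really holds about 2 million lines or closer to the 84 million already loaded.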