Open JervenBolleman opened 10 years ago
Hi Jerven, Good to meet you at the LDBC TUC meeting. As discussed with Orri we shall implement this feature enhancement for bulk loading of RDF data. I shall notify you when it is available in the open source git repo ...
Actually, having spoken to Orri this morning, this is not a feature enhancement: the function/procedure already exists, and he just needs to provide instructions on usage, which he indicated will be provided tomorrow ...
The feature is probably there on the database side. However, it does require a rather significant improvement to the JDBC drivers, because the JDBC Connection method createArrayOf is currently not implemented, i.e. from the Java side it will be really hard to use. See libsrc/JDBCDriverType4/virtuoso/jdbc2:
public Array createArrayOf(String typeName, Object[] elements) throws SQLException
{
    throw new VirtuosoFNSException ("createArrayOf(typeName, elements) not supported", VirtuosoException.NOTIMPLEMENTED);
}
OK, I think Orri was assuming you would call the Virtuoso server-side procedure directly, but if Java already has a createArrayOf method for this then we can use it for the implementation ...
The main issue is getting the data from the Java side, via the driver, into Virtuoso. Most of the LOB or Array methods that one would normally use are not reachable from pure JDBC code.
Currently the RDF bulk load operations in Virtuoso are Turtle string based, i.e. a Java RDF model is serialised into a Turtle string. This Turtle string is then parsed inside the database using the TTLP function to load the data into the rdf_quad and rdf_obj tables.
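For reference, the string-based path described above could look roughly like the sketch below. It is only an illustration: the TurtleBatch and Triple classes are hypothetical, blank nodes, language tags and datatypes are left out, and the call assumes the four-argument form TTLP(text, base, graph, flags) is reachable through the standard JDBC call escape.

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.SQLException;
import java.util.List;

// Hypothetical helper, not part of the Virtuoso drivers.
class TurtleBatch {

    // Hypothetical triple holder; objectIsIri says whether 'object' is an IRI or a plain literal.
    static final class Triple {
        final String subject;
        final String predicate;
        final String object;
        final boolean objectIsIri;

        Triple(String subject, String predicate, String object, boolean objectIsIri) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
            this.objectIsIri = objectIsIri;
        }
    }

    // Escape backslashes, double quotes and line breaks in a plain literal.
    static String escapeLiteral(String s) {
        return s.replace("\\", "\\\\")
                .replace("\"", "\\\"")
                .replace("\n", "\\n")
                .replace("\r", "\\r");
    }

    // Serialise one statement per line (N-Triples style, which is also valid Turtle).
    static String toTurtle(List<Triple> triples) {
        StringBuilder sb = new StringBuilder();
        for (Triple t : triples) {
            sb.append('<').append(t.subject).append("> <")
              .append(t.predicate).append("> ");
            if (t.objectIsIri) {
                sb.append('<').append(t.object).append('>');
            } else {
                sb.append('"').append(escapeLiteral(t.object)).append('"');
            }
            sb.append(" .\n");
        }
        return sb.toString();
    }

    // Send the whole batch as a single string; the server then has to parse it again.
    // Assumption: DB.DBA.TTLP(text, base, graph, flags) with flags = 0.
    static void load(Connection conn, List<Triple> triples, String graph) throws SQLException {
        try (CallableStatement cs = conn.prepareCall("{call DB.DBA.TTLP(?, ?, ?, ?)}")) {
            cs.setString(1, toTurtle(triples));
            cs.setString(2, "");     // base IRI, unused here because all IRIs are absolute
            cs.setString(3, graph);  // target graph IRI
            cs.setInt(4, 0);         // parser flags
            cs.execute();
        }
    }
}
```

Every triple is turned into text on the client and back into terms on the server, which is the round trip this issue is about.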
I suggest that instead of sending a String to be parsed we send 4 (or 5) arrays. The first is an array of subjects (URIs/bnodes), the second of predicates (URIs), the third of URI/bnode objects, the fourth of literal objects (may be merged with the third), and the fifth of bnodes/URIs for the graph context.
Being able to send such a structured format to the database not only avoids parsing, but also opens up the possibility of vectored loading. The values in each of these arrays can be replaced by rdf_obj ids in parallel, which allows you to build up a page for the rdf_quad table. More generally, it avoids the serial CPU load of parsing the Turtle string.
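To make the proposed layout concrete, here is a minimal client-side sketch of the five-array form. The QuadColumns and Quad classes are hypothetical, and the convention that row i of every array describes quad i, with null in whichever object column does not apply, is my assumption rather than anything the server defines today.

```java
import java.util.List;

// Hypothetical column-wise batch of quads; not an existing Virtuoso API.
class QuadColumns {

    // Hypothetical row-wise quad: exactly one of resourceObject / literalObject is non-null.
    static final class Quad {
        final String subject;
        final String predicate;
        final String resourceObject;
        final String literalObject;
        final String graph;

        private Quad(String s, String p, String resource, String literal, String g) {
            this.subject = s;
            this.predicate = p;
            this.resourceObject = resource;
            this.literalObject = literal;
            this.graph = g;
        }

        static Quad resource(String s, String p, String o, String g) {
            return new Quad(s, p, o, null, g);
        }

        static Quad literal(String s, String p, String value, String g) {
            return new Quad(s, p, null, value, g);
        }
    }

    final String[] subjects;         // URIs or bnodes
    final String[] predicates;       // URIs
    final String[] resourceObjects;  // URI/bnode objects, null where the object is a literal
    final String[] literalObjects;   // literal objects, null where the object is a resource
    final String[] graphs;           // graph context per quad

    private QuadColumns(int n) {
        subjects = new String[n];
        predicates = new String[n];
        resourceObjects = new String[n];
        literalObjects = new String[n];
        graphs = new String[n];
    }

    int size() {
        return subjects.length;
    }

    // Transpose row-wise quads into parallel arrays; index i in each array is quad i.
    static QuadColumns fromQuads(List<Quad> quads) {
        QuadColumns c = new QuadColumns(quads.size());
        int i = 0;
        for (Quad q : quads) {
            c.subjects[i] = q.subject;
            c.predicates[i] = q.predicate;
            c.resourceObjects[i] = q.resourceObject;
            c.literalObjects[i] = q.literalObject;
            c.graphs[i] = q.graph;
            i++;
        }
        return c;
    }
}
```

If createArrayOf were implemented in the driver, each of these arrays could in principle be wrapped with something like conn.createArrayOf("VARCHAR", columns.subjects) and bound as a parameter of a server-side procedure; the type name and the procedure are assumptions, since neither is specified in this thread.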