kg-construct / best-practices

Support batching/chunking #3

Open dachafra opened 4 years ago

dachafra commented 4 years ago

issue: Have you tested your tool on 10M rows? Do you build a triples model in memory before writing the output? That does not scale.

suggestion:

  1. The tool must support streaming, i.e. output each block of triples as soon as it is ready, rather than gathering all triples in memory and dumping them at the very end
  2. Further, using intermediate storage (e.g. an RDF4J in-memory or native store) is slow (e.g. 30x slower than direct output)
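
To illustrate the two suggestions above, here is a minimal sketch of streaming output in Python. It is not taken from any of the tools discussed: the `BASE` namespace, the `id`/`name`/`city` columns, and the helpers `row_to_triples` and `stream_triples` are all hypothetical, and it assumes the `id` values are already safe to embed in an IRI. Each row's triples are written as N-Triples lines as soon as they are produced, so no in-memory model or intermediate store is built.

```python
import csv
import io
from typing import Iterable, Iterator, TextIO

# Hypothetical namespace, used only for illustration.
BASE = "http://example.org/"


def escape_literal(value: str) -> str:
    """Escape a string for use inside an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def row_to_triples(row: dict) -> Iterator[str]:
    """Yield the N-Triples lines for one input row (hypothetical mapping)."""
    # Assumption: row['id'] contains only IRI-safe characters.
    subject = f"<{BASE}person/{row['id']}>"
    yield f'{subject} <{BASE}name> "{escape_literal(row["name"])}" .\n'
    yield f'{subject} <{BASE}city> "{escape_literal(row["city"])}" .\n'


def stream_triples(rows: Iterable[dict], out: TextIO) -> int:
    """Write each row's triples as soon as they are ready; return the count."""
    count = 0
    for row in rows:  # rows are consumed lazily, one at a time
        for line in row_to_triples(row):
            out.write(line)  # direct output, nothing is accumulated
            count += 1
    return count


# Small in-memory stand-ins for a large input file and an output file.
source = io.StringIO("id,name,city\n1,Ada,London\n2,Alan,Wilmslow\n")
sink = io.StringIO()
written = stream_triples(csv.DictReader(source), sink)
```

Because `csv.DictReader` reads lazily and every line is written immediately, memory use in this sketch depends on the size of one row rather than on the size of the whole input. With real files, `source` and `sink` would be file handles opened with `open(...)`.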

dachafra commented 2 years ago

This is already available in tools such as Morph-KGC, RMLStreamer, or SDM-RDFizer. I think it's out of the scope of the community group atm. If there are no concerns I'll close it...