pchampin / sophia_rs

Sophia: a Rust toolkit for RDF and Linked Data

Loading graph limited by single CPU core, parallelize? #119

Open KonradHoeffner opened 1 year ago

KonradHoeffner commented 1 year ago

Loading 16 million triples from a 3.6 GB N-Triples file into a LightGraph takes 40 seconds, with a single CPU core maxed out throughout, on an Intel Core i9 12900k with 24 threads. Is it possible to parallelize this somehow? Because N-Triples is line-based, a file can be split at any line boundary into n blocks, which could then be loaded in parallel. Alternatively, when using a FastGraph, the SPO wrapper could be initialized in parallel with the OPS wrapper.
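To make the block-partitioning idea concrete, here is a minimal sketch using only the Rust standard library. It does not use Sophia's API: `Triple`, `parse_line`, `split_blocks` and `load_parallel` are hypothetical names, and `parse_line` is a naive stand-in for a real N-Triples parser (no escape handling or validation). Note that the final merge into a single set is still sequential, so any speed-up depends on how much of the cost is parsing rather than insertion.

```rust
use std::collections::HashSet;
use std::thread;

/// Hypothetical stand-in for real RDF terms: a triple of raw term strings.
type Triple = (String, String, String);

/// Naive line parser (assumption, not a conforming N-Triples parser):
/// subject and predicate contain no whitespace, the object is the rest
/// of the line minus the trailing " .".
fn parse_line(line: &str) -> Option<Triple> {
    let line = line.trim();
    if line.is_empty() || line.starts_with('#') {
        return None; // skip blank lines and comments
    }
    let body = line.strip_suffix('.')?.trim_end();
    let (s, rest) = body.split_once(char::is_whitespace)?;
    let (p, o) = rest.trim_start().split_once(char::is_whitespace)?;
    Some((s.to_string(), p.to_string(), o.trim_start().to_string()))
}

/// Split `data` into at most `n` blocks, each ending on a line boundary.
fn split_blocks(data: &str, n: usize) -> Vec<&str> {
    let target = data.len() / n.max(1) + 1;
    let mut blocks = Vec::new();
    let mut rest = data;
    while !rest.is_empty() {
        // Cut just after the first newline at or beyond the target size.
        let cut = if rest.len() <= target {
            rest.len()
        } else {
            match rest.as_bytes()[target..].iter().position(|&b| b == b'\n') {
                Some(i) => target + i + 1,
                None => rest.len(),
            }
        };
        let (head, tail) = rest.split_at(cut);
        blocks.push(head);
        rest = tail;
    }
    blocks
}

/// Parse each block on its own thread, then merge the results sequentially.
fn load_parallel(data: &str, n: usize) -> HashSet<Triple> {
    let blocks = split_blocks(data, n);
    let parts: Vec<Vec<Triple>> = thread::scope(|scope| {
        // Spawn all workers first, then join them.
        let handles: Vec<_> = blocks
            .iter()
            .map(|block| {
                scope.spawn(move || block.lines().filter_map(parse_line).collect::<Vec<_>>())
            })
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("worker panicked"))
            .collect()
    });
    parts.into_iter().flatten().collect()
}
```

With a real file, `data` could be the file contents read into memory or memory-mapped, and `n` the number of available cores.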

pchampin commented 1 year ago

Re. LightGraph, there is not much that can be parallelized from my point of view (I may be wrong). One thing that could be explored would be to not block on IO, and instead do the indexing during IO latency (e.g. using async code). As I understand it, @Tpt is working on a new parser infrastructure that would make this possible.
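One way to picture overlapping IO with indexing is the sketch below. It swaps the async approach mentioned above for a plain reader thread and a bounded channel, which is a different technique with the same goal: the indexing side keeps working while the reader waits on IO. Everything here is an assumption for illustration (`load_overlapped`, the batch size, the channel capacity of 4, and a `HashSet<String>` of raw lines standing in for a real graph index); none of it is Sophia's API.

```rust
use std::collections::HashSet;
use std::io::{BufRead, BufReader, Read};
use std::sync::mpsc;
use std::thread;

/// Read lines on one thread while the calling thread "indexes" them.
/// The index is a hypothetical stand-in: a set of non-empty raw lines.
fn load_overlapped<R: Read + Send>(
    input: R,
    batch_size: usize,
) -> std::io::Result<HashSet<String>> {
    // Bounded channel: the reader can run at most 4 batches ahead.
    let (tx, rx) = mpsc::sync_channel::<Vec<String>>(4);
    thread::scope(|scope| {
        let reader = scope.spawn(move || -> std::io::Result<()> {
            let mut batch = Vec::with_capacity(batch_size);
            for line in BufReader::new(input).lines() {
                batch.push(line?);
                if batch.len() >= batch_size {
                    let full = std::mem::replace(&mut batch, Vec::with_capacity(batch_size));
                    if tx.send(full).is_err() {
                        return Ok(()); // receiver is gone, stop reading
                    }
                }
            }
            if !batch.is_empty() {
                let _ = tx.send(batch); // flush the last partial batch
            }
            Ok(())
        });

        // Indexing happens here while the reader thread waits on IO.
        let mut index = HashSet::new();
        for batch in rx {
            index.extend(batch.into_iter().filter(|l| !l.trim().is_empty()));
        }
        reader.join().expect("reader thread panicked")?;
        Ok(index)
    })
}
```

Sending batches rather than single lines keeps channel overhead low. Indexing itself stays single-threaded, so this only helps to the extent that the load is IO-bound.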

Re. FastGraph, yes, the creation of all indexes could indeed be parallelized.
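As a rough illustration of building several indexes in parallel, the sketch below constructs an SPO and an OPS index from the same triples on two threads. It is not based on FastGraph's internals: `Term` (terms already mapped to integer ids), `Index`, `build_index` and `build_spo_and_ops` are all hypothetical, and nested `BTreeMap`s stand in for whatever structure the real indexes use.

```rust
use std::collections::{BTreeMap, BTreeSet};
use std::thread;

/// Hypothetical: terms are assumed to be already mapped to integer ids.
type Term = u32;
/// Hypothetical three-level index: first key -> second key -> set of third keys.
type Index = BTreeMap<Term, BTreeMap<Term, BTreeSet<Term>>>;

/// Build one index; `order` gives the positions of the triple to use as
/// first, second and third key (e.g. [0, 1, 2] for SPO, [2, 1, 0] for OPS).
fn build_index(triples: &[[Term; 3]], order: [usize; 3]) -> Index {
    let mut index = Index::new();
    for t in triples {
        index
            .entry(t[order[0]])
            .or_default()
            .entry(t[order[1]])
            .or_default()
            .insert(t[order[2]]);
    }
    index
}

/// Build the SPO and OPS indexes concurrently over the same shared slice.
fn build_spo_and_ops(triples: &[[Term; 3]]) -> (Index, Index) {
    thread::scope(|scope| {
        // OPS is built on a spawned thread, SPO on the current thread.
        let ops = scope.spawn(|| build_index(triples, [2, 1, 0]));
        let spo = build_index(triples, [0, 1, 2]);
        (spo, ops.join().expect("index thread panicked"))
    })
}
```

Both threads only read the shared triples, so no locking is needed. The best-case speed-up is bounded by the number of indexes being built.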