rdfjs / N3.js

Lightning fast, spec-compatible, streaming RDF for JavaScript
http://rdf.js.org/N3.js/
Other
712 stars 130 forks source link

Is there a need/desire to Extract/Encapsulate the Details of the Three-Layered Index used in N3Store? #267

Open dhurlburtusa opened 2 years ago

dhurlburtusa commented 2 years ago

So, N3Store uses a three-layered index to be "lightning fast". I was curious what was involved in how a three-layered index was implemented, and I was thinking it would be useful to be able to use the index without using an N3Store. (In some of the code I maintain, I simply use an N3Store instance basically as an index and I don't necessarily need the extra methods not related to index functionality.)

So, I studied the code and was able to isolate a JavaScript class representing the internals of the index. I refactored the N3Store to use this new index class (I called it CounterBasedGraphsIndex but maybe a different name like ThreeLayeredRdfJsTermsIndex would be better/more descriptive). The change moved about 500 lines of index specific code out of the N3Store. The resulting N3Store implementation is a little more than 300 lines (excluding the lines for the DatasetCoreAndReadableStream class embedded in the N3Store.js module) from over 800.

This refactoring opens up the possibility to easily pick different index implementations that an N3Store uses. This could be a configuration option to the N3Store's constructor.

Because the new refactored implementation now delegates most of its work down to the index instance, I was concerned what the performance impact would be.

I ran the performance tests three times before making the changes. Then after refactoring, I ran the tests three more times. There doesn't appear to be any noticeable difference.

Here are the results:

// =============================================================================
// Before:
// =============================================================================

N3Store performance test
- Adding 16777216 triples to the default graph: 22.659s
* Memory usage for triples: 802MB
- Finding all 16777216 triples in the default graph 65536 times (0 variables): 1:38.829 (m:ss.mmm)
- Finding all 16777216 triples in the default graph 131072 times (1 variable): 13.066s
- Finding all 16777216 triples in the default graph 196608 times (2 variables): 13.127s

- Adding 16777216 quads: 32.770s
* Memory usage for quads: 775MB
- Finding all 16777216 quads 1048576 times: 47.811s

// =============================================================================

N3Store performance test
- Adding 16777216 triples to the default graph: 22.315s
* Memory usage for triples: 802MB
- Finding all 16777216 triples in the default graph 65536 times (0 variables): 1:38.796 (m:ss.mmm)
- Finding all 16777216 triples in the default graph 131072 times (1 variable): 12.979s
- Finding all 16777216 triples in the default graph 196608 times (2 variables): 13.213s

- Adding 16777216 quads: 30.257s
* Memory usage for quads: 705MB
- Finding all 16777216 quads 1048576 times: 45.865s

// =============================================================================

N3Store performance test
- Adding 16777216 triples to the default graph: 21.787s
* Memory usage for triples: 801MB
- Finding all 16777216 triples in the default graph 65536 times (0 variables): 1:36.272 (m:ss.mmm)
- Finding all 16777216 triples in the default graph 131072 times (1 variable): 12.694s
- Finding all 16777216 triples in the default graph 196608 times (2 variables): 12.579s

- Adding 16777216 quads: 31.576s
* Memory usage for quads: 705MB
- Finding all 16777216 quads 1048576 times: 46.105s

// =============================================================================
// After:
// =============================================================================

N3Store performance test
- Adding 16777216 triples to the default graph: 22.664s
* Memory usage for triples: 802MB
- Finding all 16777216 triples in the default graph 65536 times (0 variables): 1:40.410 (m:ss.mmm)
- Finding all 16777216 triples in the default graph 131072 times (1 variable): 13.164s
- Finding all 16777216 triples in the default graph 196608 times (2 variables): 13.013s

- Adding 16777216 quads: 30.485s
* Memory usage for quads: 705MB
- Finding all 16777216 quads 1048576 times: 46.852s

// =============================================================================

N3Store performance test
- Adding 16777216 triples to the default graph: 20.979s
* Memory usage for triples: 801MB
- Finding all 16777216 triples in the default graph 65536 times (0 variables): 1:33.853 (m:ss.mmm)
- Finding all 16777216 triples in the default graph 131072 times (1 variable): 12.510s
- Finding all 16777216 triples in the default graph 196608 times (2 variables): 12.535s

- Adding 16777216 quads: 28.797s
* Memory usage for quads: 698MB
- Finding all 16777216 quads 1048576 times: 45.223s

// =============================================================================

N3Store performance test
- Adding 16777216 triples to the default graph: 21.640s
* Memory usage for triples: 802MB
- Finding all 16777216 triples in the default graph 65536 times (0 variables): 1:35.893 (m:ss.mmm)
- Finding all 16777216 triples in the default graph 131072 times (1 variable): 12.650s
- Finding all 16777216 triples in the default graph 196608 times (2 variables): 12.656s

- Adding 16777216 quads: 30.705s
* Memory usage for quads: 809MB
- Finding all 16777216 quads 1048576 times: 45.085s

The tests still pass with 100% coverage.

I'll submit a PR if anyone is interested in this.

RubenVerborgh commented 2 years ago

Ace, @dhurlburtusa! Please make a PR indeed. Shall we name it N3Index?

jeswr commented 1 month ago

Partial support added in https://github.com/rdfjs/N3.js/pull/394 with a separate entity index (but not a separate 3-layered index).