rdfjs / N3.js

Lightning fast, spec-compatible, streaming RDF for JavaScript
http://rdf.js.org/N3.js/

Feat/reasoning fwd chain #293

Closed: jeswr closed this pull request 2 years ago

jeswr commented 2 years ago

Supersedes #291 and #292.

This is a WIP PR to add RL reasoning support to N3.js. The code still needs a significant cleanup, and a smarter forward-chaining algorithm is yet to be implemented.

Part of my motivation for doing this is to see what the performance difference is between reasoning via something like https://github.com/comunica/comunica-feature-reasoning/ and reasoning directly against the store internals.

Currently, this implements a naive algorithm that repeatedly applies the rules until no new results are produced.
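Roughly, the naive loop looks like this (illustrative sketch only; matchPremises and instantiate are hypothetical helpers, not the actual store internals):

  // Naive fixed-point evaluation over an N3.Store (illustrative sketch).
  // `rules` is assumed to be an array of { premises, conclusion } patterns;
  // matchPremises and instantiate are hypothetical helpers.
  function naiveReason(store, rules) {
    let sizeBefore;
    do {
      sizeBefore = store.size;
      for (const rule of rules) {
        // Find every variable binding that satisfies all premises ...
        for (const bindings of matchPremises(store, rule.premises)) {
          // ... and add the instantiated conclusions (the store ignores duplicates)
          for (const quad of instantiate(rule.conclusion, bindings))
            store.addQuad(quad);
        }
      }
      // Repeat until a full pass over the rules adds no new quads
    } while (store.size > sizeBefore);
  }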

At the moment it can apply RDFS rules over the union of timbl's profile card and the FOAF ontology (~1,000 quads) in 20 ms (Dell XPS 15, 32 GB), generating ~700 quads. For reference, it takes ~27 ms to load the data.

I expect that once I implement smarter forward chaining this will be <10 ms.

@pbonte @josd thought you might be interested in this :)

pbonte commented 2 years ago

Hi @jeswr! Very interesting! You could look at the so-called semi-naive algorithm to see if you can speed it up some more. The idea there is that you start from an empty dataset and add facts one at a time, typically resulting in fewer duplicates being computed. Maybe this can get you below the 10 ms! I see that your duplicate results have already decreased a lot compared to your previous implementation, so I was wondering what optimization you did to achieve this, as from the text it seems like you are using the same approach, i.e. iterating over all the rules until no new ones can be inferred?

Very nice results!
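For reference, the semi-naive idea in sketch form (the helpers here are hypothetical, not N3.js internals):

  // Semi-naive evaluation (illustrative sketch): only quads derived in the
  // previous round are used to trigger the rules, so old conclusions are not
  // re-derived over and over.
  function semiNaiveReason(store, rules) {
    // Round 0: every existing quad counts as "new"
    let delta = store.getQuads(null, null, null, null);
    while (delta.length > 0) {
      const next = [];
      for (const rule of rules) {
        // matchPremisesWithDelta is hypothetical: at least one premise must
        // match a quad from `delta`; the others may match the whole store.
        for (const bindings of matchPremisesWithDelta(store, delta, rule.premises)) {
          for (const quad of instantiate(rule.conclusion, bindings)) {
            if (!store.has(quad)) {      // keep only genuinely new derivations
              store.addQuad(quad);
              next.push(quad);
            }
          }
        }
      }
      delta = next;                      // the next round joins only against the new quads
    }
  }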

jeswr commented 2 years ago

Thanks @pbonte - I've already got the semi-naive algorithm in Comunica. I'm still trying to figure out the best way of implementing it here, since in general it will require me to generate more 'copies' of each rule: part of how I'm achieving these results is by mutating the variables in the rule (each variable in the rules used internally is just a pointer to a location in memory where we can place its binding).
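To illustrate the mutation trick (the rule shape below is made up for illustration; only the shared-slot idea is the point):

  const { DataFactory } = require('n3');
  const { namedNode } = DataFactory;

  // A variable in an internal rule is a single shared, mutable slot.
  // Binding a premise writes into the slot; every other occurrence of that
  // variable in the rule sees the value by reference, so no copy of the rule is needed.
  const x = { value: null };                                   // one shared slot for ?x
  const type = namedNode('http://www.w3.org/1999/02/22-rdf-syntax-ns#type');
  const rule = {
    premises:   [{ subject: x, predicate: type, object: namedNode('http://xmlns.com/foaf/0.1/Person') }],
    conclusion: [{ subject: x, predicate: type, object: namedNode('http://xmlns.com/foaf/0.1/Agent') }],
  };

  // Binding ?x while matching a premise:
  x.value = namedNode('https://www.w3.org/People/Berners-Lee/card#i');
  // rule.conclusion[0].subject.value now holds the same term, so the conclusion
  // can be instantiated without ever copying the rule object.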

Perhaps, rather than making more copies, I can create a 'buffer' of new data and the rules that need to be applied to it (which is implicitly what happens in the Comunica implementation, due to the way I am using the iterators over there).

> I see that your duplicate results have already decreased a lot compared to your previous implementation

The previous PR I made had a bug where, in some cases, I was looking over the incorrect index, resulting in the creation of data that should not have been there.

josd commented 2 years ago

That is interesting news and I am testing it right now with the Deep Taxonomy Benchmark. So we run Reasoning-perf.js for different depths

http://eulersharp.sourceforge.net/2009/12dtb/test-dl-10.n3
http://eulersharp.sourceforge.net/2009/12dtb/test-dl-100.n3
http://eulersharp.sourceforge.net/2009/12dtb/test-dl-1000.n3
http://eulersharp.sourceforge.net/2009/12dtb/test-dl-10000.n3
...

and check for

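  // Quad and NamedNode here are N3.js term classes; the equivalent
  // DataFactory.quad / DataFactory.namedNode calls would work as well.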
  console.log(store.has(
    new Quad(
      new NamedNode('http://eulersharp.sourceforge.net/2009/12dtb/test#ind'),
      new NamedNode('http://www.w3.org/1999/02/22-rdf-syntax-ns#type'),
      new NamedNode('http://eulersharp.sourceforge.net/2009/12dtb/test#A2'),
    ),
  ))
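For context, a minimal stand-in for Reasoning-perf.js could look like the following (applyRules is a hypothetical placeholder for whatever entry point the PR exposes, the rule set is left as a placeholder, and the test file is assumed to be downloaded locally):

  // Illustrative harness; `applyRules` stands in for the PR's actual reasoning API,
  // and `rules` would be e.g. the RDFS subclass rule in whatever format the PR expects.
  const fs = require('fs');
  const { Parser, Store, DataFactory } = require('n3');
  const { namedNode, quad } = DataFactory;

  const store = new Store();
  console.time('loading deep taxonomy');
  store.addQuads(new Parser({ format: 'text/n3' }).parse(fs.readFileSync('test-dl-10.n3', 'utf8')));
  console.timeEnd('loading deep taxonomy');
  console.log(store.size);                 // quads before reasoning

  const rules = [];                        // placeholder: rules in the PR's format
  console.time('apply reasoning');
  applyRules(store, rules);                // hypothetical entry point
  console.timeEnd('apply reasoning');

  console.log(store.has(quad(
    namedNode('http://eulersharp.sourceforge.net/2009/12dtb/test#ind'),
    namedNode('http://www.w3.org/1999/02/22-rdf-syntax-ns#type'),
    namedNode('http://eulersharp.sourceforge.net/2009/12dtb/test#A2'),
  )));
  console.log(store.size);                 // quads after reasoning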

The results for http://eulersharp.sourceforge.net/2009/12dtb/test-dl-10.n3 are

$ node Reasoning-perf.js
loading deep taxonomy: 6.996ms
32
apply reasoning: 6.859ms
true
247

The results for http://eulersharp.sourceforge.net/2009/12dtb/test-dl-100.n3 are

$ node Reasoning-perf.js
loading deep taxonomy: 13.919ms
302
apply reasoning: 144.044ms
true
15862

The results for http://eulersharp.sourceforge.net/2009/12dtb/test-dl-1000.n3 are

$ node Reasoning-perf.js
loading deep taxonomy: 82.991ms
3002
apply reasoning: 33.792s
true
1508512

The results for http://eulersharp.sourceforge.net/2009/12dtb/test-dl-10000.n3 are

$ node Reasoning-perf.js
loading deep taxonomy: 312.01ms
30002

<--- Last few GCs --->

[9906:0x5e18770]    27948 ms: Scavenge 2045.9 (2083.2) -> 2045.7 (2085.7) MB, 5.9 / 0.0 ms  (average mu = 0.348, current mu = 0.318) allocation failure
[9906:0x5e18770]    28406 ms: Mark-sweep (reduce) 2047.7 (2086.0) -> 2046.1 (2083.7) MB, 202.7 / 0.0 ms  (+ 140.1 ms in 27 steps since start of marking, biggest step 6.2 ms, walltime since start of marking 360 ms) (average mu = 0.316, current mu = 0.280)

<--- JS stacktrace --->

FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
 1: 0xb09980 node::Abort() [node]
 2: 0xa1c235 node::FatalError(char const*, char const*) [node]
 3: 0xcf77be v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, bool) [node]
 4: 0xcf7b37 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [node]
 5: 0xeaf3d5  [node]
 6: 0xeafeb6  [node]
 7: 0xebe3de  [node]
 8: 0xebee20 v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [node]
 9: 0xec1d9e v8::internal::Heap::AllocateRawWithRetryOrFailSlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [node]
10: 0xe83012 v8::internal::Factory::AllocateRaw(int, v8::internal::AllocationType, v8::internal::AllocationAlignment) [node]
11: 0xe7d92c v8::internal::FactoryBase<v8::internal::Factory>::AllocateRawArray(int, v8::internal::AllocationType) [node]
12: 0xe7da05 v8::internal::FactoryBase<v8::internal::Factory>::NewFixedArrayWithFiller(v8::internal::Handle<v8::internal::Map>, int, v8::internal::Handle<v8::internal::Oddball>, v8::internal::AllocationType) [node]
13: 0x10db57d v8::internal::Handle<v8::internal::NumberDictionary> v8::internal::HashTable<v8::internal::NumberDictionary, v8::internal::NumberDictionaryShape>::New<v8::internal::Isolate>(v8::internal::Isolate*, int, v8::internal::AllocationType, v8::internal::MinimumCapacity) [node]
14: 0xfe3112  [node]
15: 0x108ed7d v8::internal::JSObject::NormalizeElements(v8::internal::Handle<v8::internal::JSObject>) [node]
16: 0x1003d1c  [node]
17: 0x108a015 v8::internal::JSObject::AddDataElement(v8::internal::Handle<v8::internal::JSObject>, unsigned int, v8::internal::Handle<v8::internal::Object>, v8::internal::PropertyAttributes) [node]
18: 0x10cdd6e v8::internal::Object::AddDataProperty(v8::internal::LookupIterator*, v8::internal::Handle<v8::internal::Object>, v8::internal::PropertyAttributes, v8::Maybe<v8::internal::ShouldThrow>, v8::internal::StoreOrigin) [node]
19: 0x10d2303 v8::internal::Object::SetProperty(v8::internal::LookupIterator*, v8::internal::Handle<v8::internal::Object>, v8::internal::StoreOrigin, v8::Maybe<v8::internal::ShouldThrow>) [node]
20: 0x1205826 v8::internal::Runtime::SetObjectProperty(v8::internal::Isolate*, v8::internal::Handle<v8::internal::Object>, v8::internal::Handle<v8::internal::Object>, v8::internal::Handle<v8::internal::Object>, v8::internal::StoreOrigin, v8::Maybe<v8::internal::ShouldThrow>) [node]
21: 0x1206da9 v8::internal::Runtime_SetKeyedProperty(int, unsigned long*, v8::internal::Isolate*) [node]
22: 0x15f0a99  [node]
Aborted

jeswr commented 2 years ago

Thanks for this @josd - the out-of-memory error is somewhat surprising, though if I had to hazard a guess at the cause, it is from instantiating a new array for each premise: `const values = val2.value ? (val2.value in index2 ? [val2.value] : []) : Object.keys(index2);`. This can easily be mitigated by iterating over the keys directly instead, at the cost of a slight performance degradation.
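Something along these lines (a sketch, not the actual code in the PR) would avoid the per-premise array allocation:

  // Instead of materialising Object.keys(index2) into a fresh array for every
  // premise, yield the candidate keys lazily (illustrative sketch only).
  function* candidateKeys(val2, index2) {
    if (val2.value) {
      // bound term: at most one candidate key
      if (val2.value in index2) yield val2.value;
    }
    else {
      // unbound variable: walk the index keys without copying them into an array
      for (const key in index2) yield key;
    }
  }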

The slowdown is unsurprising given that I'm yet to implement anything like the semi-naive algorithm.