rdfjs / N3.js

Lightning fast, spec-compatible, streaming RDF for JavaScript
http://rdf.js.org/N3.js/

Reasoning using the semi naive algorithm #295

Open jeswr opened 2 years ago

jeswr commented 2 years ago

Note - the code is nasty at the moment and needs some love and refactoring, but it does work!

@josd @pbonte - alternative to #293 now implementing a better forward chaining algorithm.

Somewhat surprisingly, it now takes longer to apply RDFS rules over the union of timbl's profile card and the foaf ontology (which is not deep at all; you only need one round of reasoning).

It is far more competitive on the Deep Taxonomy Benchmark now @josd - this is the output of running the Reasoning-perf.js file in this branch. I have not benchmarked test-dl-1000000.n3 as I could not download it from sourceforge.

Reasoning over TimBL profile and FOAF
loading foaf ontology: 28.07ms
loading tim berners lee profile card: 17.824ms
apply reasoning: 45.178ms

Running Deep Taxonomy Benchmark

Load test-dl-10.n3: 2.809ms
Reasoning: test-dl-10.n3: 0.778ms
true

Load test-dl-100.n3: 13.661ms
Reasoning: test-dl-100.n3: 2.647ms
true

Load test-dl-1000.n3: 68.42ms
Reasoning: test-dl-1000.n3: 25.906ms
true

Load test-dl-10000.n3: 289.421ms
Reasoning: test-dl-10000.n3: 113.692ms
true

Load test-dl-100000.n3: 1.799s
Reasoning: test-dl-100000.n3: 675.589ms
true
pbonte commented 2 years ago

Nice results @jeswr! Could you summarise which optimisations you applied to improve performance here? I'd love to hear more!

jeswr commented 2 years ago

Could you summarise what optimisations

Sure! The key point is to pre-compute as much information as possible from the rules before going to the data; in particular, determining which set of indexes to use for each premise, based on which terms in the rules are variables and which variables will already have been substituted by the time we look at a particular term in a premise.

To improve over #294, I now pre-compute the rule dependencies, so whenever a new implicit fact is found we can substitute it into the 'dependent rules' to create restricted rules that we then need to check.

I'll clean this up so it is easier to understand, but this may not be until later in the month, depending on how I track with other commitments.
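The overall shape of the approach can be sketched in plain JavaScript. This is a toy illustration with hypothetical fact/rule shapes (string facts, one indexed premise), not the actual N3.js internals:

```javascript
// Minimal semi-naive forward chaining sketch (hypothetical data shapes,
// not the actual N3.js internals). Only facts derived in the previous
// round (the "delta") are re-examined against the rules, instead of
// re-running every rule over the whole dataset each iteration.
function semiNaive(facts, rules) {
  const all = new Set(facts);
  let delta = new Set(facts); // newly derived facts to re-examine
  while (delta.size > 0) {
    const next = new Set();
    for (const fact of delta) {
      for (const rule of rules) {
        for (const derived of rule(fact, all)) {
          if (!all.has(derived)) {
            all.add(derived); // record the new implicit fact
            next.add(derived); // and queue it for the next round
          }
        }
      }
    }
    delta = next; // iterate until no new facts appear
  }
  return all;
}

// Example: an rdfs9-style rule, (x type C) + (C subClassOf D) => (x type D).
// For brevity only the "type" premise is driven by the delta here.
const rules = [
  (fact, all) => {
    const m = fact.match(/^(\w+) type (\w+)$/);
    if (!m) return [];
    const out = [];
    for (const f of all) {
      const s = f.match(new RegExp(`^${m[2]} subClassOf (\\w+)$`));
      if (s) out.push(`${m[1]} type ${s[1]}`);
    }
    return out;
  },
];
const closure = semiNaive(
  ['i type A', 'A subClassOf B', 'B subClassOf C'],
  rules
);
// closure additionally contains 'i type B' and 'i type C'
```

A real implementation would index each premise of each rule separately (as described above), rather than scanning all facts inside the rule body.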

rubensworks commented 2 years ago

This is really exciting @jeswr, awesome work!

So is this implementation N3-specific? Or could it also be generalized to run with the same level of performance over other RDF/JS Sources?

jeswr commented 2 years ago

N3-specific?

This is N3-specific, as it leverages the structure of the internal indexes, and also benefits from not having to convert between internal IDs and RDFJS term representations.

I expect that the reasoning components I am implementing over in Comunica will be at least 2-3x slower (even after merging all of the optimizations that I have been working on here and over at AsyncIterator) as a result of not having access to these internals.

In terms of generalising to other sources @rubensworks, I have an unpublished forward chaining algorithm for Comunica that uses similar ideas to what I have done here.

rubensworks commented 2 years ago

I expect that the reasoning components I am implementing over in Comunica will be at least 2-3x slower (even after merging all of the optimizations that I have been working on here and over at AsyncIterator) as a result of not having access to these internals.

Right, that makes sense.

At some point in time, I would love us to get Comunica to work directly on these internal indexes (https://github.com/comunica/comunica/issues/873), as this would make these kinds of use cases a lot faster.

jeswr commented 2 years ago

Comunica to work directly on these internal indexes

Indeed - that was the motivation for https://github.com/rdfjs/N3.js/issues/289 as well

rubensworks commented 2 years ago

Indeed - that was the motivation for https://github.com/rdfjs/N3.js/issues/289 as well

Oh, interesting, I missed that issue.

josd commented 2 years ago

Thanks @jeswr for keeping the deep taxonomy ball rolling 😉

For the moment, however, I can't reproduce your performance results:

$ node Reasoning-perf.js
Reasoning over TimBL profile and FOAF
loading foaf ontology: 21.521ms
loading tim berners lee profile card: 18.742ms
apply reasoning: 33.002ms

Running Deep Taxonomy Benchmark

Load test-dl-10.n3: 2.849ms
Reasoning: test-dl-10.n3: 0.912ms
true

Load test-dl-100.n3: 10.211ms
Reasoning: test-dl-100.n3: 38.758ms
true

Load test-dl-1000.n3: 59.463ms
Reasoning: test-dl-1000.n3: 3.490s
true

Load test-dl-10000.n3: 219.026ms
Reasoning: test-dl-10000.n3: 3:29.937 (m:ss.mmm)
true

Load test-dl-100000.n3: 2.063s
^C

So I ^C'd the depth of 100000 as I guess it would have taken 3.5 hours.

Just to make sure that we are on the same page, what I have is

$ git branch
  feat/reasoning-fwd-chain
  master
* semi-naive

$ git log
commit 9c686a3f5f0be052d84750aa932873245684c820 (HEAD -> semi-naive, origin/semi-naive)
Author: Jesse Wright <jesse.wright@anu.edu.au>
Date:   Sat May 7 14:16:21 2022 +1000

    chore: clean up output

commit ef6ba01cc3a6ed3b1508a4c471fdb153c8a7dfa7
Author: Jesse Wright <jesse.wright@anu.edu.au>
Date:   Sat May 7 14:08:47 2022 +1000

    chore: add deep taxonomy benchmark

and running on

======== OS ========

Linux 4.19.104-microsoft-standard x86_64 GNU/Linux

======== CPU ========

processor       : 0
model name      : Intel(R) Core(TM) i7-1065G7 CPU @ 1.30GHz
processor       : 1
model name      : Intel(R) Core(TM) i7-1065G7 CPU @ 1.30GHz
processor       : 2
model name      : Intel(R) Core(TM) i7-1065G7 CPU @ 1.30GHz
processor       : 3
model name      : Intel(R) Core(TM) i7-1065G7 CPU @ 1.30GHz
processor       : 4
model name      : Intel(R) Core(TM) i7-1065G7 CPU @ 1.30GHz
processor       : 5
model name      : Intel(R) Core(TM) i7-1065G7 CPU @ 1.30GHz
processor       : 6
model name      : Intel(R) Core(TM) i7-1065G7 CPU @ 1.30GHz
processor       : 7
model name      : Intel(R) Core(TM) i7-1065G7 CPU @ 1.30GHz

======== RAM ========

MemTotal:       12911996 kB
jeswr commented 2 years ago

I can't reproduce your performance results

Weird @josd ... I have cloned this to a new location and managed to reproduce the results I initially posted. You seem to be working from the right commit.

A couple of potential causes for what you are seeing:
a) If you are working from an existing clone, try running npm install && npm run build again, as it may be running off an old build of the file.
b) Check your version of Node (I'm using v18.1).
c) You may have changed/added more rules to the tests (perhaps I am misunderstanding which rules are expected to be used for this benchmark).

For completeness, my OS specs are

jeswr@debian:~/testing-perf/N3.js$ uname -a
Linux debian 5.10.0-13-amd64 #1 SMP Debian 5.10.106-1 (2022-03-17) x86_64 GNU/Linux
jeswr commented 2 years ago

Also, if you could link to the 1_000_000 deep taxonomy once you have it working, that would be great, @josd.

josd commented 2 years ago

I started fresh and made a pull request: https://github.com/jeswr/N3.js/pull/10. It now works at light speed, but the depth of 1000000 is not yet running completely.

$ node Reasoning-perf.js
Reasoning over TimBL profile and FOAF
loading foaf ontology: 16.441ms
loading tim berners lee profile card: 16.041ms
apply reasoning: 43.532ms

Running Deep Taxonomy Benchmark

Load test-dl-10.n3: 4.164ms
Reasoning: test-dl-10.n3: 1.595ms
true

Load test-dl-100.n3: 12.537ms
Reasoning: test-dl-100.n3: 1.268ms
true

Load test-dl-1000.n3: 64.785ms
Reasoning: test-dl-1000.n3: 37.498ms
true

Load test-dl-10000.n3: 234.705ms
Reasoning: test-dl-10000.n3: 89.63ms
true

Load test-dl-100000.n3: 2.106s
Reasoning: test-dl-100000.n3: 923.296ms
true

<--- Last few GCs --->

[5276:0x6555770]    28000 ms: Scavenge (reduce) 2046.0 (2079.0) -> 2045.5 (2079.2) MB, 7.7 / 0.0 ms  (average mu = 0.315, current mu = 0.302) allocation failure
[5276:0x6555770]    28067 ms: Scavenge (reduce) 2046.3 (2079.2) -> 2045.8 (2079.7) MB, 11.9 / 0.0 ms  (average mu = 0.315, current mu = 0.302) allocation failure
[5276:0x6555770]    28117 ms: Scavenge (reduce) 2046.5 (2079.7) -> 2046.1 (2079.7) MB, 6.0 / 0.0 ms  (average mu = 0.315, current mu = 0.302) allocation failure

<--- JS stacktrace --->

FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
 1: 0xb09980 node::Abort() [node]
 2: 0xa1c235 node::FatalError(char const*, char const*) [node]
 3: 0xcf77be v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, bool) [node]
 4: 0xcf7b37 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [node]
 5: 0xeaf3d5  [node]
 6: 0xeafeb6  [node]
 7: 0xebe3de  [node]
 8: 0xebee20 v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [node]
 9: 0xec1d15 v8::internal::Heap::HandleGCRequest() [node]
10: 0xe4f3b7 v8::internal::StackGuard::HandleInterrupts() [node]
11: 0x11fb775 v8::internal::Runtime_StackGuard(int, unsigned long*, v8::internal::Isolate*) [node]
12: 0x15f0a99  [node]
Aborted
jeswr commented 2 years ago

I expect that the reasoning components I am implementing over in Comunica will be at least 2-3x slower (even after merging all of the optimizations that I have been working on here and over at AsyncIterator) as a result of not having access to these internals.

Right, that makes sense.

At some point in time, I would love us to get Comunica to work directly on these internal indexes (https://github.com/comunica/comunica/issues/873), as this would make these kinds of use cases a lot faster.

I've just run some benchmarks (running https://github.com/comunica/comunica-feature-reasoning/tree/chore/benchmark-forward-chaining/packages/actor-rdf-reason-forward-chaining/perf) in Comunica with some of the Asynciterator and N3.js optimisations included and it is 2-3 orders of magnitude slower than the results we are seeing here.

rubensworks commented 2 years ago

I've just run some benchmarks (running https://github.com/comunica/comunica-feature-reasoning/tree/chore/benchmark-forward-chaining/packages/actor-rdf-reason-forward-chaining/perf) in Comunica with some of the Asynciterator and N3.js optimisations included and it is 2-3 orders of magnitude slower than the results we are seeing here.

Do you know what the cause is? Inefficient code? Or just a different algorithm?

jeswr commented 2 years ago

I haven't managed to properly diagnose the key cause. I think it is mainly that a lot of iterators are being created/destroyed, terms have to be converted to and from internal representations, and index lookups cannot be pre-computed.

Also, the tests were run with 2 different stores (one for explicit data, one for implicit data), which means that a lot of the ID <-> term conversions and index lookups need to be done multiple times.

Furthermore, the 'match' method in N3 uses the iterator protocol with yield, which also slows down performance compared to direct access to the index.
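The generator overhead is easy to demonstrate with a toy micro-benchmark (a sketch for illustration only, not the actual N3.js match code):

```javascript
// Toy comparison: consuming values through a generator vs. looping over
// the underlying array directly. Each `yield` crosses the iterator
// protocol boundary, which is typically several times slower than a
// plain indexed loop.
function* matchGenerator(items) {
  for (const item of items) yield item;
}

const data = Array.from({ length: 1e6 }, (_, i) => i);

let t0 = process.hrtime.bigint();
let sumGen = 0;
for (const x of matchGenerator(data)) sumGen += x;
const genNs = process.hrtime.bigint() - t0;

t0 = process.hrtime.bigint();
let sumDirect = 0;
for (let i = 0; i < data.length; i++) sumDirect += data[i];
const directNs = process.hrtime.bigint() - t0;

// Both loops compute the same sum; only the iteration mechanism differs.
console.log({ sumGen, sumDirect, genNs, directNs });
```

Exact ratios vary by Node version and V8 optimisation, but the direct loop is consistently faster.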

TallTed commented 2 years ago

@jeswr -- Would you please change this PR's title to something more reflective & descriptive of its content? "Semi naive" doesn't even give much of a hint.

jeswr commented 2 years ago

@josd Turns out it does work on the 10^6 dataset if we use a production environment and increase the memory allocation for node to 8GB

jeswr@debian:~/jst/N3.js$ NODE_ENV=production node --max-old-space-size=8192 perf/Reasoning-perf.js 
Reasoning over TimBL profile and FOAF
loading foaf ontology: 24.45ms
loading tim berners lee profile card: 16.58ms
apply reasoning: 45.756ms

Running Deep Taxonomy Benchmark

Load test-dl-10.n3: 9.216ms
Reasoning: test-dl-10.n3: 0.65ms
true

Load test-dl-100.n3: 6.862ms
Reasoning: test-dl-100.n3: 1.867ms
true

Load test-dl-1000.n3: 61.933ms
Reasoning: test-dl-1000.n3: 29.815ms
true

Load test-dl-10000.n3: 233.515ms
Reasoning: test-dl-10000.n3: 102.49ms
true

Load test-dl-100000.n3: 1.776s
Reasoning: test-dl-100000.n3: 759.917ms
true

Load test-dl-1000000.n3: 22.141s
Reasoning: test-dl-1000000.n3: 8.898s
true

@rubensworks - In an attempt to work out why the Comunica implementation was so much slower I mocked up the resultant code if we removed all iterators, and asynchronicity, to end up with a reasoning algorithm that is RDFJS source agnostic. The results are about 2-3x slower than the internal implementation on N3.js.

I suspect the primary bottleneck in Comunica reasoning components is the sheer number of iterators that I was creating in the original implementation. I'll work to see if I can resolve this.

NODE_ENV=production node --max-old-space-size=28672 packages/actor-rdf-reason-forward-chaining/perf/ActorRdfReasonForwardChaining-perf.js 
Load ./timbl.ttl: 21.178ms
Load ./foaf.ttl: 19.023ms
Reasoning TimBL + FOAF with full-rdfs: 155.707ms
1712

Load deep-taxonomy/test-dl-10.n3: 2.051ms
Reasoning Deep Taxonomy with type-inference: 0.752ms
true

Load deep-taxonomy/test-dl-100.n3: 5.042ms
Reasoning Deep Taxonomy with type-inference: 6.272ms
true

Load deep-taxonomy/test-dl-1000.n3: 54.89ms
Reasoning Deep Taxonomy with type-inference: 59.631ms
true

Load deep-taxonomy/test-dl-10000.n3: 207.026ms
Reasoning Deep Taxonomy with type-inference: 237.568ms
true

Load deep-taxonomy/test-dl-100000.n3: 1.877s
Reasoning Deep Taxonomy with type-inference: 2.114s
true

Load deep-taxonomy/test-dl-1000000.n3: 21.241s
Reasoning Deep Taxonomy with type-inference: 27.694s
true
rubensworks commented 2 years ago

The results are about 2-3x slower than the internal implementation on N3.js.

That gets us a lot closer already! (and may be in the acceptable range even)

@josd Turns out it does work on the 10^6 dataset if we use a production environment and increase the memory allocation for node to 8GB

🤯

I suspect the primary bottleneck in Comunica reasoning components is the sheer number of iterators that I was creating in the original implementation. I'll work to see if I can resolve this.

Does this already take into account the asynciterator changes that you and @jacoscaz have been working on?

RubenVerborgh commented 2 years ago

That is absolutely brilliant. This has been my dream for N3.js since I started it in 2011.

jeswr commented 2 years ago

Does this already take into account the asynciterator changes

Yes. That said, I was doing it in a very 'hacky' manner of copy/pasting the unreleased code locally, since I was in a bit of a rush, so I may have missed a couple of optimizations.

jacoscaz commented 2 years ago

@jeswr amazing work. I understand that the source-agnostic version would benefit from some form of shared dictionary encoding as per recent conversations?

Also, would you be able to profile one of those test runs, both for the Comunica version and the RDF/JS version? I think we might be able to learn a lot from that.

jeswr commented 2 years ago

would you be able to profile one of those test runs

Is something like this what you were thinking, @jacoscaz? (Also be aware that the time scale is logarithmic.)

[image: all_taxonomies]

jacoscaz commented 2 years ago

@jeswr I can't seem to zoom in on that image as it's quite small in resolution, but it looks like a comparison of different reasoning solutions. What I'm thinking of is profiling CPU and memory usage across a run, using either a sampling profiler (https://nodejs.org/en/docs/guides/simple-profiling/) or something more advanced, depending on how comfortable you are with profilers and whether you have used one before. Some IDEs (I tend to use WebStorm) can automatically hook up to V8's profiler if you configure them to do so in their launch configurations.

jeswr commented 2 years ago

@RubenVerborgh - I think this is getting close to being ready for review. Whilst I think it can be cleaned up further in the future, I don't know if it is worth the effort until other future optimizations that I have planned are in place.

Before I rebase and mark it as ready for review, there is one remaining hurdle: getting full test coverage on this line. The reason I included it is that I was following a similar pattern to here. Could you please explain why this condition is required in the latter case, since it appears that graphs are removed when they are empty anyway?

I quickly tried running git blame but the most recent change to that line was a variable renaming.

RubenVerborgh commented 2 years ago

Amazing stuff. That's what we need, and what I've wanted to have since the day I started this.

May I suggest to implement this in a separate class? You could either extend Store (ReasonedStore or so), or use composition (Reasoner with an embedded store). I'm suggesting because I think this code will grow quite a bit over time, and it's good for it to have its own file.

Could you please explain why this condition is required

It was probably just a way to get around an overzealous linter not liking for … in unguarded by if 😅
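For context, the pattern in question looks like this (a generic illustration, not the actual N3.js store code): `for … in` also enumerates inherited enumerable properties, so linters (e.g. ESLint's guard-for-in rule) want the body guarded by a `hasOwnProperty` check.

```javascript
// A for…in loop guarded so inherited properties are skipped
// (and rules like ESLint's guard-for-in stay happy).
const graphs = Object.create({ inherited: true }); // inherited enumerable prop
graphs.g1 = { size: 1 };
graphs.g2 = { size: 2 };

const own = [];
for (const name in graphs) {
  if (Object.prototype.hasOwnProperty.call(graphs, name)) {
    own.push(name); // only g1 and g2, never 'inherited'
  }
}
console.log(own);
```

When the object is created with a null prototype (as internal index objects often are), the guard is redundant, which is why such a branch can be hard to cover in tests.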

josd commented 2 years ago

I'm suggesting because I think this code will grow quite a bit over time, and it's good for it to have its own file.

I would very much think so too, e.g. adding to the store (as now, which is like forward chaining), goal-oriented querying (backward chaining), or a "reasoned" combination of the two.

jeswr commented 2 years ago

May I suggest to implement this in a separate class?

Perhaps this is overcomplicating it, but if I go down that track, is it worth trying to implement it as an RDFJSReasoner class, which can reason over any RDF.Store and perform the indexing optimization here when the input store is an instance of N3Store?
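One way to express that dispatch (class and method names hypothetical, and with a duck-type check standing in for `instanceof N3.Store`) would be:

```javascript
// Sketch of the proposed design (names hypothetical): a reasoner that
// accepts any RDF/JS-style store, but switches to a faster code path
// when the store exposes N3-style internal indexes.
class RDFJSReasoner {
  constructor(store) {
    this.store = store;
    // Duck-type check standing in for `store instanceof N3.Store`
    this.hasInternalIndexes = typeof store._graphs === 'object';
  }
  reason(rules) {
    return this.hasInternalIndexes
      ? this.reasonOverIndexes(rules)
      : this.reasonOverMatch(rules);
  }
  // Placeholder bodies; a real implementation would run the chaining
  // algorithm against the internal indexes or the generic match() API.
  reasonOverIndexes(rules) { return `fast path (${rules.length} rules)`; }
  reasonOverMatch(rules) { return `generic path (${rules.length} rules)`; }
}

const generic = new RDFJSReasoner({ match: () => [] }); // any RDF/JS store
const n3like = new RDFJSReasoner({ _graphs: {} });      // N3-style store
console.log(generic.reason([]), n3like.reason([]));
```

The trade-off is that the generic path pays for term conversions and iterator overhead, as discussed above.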

jeswr commented 2 years ago

I've pushed the implementation-agnostic RDFJS reasoner to https://github.com/jesse-wright-reasoning-honours/rdfjs-reasoner/blob/master/reasoner.ts. It is tied to neither N3 nor Comunica.

Happy to clean up and move this repo to the rdfjs/ organisation - in addition to publishing on NPM if you think it is appropriate.

It might be worth getting some input on its API first, though. In particular, should it be a class with methods like .load to load rules, .set to set the store to apply reasoning to, etc.? And should we be working towards an interface specification for @rdfjs/types at the same time?

@rubensworks @RubenVerborgh

RubenVerborgh commented 2 years ago

I've pushed the RDFJS implementation agnostic to https://github.com/jesse-wright-reasoning-honours/rdfjs-reasoner/blob/master/reasoner.ts. This is neither tied to n3 nor Comunica.

Sweet, how's perf? Should we still do #296? I'm happy to have it within N3.js, or not, if you think it's best kept independent. But it's probably best not to have to maintain 2.

jeswr commented 2 years ago

how's perf

@RubenVerborgh The RDFJS version ranges between 2-4x slower:

Reasoning over TimBL profile and FOAF
loading foaf ontology: 13.229ms
loading tim berners lee profile card: 7.711ms
Reasoning: 161.886ms

Running Deep Taxonomy Benchmark
Reasoning: test-dl-10.n3: 0.404ms
Reasoning: test-dl-100.n3: 3.328ms
Reasoning: test-dl-1000.n3: 57.767ms
Reasoning: test-dl-10000.n3: 298.111ms
Reasoning: test-dl-100000.n3: 2.628s
Reasoning: test-dl-1000000.n3: Out of Memory

Unfortunately, there isn't much logic that can be shared either, due to the 'pre-compilation' of index lookups that I'm doing in #296.

jacoscaz commented 2 years ago

I think having a RDF/JS reasoner that doesn't depend on N3.js would be worth the 2x - 4x loss, given the absolute numbers I'm seeing. Don't get me wrong, N3.js is awesome, but enriching the RDF/JS ecosystem with such a wonderful component would be amazing.

RubenVerborgh commented 2 years ago

Same—I'd strategically suggest to write it on N3.js now. The burden of proof is on us to show that it can all work fast in the browser. Once we have that, we can generalize. Will follow up on the reasoning PR.

rubensworks commented 2 years ago

I would also be in favor of focusing on the N3-specific reasoner for now.

I guess once we have an idea of how to tackle https://github.com/rdfjs/N3.js/issues/289 (and standardize this at RDF/JS-level), we can port this to an RDF/JS-level reasoner, without any loss in performance.

jeswr commented 2 years ago

I would also be in favor of focusing on the N3-specific reasoner for now.

Agreed. In further support of this, I am now getting the logic for reasoning in Comunica closer to that of the generic RDFJS algorithm. The primary bottleneck came down to the unnecessary creation of many iterators in the process, which can be resolved by re-organising some code and creating some custom iterators (for reference, these are the variants of iterator usage I have been testing, though I doubt they are of much use to anyone else at the present moment). So an RDFJS implementation separate from Comunica may become somewhat redundant.

josd commented 2 years ago

@jeswr I really love https://github.com/jeswr/N3.js/blob/feat/N3Reasoner/src/N3Reasoner.js and maybe even more your N3Script https://github.com/jeswr/N3.js/blob/feat/N3Reasoner/test/util.js

The performance is one of the best I have ever seen:

$ node --max-old-space-size=8192 N3Reasoner-perf.js
Reasoning over TimBL profile and FOAF
loading foaf ontology: 20.782ms
loading tim berners lee profile card: 16.937ms
Reasoning: 50.257ms

Running Deep Taxonomy Benchmark
Reasoning: test-dl-10.n3: 0.265ms
Reasoning: test-dl-100.n3: 1.089ms
Reasoning: test-dl-1000.n3: 17.434ms
Reasoning: test-dl-10000.n3: 109.719ms
Reasoning: test-dl-100000.n3: 1.040s
Reasoning: test-dl-1000000.n3: 12.971s

Thank you so much and Keep Up The Good Work!

josd commented 2 years ago

Also a very nice performance on a smartphone https://twitter.com/josderoo/status/1525209755471462402, albeit there was not enough memory to run the deep taxonomy of depth 1000000 (subsumption with 3 million classes).

josd commented 2 years ago

Also have a look at the extended deep taxonomy test, which takes the depth as the number of individuals in the classes. It is closer to reality, e.g. an ontology with 300000 classes (like SNOMED-CT) and 100000 individuals belonging to those classes (like patients having observations, diseases, treatments, ...). The modified code is:

$ git diff
diff --git a/test/util.js b/test/util.js
index f9759a9..7e29899 100644
--- a/test/util.js
+++ b/test/util.js
@@ -233,13 +233,15 @@ function load(filename, store) {
 function generateDeepTaxonomy(size) {
   const store = new N3.Store();

-  store.addQuads([
-    new Quad(
-      new NamedNode('http://eulersharp.sourceforge.net/2009/12dtb/test#ind'),
-      new NamedNode('http://www.w3.org/1999/02/22-rdf-syntax-ns#type'),
-      new NamedNode('http://eulersharp.sourceforge.net/2009/12dtb/test#N0'),
-    ),
-  ]);
+  for (let i = 0; i < size; i++) {
+    store.addQuads([
+      new Quad(
+        new NamedNode(`http://eulersharp.sourceforge.net/2009/12dtb/test#ind${i}`),
+        new NamedNode('http://www.w3.org/1999/02/22-rdf-syntax-ns#type'),
+        new NamedNode('http://eulersharp.sourceforge.net/2009/12dtb/test#N0'),
+      ),
+    ]);
+  }

   store.addQuads([
     new Quad(

The current results are:

$ node --max-old-space-size=8192 perf/N3Reasoner-perf.js
Reasoning over TimBL profile and FOAF
loading foaf ontology: 18.728ms
loading tim berners lee profile card: 15.987ms
Reasoning: 57.744ms

Running Deep Taxonomy Benchmark
Reasoning: test-dl-10.n3: 1.16ms
Reasoning: test-dl-100.n3: 133.68ms
Reasoning: test-dl-1000.n3: 13.335s

<--- Last few GCs --->

[3711:0x696f850]   358474 ms: Mark-sweep 7625.1 (8239.9) -> 7614.2 (8230.2) MB, 6144.4 / 0.0 ms  (average mu = 0.140, current mu = 0.048) allocation failure GC in old space requested
[3711:0x696f850]   364806 ms: Mark-sweep 7630.4 (8239.9) -> 7622.2 (8230.4) MB, 6018.9 / 0.0 ms  (average mu = 0.096, current mu = 0.049) allocation failure GC in old space requested

<--- JS stacktrace --->

FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
 1: 0xb09980 node::Abort() [node]
 2: 0xa1c235 node::FatalError(char const*, char const*) [node]
 3: 0xcf77be v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, bool) [node]
 4: 0xcf7b37 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [node]
 5: 0xeaf3d5  [node]
 6: 0xeafeb6  [node]
 7: 0xebe3de  [node]
 8: 0xebee20 v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [node]
 9: 0xec1d9e v8::internal::Heap::AllocateRawWithRetryOrFailSlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [node]
10: 0xe83012 v8::internal::Factory::AllocateRaw(int, v8::internal::AllocationType, v8::internal::AllocationAlignment) [node]
11: 0xe7d92c v8::internal::FactoryBase<v8::internal::Factory>::AllocateRawArray(int, v8::internal::AllocationType) [node]
12: 0xe7da05 v8::internal::FactoryBase<v8::internal::Factory>::NewFixedArrayWithFiller(v8::internal::Handle<v8::internal::Map>, int, v8::internal::Handle<v8::internal::Oddball>, v8::internal::AllocationType) [node]
13: 0x10db57d v8::internal::Handle<v8::internal::NumberDictionary> v8::internal::HashTable<v8::internal::NumberDictionary, v8::internal::NumberDictionaryShape>::New<v8::internal::Isolate>(v8::internal::Isolate*, int, v8::internal::AllocationType, v8::internal::MinimumCapacity) [node]
14: 0x10db7b3 v8::internal::Handle<v8::internal::NumberDictionary> v8::internal::HashTable<v8::internal::NumberDictionary, v8::internal::NumberDictionaryShape>::EnsureCapacity<v8::internal::Isolate>(v8::internal::Isolate*, v8::internal::Handle<v8::internal::NumberDictionary>, int, v8::internal::AllocationType) [node]
15: 0x10dbdf4 v8::internal::Handle<v8::internal::NumberDictionary> v8::internal::Dictionary<v8::internal::NumberDictionary, v8::internal::NumberDictionaryShape>::Add<v8::internal::Isolate>(v8::internal::Isolate*, v8::internal::Handle<v8::internal::NumberDictionary>, unsigned int, v8::internal::Handle<v8::internal::Object>, v8::internal::PropertyDetails, v8::internal::InternalIndex*) [node]
16: 0x1003d48  [node]
17: 0x108a015 v8::internal::JSObject::AddDataElement(v8::internal::Handle<v8::internal::JSObject>, unsigned int, v8::internal::Handle<v8::internal::Object>, v8::internal::PropertyAttributes) [node]
18: 0x10cdd6e v8::internal::Object::AddDataProperty(v8::internal::LookupIterator*, v8::internal::Handle<v8::internal::Object>, v8::internal::PropertyAttributes, v8::Maybe<v8::internal::ShouldThrow>, v8::internal::StoreOrigin) [node]
19: 0x10d2303 v8::internal::Object::SetProperty(v8::internal::LookupIterator*, v8::internal::Handle<v8::internal::Object>, v8::internal::StoreOrigin, v8::Maybe<v8::internal::ShouldThrow>) [node]
20: 0x1205826 v8::internal::Runtime::SetObjectProperty(v8::internal::Isolate*, v8::internal::Handle<v8::internal::Object>, v8::internal::Handle<v8::internal::Object>, v8::internal::Handle<v8::internal::Object>, v8::internal::StoreOrigin, v8::Maybe<v8::internal::ShouldThrow>) [node]
21: 0x1206da9 v8::internal::Runtime_SetKeyedProperty(int, unsigned long*, v8::internal::Isolate*) [node]
22: 0x15f0a99  [node]
Aborted

So from depth 10000 on it doesn't work anymore.

For EYE and depth 100000 we currently get:

$ swipl -x edtpe.pvm -- --nope http://josd.github.io/eye/reasoning/edt/test-facts.n3 --query http://josd.github.io/
eye/reasoning/edt/test-query.n3
eye --wcache http://josd.github.io/eye/reasoning .. --no-distinct-input http://josd.github.io/eye/reasoning/edt/test-edt.n3 --nope http://josd.github.io/eye/reasoning/edt/test-facts.n3 --query http://josd.github.io/eye/reasoning/edt/test-query.n3
EYE v22.0424.1245 josd
SWI-Prolog version 8.5.11-23-g5f6851705
starting 378 [msec cputime] 376 [msec walltime]
#Processed by EYE v22.0424.1245 josd
#eye --wcache http://josd.github.io/eye/reasoning .. --no-distinct-input http://josd.github.io/eye/reasoning/edt/test-edt.n3 --nope http://josd.github.io/eye/reasoning/edt/test-facts.n3 --query http://josd.github.io/eye/reasoning/edt/test-query.n3

GET http://josd.github.io/eye/reasoning/edt/test-facts.n3 FROM ../edt/test-facts.n3 SC=100000
GET http://josd.github.io/eye/reasoning/edt/test-query.n3 FROM ../edt/test-query.n3 SC=1
networking 2245 [msec cputime] 2840 [msec walltime]
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
@prefix : <http://eulersharp.sourceforge.net/2009/12dtb/test#>.

:i100000 a :N100000.

reasoning 482 [msec cputime] 481 [msec walltime]
#2022-05-14T21:57:24.111Z in=400001 out=1 ent=0 step=0 brake=1 inf=33861137 sec=3.105 inf/sec=10905358
#ENDS

2022-05-14T21:57:24.111Z in=400001 out=1 ent=0 step=0 brake=1 inf=33861137 sec=3.105 inf/sec=10905358
jeswr commented 2 years ago

For EYE

Is it correct to assume that EYE doesn't suffer from the scaling problem that the N3Reasoner does because it is performing backwards chaining in this case? I definitely think backwards chaining should be added here at some point, so that backwards-chaining reasoning can be done on calls like has and match.
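The scaling difference is easy to see in a sketch: a backward chainer proves a single goal on demand instead of materialising the whole closure, so deep subclass chains stay cheap when only one membership is queried. This is a toy illustration (hypothetical fact shapes; not EYE or the N3Reasoner):

```javascript
// Minimal backward-chaining sketch over an rdfs9-style rule:
// (x type D) holds if (x type D) is a fact, or if (x type C) holds
// for some C with C subClassOf D. No closure is materialised.
function prove(goal, facts, subClassOf, seen = new Set()) {
  if (facts.has(goal)) return true;
  if (seen.has(goal)) return false; // avoid revisiting goals / cycles
  seen.add(goal);
  const m = goal.match(/^(\w+) type (\w+)$/);
  if (!m) return false;
  for (const [sub, sup] of subClassOf) {
    if (sup === m[2] &&
        prove(`${m[1]} type ${sub}`, facts, subClassOf, seen)) {
      return true;
    }
  }
  return false;
}

const facts = new Set(['ind type N0']);
// Chain N0 subClassOf N1 subClassOf ... subClassOf N1000,
// mirroring the shape of the deep taxonomy benchmark
const subClassOf =
  Array.from({ length: 1000 }, (_, i) => [`N${i}`, `N${i + 1}`]);
console.log(prove('ind type N1000', facts, subClassOf));
```

Only the goals on the path to the query are ever visited, whereas forward chaining must derive every intermediate membership fact.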

So from depth 10000 on it doesn't work anymore.

This is likely because the array for buffering elements is getting too long by this point. One strategy I am thinking of implementing to overcome this is to instead have a single s, p, o index in each .next part of a rule, containing all of the data that is yet to be put through that rule. I'd imagine we can also re-think the logic here to reduce/remove buffering.

josd commented 2 years ago

For EYE

Is it correct to assume that EYE doesn't suffer from the scaling problem that the N3Reasoner does because it is performing backwards chaining in this case? I definitely think backwards chaining should be added in here at some point meaning that backwards chaining reasoning can be done on calls like has and match

Very much indeed! The broader picture is that we need metachaining, like https://github.com/josd/eye/tree/master/reasoning/socrates-metachain-f2b or https://github.com/josd/eye/tree/master/reasoning/socrates-metachain-b2f, as:

There is no principle to tell whether to
use top-down or bottom-up reasoning.

(last quote from https://josd.github.io/quote.html and thanks to Werner Heisenberg)

jimsmart commented 2 years ago

Great work!