Open jeswr opened 2 years ago
Nice results @jeswr! Could you summarise what optimisations you conducted to increase performance here? Love to hear more!
Could you summarise what optimisations
Sure! The key point is to pre-compute as much information as possible from the rules before going to the data. In particular, we determine which set of indexes to use for each premise, based on which terms in the rules are variables and which variables will already have been substituted by the time we look at a particular term in a premise.
To improve over #294, I now pre-compute the rule dependencies, so whenever a new implicit fact is found we can substitute it into 'dependent rules' to create restricted rules that we then need to check.
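To make the idea concrete, here is a minimal, illustrative sketch in plain JavaScript. The rule representation and all names here are made up for this example and are not the actual N3.js code: when a new fact is derived, it is only unified against the premises of rules that can match it (the 'dependent rules'), yielding restricted rules, instead of re-running every rule over the whole dataset.

```javascript
// A term is a string; variables start with '?'.
const isVar = t => t.startsWith('?');

// Try to unify a fact [s, p, o] with a premise pattern, extending `bindings`.
function unify(premise, fact, bindings = {}) {
  const out = { ...bindings };
  for (let i = 0; i < 3; i++) {
    const t = premise[i];
    if (isVar(t)) {
      if (out[t] === undefined) out[t] = fact[i];
      else if (out[t] !== fact[i]) return null;
    } else if (t !== fact[i]) {
      return null; // constant mismatch: this rule does not depend on the fact
    }
  }
  return out;
}

const substitute = (pattern, bindings) =>
  pattern.map(t => (isVar(t) && bindings[t] !== undefined ? bindings[t] : t));

// Semi-naive fixpoint loop: each round only substitutes the facts that are
// new to that round into the rules, never re-deriving from old facts alone.
// (Only the single-premise case is completed here, for brevity.)
function reason(facts, rules) {
  const seen = new Set(facts.map(f => f.join(' ')));
  let delta = [...facts];
  while (delta.length > 0) {
    const next = [];
    for (const fact of delta) {
      for (const rule of rules) {
        for (let i = 0; i < rule.premises.length; i++) {
          const b = unify(rule.premises[i], fact);
          if (b === null) continue;
          // Restricted rule: the remaining premises with bindings applied.
          const rest = rule.premises
            .filter((_, j) => j !== i)
            .map(p => substitute(p, b));
          if (rest.length === 0) {
            const derived = substitute(rule.conclusion, b);
            const key = derived.join(' ');
            if (!seen.has(key)) { seen.add(key); next.push(derived); }
          }
        }
      }
    }
    delta = next;
  }
  return [...seen].map(k => k.split(' '));
}
```

For example, two chained single-premise rules (`?x a N0 → ?x a N1`, `?x a N1 → ?x a N2`) applied to the fact `ind a N0` derive `ind a N1` and then `ind a N2`, each round only touching the newly derived facts.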
I'll clean this up so this is easier to understand - but this may not be until later on in the month depending on how I track with other commitments.
This is really exciting @jeswr, awesome work!
So is this implementation N3-specific? Or could it also be generalized to run with the same level of performance over other RDF/JS Sources?
N3-specific?
This is N3 specific as it leverages facts about the internal indexes, and also benefits from not having to convert between the internal IDs, and RDFJS term representations.
I expect that the reasoning components I am implementing over in Comunica will be at least 2-3x slower (even after merging all of the optimizations that I have been working on here and over at AsyncIterator) as a result of not having access to these internals.
In terms of generalising to other sources @rubensworks, I have an unpublished forward chaining algorithm for Comunica that uses similar ideas to what I have done here.
I expect that the reasoning components I am implementing over in Comunica will be at least 2-3x slower (even after merging all of the optimizations that I have been working on here and over at AsyncIterator) as a result of not having access to these internals.
Right, that makes sense.
At some point in time, I would love us to get Comunica to work directly on these internal indexes (https://github.com/comunica/comunica/issues/873), as this would make these kinds of use cases a lot faster.
Comunica to work directly on these internal indexes
Indeed - that was the motivation for https://github.com/rdfjs/N3.js/issues/289 as well
Oh, interesting, I missed that issue.
Thanks @jeswr for keeping the deep taxonomy ball rolling 😉
For the moment, however, I can't reproduce your performance results:
$ node Reasoning-perf.js
Reasoning over TimBL profile and FOAF
loading foaf ontology: 21.521ms
loading tim berners lee profile card: 18.742ms
apply reasoning: 33.002ms
Running Deep Taxonomy Benchmark
Load test-dl-10.n3: 2.849ms
Reasoning: test-dl-10.n3: 0.912ms
true
Load test-dl-100.n3: 10.211ms
Reasoning: test-dl-100.n3: 38.758ms
true
Load test-dl-1000.n3: 59.463ms
Reasoning: test-dl-1000.n3: 3.490s
true
Load test-dl-10000.n3: 219.026ms
Reasoning: test-dl-10000.n3: 3:29.937 (m:ss.mmm)
true
Load test-dl-100000.n3: 2.063s
^C
So I ^C'd the depth-100000 run, as I guess it would have taken 3.5 hours.
Just to make sure that we are on the same page, what I have is
$ git branch
feat/reasoning-fwd-chain
master
* semi-naive
$ git log
commit 9c686a3f5f0be052d84750aa932873245684c820 (HEAD -> semi-naive, origin/semi-naive)
Author: Jesse Wright <jesse.wright@anu.edu.au>
Date: Sat May 7 14:16:21 2022 +1000
chore: clean up output
commit ef6ba01cc3a6ed3b1508a4c471fdb153c8a7dfa7
Author: Jesse Wright <jesse.wright@anu.edu.au>
Date: Sat May 7 14:08:47 2022 +1000
chore: add deep taxonomy benchmark
and running on
======== OS ========
Linux 4.19.104-microsoft-standard x86_64 GNU/Linux
======== CPU ========
processor : 0
model name : Intel(R) Core(TM) i7-1065G7 CPU @ 1.30GHz
processor : 1
model name : Intel(R) Core(TM) i7-1065G7 CPU @ 1.30GHz
processor : 2
model name : Intel(R) Core(TM) i7-1065G7 CPU @ 1.30GHz
processor : 3
model name : Intel(R) Core(TM) i7-1065G7 CPU @ 1.30GHz
processor : 4
model name : Intel(R) Core(TM) i7-1065G7 CPU @ 1.30GHz
processor : 5
model name : Intel(R) Core(TM) i7-1065G7 CPU @ 1.30GHz
processor : 6
model name : Intel(R) Core(TM) i7-1065G7 CPU @ 1.30GHz
processor : 7
model name : Intel(R) Core(TM) i7-1065G7 CPU @ 1.30GHz
======== RAM ========
MemTotal: 12911996 kB
I can't reproduce your performance results
Weird @josd ... I have cloned this to a new location and managed to reproduce the results I initially posted. You seem to be working from the right commit.
A couple of potential things could be causing what you are seeing:
a) If you are working from the existing clone, try running npm install && npm run build again, as it may be running off an old build of the file
b) Check your version of node (I'm using v18.1)
c) You have changed/added more rules to the tests (perhaps I am misunderstanding which rules are expected to be used for this benchmark)
For completeness, my OS specs are
jeswr@debian:~/testing-perf/N3.js$ uname -a
Linux debian 5.10.0-13-amd64 #1 SMP Debian 5.10.106-1 (2022-03-17) x86_64 GNU/Linux
Also, if you could link to the 1_000_000 deep taxonomy once you have this working, that would be great @josd.
I started fresh and made a pull request https://github.com/jeswr/N3.js/pull/10. It now works at lightspeed, but the depth of 1000000 is not yet running to completion:
$ node Reasoning-perf.js
Reasoning over TimBL profile and FOAF
loading foaf ontology: 16.441ms
loading tim berners lee profile card: 16.041ms
apply reasoning: 43.532ms
Running Deep Taxonomy Benchmark
Load test-dl-10.n3: 4.164ms
Reasoning: test-dl-10.n3: 1.595ms
true
Load test-dl-100.n3: 12.537ms
Reasoning: test-dl-100.n3: 1.268ms
true
Load test-dl-1000.n3: 64.785ms
Reasoning: test-dl-1000.n3: 37.498ms
true
Load test-dl-10000.n3: 234.705ms
Reasoning: test-dl-10000.n3: 89.63ms
true
Load test-dl-100000.n3: 2.106s
Reasoning: test-dl-100000.n3: 923.296ms
true
<--- Last few GCs --->
[5276:0x6555770] 28000 ms: Scavenge (reduce) 2046.0 (2079.0) -> 2045.5 (2079.2) MB, 7.7 / 0.0 ms (average mu = 0.315, current mu = 0.302) allocation failure
[5276:0x6555770] 28067 ms: Scavenge (reduce) 2046.3 (2079.2) -> 2045.8 (2079.7) MB, 11.9 / 0.0 ms (average mu = 0.315, current mu = 0.302) allocation failure
[5276:0x6555770] 28117 ms: Scavenge (reduce) 2046.5 (2079.7) -> 2046.1 (2079.7) MB, 6.0 / 0.0 ms (average mu = 0.315, current mu = 0.302) allocation failure
<--- JS stacktrace --->
FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
1: 0xb09980 node::Abort() [node]
2: 0xa1c235 node::FatalError(char const*, char const*) [node]
3: 0xcf77be v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, bool) [node]
4: 0xcf7b37 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [node]
5: 0xeaf3d5 [node]
6: 0xeafeb6 [node]
7: 0xebe3de [node]
8: 0xebee20 v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [node]
9: 0xec1d15 v8::internal::Heap::HandleGCRequest() [node]
10: 0xe4f3b7 v8::internal::StackGuard::HandleInterrupts() [node]
11: 0x11fb775 v8::internal::Runtime_StackGuard(int, unsigned long*, v8::internal::Isolate*) [node]
12: 0x15f0a99 [node]
Aborted
I've just run some benchmarks (running https://github.com/comunica/comunica-feature-reasoning/tree/chore/benchmark-forward-chaining/packages/actor-rdf-reason-forward-chaining/perf) in Comunica with some of the AsyncIterator and N3.js optimisations included, and it is 2-3 orders of magnitude slower than the results we are seeing here.
I've just run some benchmarks (running https://github.com/comunica/comunica-feature-reasoning/tree/chore/benchmark-forward-chaining/packages/actor-rdf-reason-forward-chaining/perf) in Comunica with some of the Asynciterator and N3.js optimisations included and it is 2-3 orders of magnitude slower than the results we are seeing here.
Do you know what the cause is? Inefficient code? Or just a different algorithm?
I haven't managed to properly diagnose what the key cause is. I think it is mainly just that there are a lot of iterators being created/destroyed, having to convert between terms and internal representations, and not being able to pre-compute index lookups.
Also the tests were run with 2 different stores (one for explicit data, one for implicit data) which means that a lot of the ID <-> term conversions, and index lookups need to be done multiple times.
Furthermore, the 'match' method in N3 uses the iterator protocol with yield, which would also slow down performance compared to direct access to the index.
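A toy micro-benchmark can illustrate the kind of overhead being described (illustrative only; the exact numbers will vary by machine and Node version): iterating the same data through a generator versus a plain index loop.

```javascript
// Toy illustration (not N3.js code) of why a yield-based match() can be
// slower than direct index access: every element of the generator path
// goes through the iterator protocol (one { value, done } result per item).
const N = 1_000_000;
const data = new Array(N).fill(0).map((_, i) => i);

function* matchGenerator(arr) {
  for (const x of arr) yield x; // one result object per item
}

function sumViaGenerator(arr) {
  let sum = 0;
  for (const x of matchGenerator(arr)) sum += x;
  return sum;
}

function sumDirect(arr) {
  let sum = 0;
  for (let i = 0; i < arr.length; i++) sum += arr[i];
  return sum;
}

console.time('generator');
const a = sumViaGenerator(data);
console.timeEnd('generator');

console.time('direct');
const b = sumDirect(data);
console.timeEnd('direct');
// Both compute the same sum; on V8 the generator path is typically
// several times slower due to the iterator-protocol overhead.
```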
@jeswr -- Would you please change this PR's title to something more reflective & descriptive of its content? "Semi naive" doesn't even give much of a hint.
@josd Turns out it does work on the 10^6 dataset if we use a production environment and increase the memory allocation for node to 8GB
jeswr@debian:~/jst/N3.js$ NODE_ENV=production node --max-old-space-size=8192 perf/Reasoning-perf.js
Reasoning over TimBL profile and FOAF
loading foaf ontology: 24.45ms
loading tim berners lee profile card: 16.58ms
apply reasoning: 45.756ms
Running Deep Taxonomy Benchmark
Load test-dl-10.n3: 9.216ms
Reasoning: test-dl-10.n3: 0.65ms
true
Load test-dl-100.n3: 6.862ms
Reasoning: test-dl-100.n3: 1.867ms
true
Load test-dl-1000.n3: 61.933ms
Reasoning: test-dl-1000.n3: 29.815ms
true
Load test-dl-10000.n3: 233.515ms
Reasoning: test-dl-10000.n3: 102.49ms
true
Load test-dl-100000.n3: 1.776s
Reasoning: test-dl-100000.n3: 759.917ms
true
Load test-dl-1000000.n3: 22.141s
Reasoning: test-dl-1000000.n3: 8.898s
true
@rubensworks - In an attempt to work out why the Comunica implementation was so much slower I mocked up the resultant code if we removed all iterators, and asynchronicity, to end up with a reasoning algorithm that is RDFJS source agnostic. The results are about 2-3x slower than the internal implementation on N3.js.
I suspect the primary bottleneck in Comunica reasoning components is the sheer number of iterators that I was creating in the original implementation. I'll work to see if I can resolve this.
NODE_ENV=production node --max-old-space-size=28672 packages/actor-rdf-reason-forward-chaining/perf/ActorRdfReasonForwardChaining-perf.js
Load ./timbl.ttl: 21.178ms
Load ./foaf.ttl: 19.023ms
Reasoning TimBL + FOAF with full-rdfs: 155.707ms
1712
Load deep-taxonomy/test-dl-10.n3: 2.051ms
Reasoning Deep Taxonomy with type-inference: 0.752ms
true
Load deep-taxonomy/test-dl-100.n3: 5.042ms
Reasoning Deep Taxonomy with type-inference: 6.272ms
true
Load deep-taxonomy/test-dl-1000.n3: 54.89ms
Reasoning Deep Taxonomy with type-inference: 59.631ms
true
Load deep-taxonomy/test-dl-10000.n3: 207.026ms
Reasoning Deep Taxonomy with type-inference: 237.568ms
true
Load deep-taxonomy/test-dl-100000.n3: 1.877s
Reasoning Deep Taxonomy with type-inference: 2.114s
true
Load deep-taxonomy/test-dl-1000000.n3: 21.241s
Reasoning Deep Taxonomy with type-inference: 27.694s
true
The results are about 2-3x slower than the internal implementation on N3.js.
That gets us a lot closer already! (and may be in the acceptable range even)
@josd Turns out it does work on the 10^6 dataset if we use a production environment and increase the memory allocation for node to 8GB
🤯
I suspect the primary bottleneck in Comunica reasoning components is the sheer number of iterators that I was creating in the original implementation. I'll work to see if I can resolve this.
Does this already take into account the asynciterator changes that you and @jacoscaz have been working on?
That is absolutely brilliant. This has been my dream for N3.js since I started it in 2011.
Does this already take into account the asynciterator changes
Yes - that said, I was doing it in a very 'hacky' manner of copy/pasting that unreleased code locally, since I was in a bit of a rush, so I may have missed including a couple of optimizations.
@jeswr amazing work. I understand that the source-agnostic version would benefit from some form of shared dictionary encoding as per recent conversations?
Also, would you be able to profile one of those test runs, both for the Comunica version and the RDF/JS version? I think we might be able to learn a lot from that.
would you be able to profile one of those test runs
Is something like this what you were thinking, @jacoscaz? (Also be aware that the time scale is logarithmic.)
@jeswr I can't seem to zoom in on that image as it's quite small in resolution, but that looks like a comparison of different reasoning solutions. What I'm thinking about is profiling CPU and memory usage across a run, using either a sampling profiler (https://nodejs.org/en/docs/guides/simple-profiling/) or something more advanced, depending on how comfortable you are around profilers and whether you have used one before. Some IDEs (I tend to use WebStorm) can automatically hook up to V8's profiler if you configure them to do so in their launch configurations.
@RubenVerborgh - I think this is getting close to being ready for review - and whilst I think it can be further cleaned up in the future, I don't know if it is worth putting in the effort to do so until other future optimizations that I have planned are put in.
Before I rebase and mark it as ready for review, there is one remaining hurdle - getting full test coverage on this line. The reason I have included it is that I was following a similar pattern to here. Could you please explain why this condition is required in the latter case, since it appears that graphs are removed if they are empty anyway?
I quickly tried running git blame but the most recent change to that line was a variable renaming.
Amazing stuff. That's what we need, and what I've wanted to have since the day I started this.
May I suggest implementing this in a separate class? You could either extend Store (ReasonedStore or so), or use composition (Reasoner with an embedded store).
I'm suggesting because I think this code will grow quite a bit over time, and it's good for it to have its own file.
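A rough sketch of what the two options could look like (hypothetical shape; `Store` here is a dependency-free stand-in for N3.Store, and the actual rule application is elided):

```javascript
// Minimal stand-in for N3.Store, just enough to show the two designs.
class Store {
  constructor() { this.quads = []; }
  addQuad(quad) { this.quads.push(quad); }
  get size() { return this.quads.length; }
}

// Option 1: inheritance - ReasonedStore *is* a Store with reasoning added.
class ReasonedStore extends Store {
  reason(rules) {
    // Placeholder: each "rule" is a function that may add inferred quads.
    for (const rule of rules) rule(this);
  }
}

// Option 2: composition - Reasoner wraps an embedded store.
class Reasoner {
  constructor(store) { this.store = store; }
  reason(rules) {
    for (const rule of rules) rule(this.store);
  }
}
```

Composition keeps the reasoning code in its own file without touching the Store class, at the cost of exposing two objects to the user; inheritance keeps a single object but grows the Store's surface.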
Could you please explain why this condition is required
It was probably just a way to get around an overzealous linter not liking for … in unguarded by if 😅
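For context, this is the pattern that ESLint's guard-for-in rule enforces:

```javascript
// ESLint's guard-for-in rule requires every for...in loop to filter out
// inherited (prototype) properties with an if guard.
const base = { inherited: 1 };
const obj = Object.create(base);
obj.own = 2;

const ownKeys = [];
for (const key in obj) {
  // Without this guard, `inherited` (from the prototype) would be visited too.
  if (Object.prototype.hasOwnProperty.call(obj, key))
    ownKeys.push(key);
}
```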
I'm suggesting because I think this code will grow quite a bit over time, and it's good for it to have its own file.
I would also very much think so, e.g. adding to the store (like now, which is like forward chaining), or just goal-oriented querying (backward chaining), or a "reasoned" combination of those.
May I suggest to implement this in a separate class?
Perhaps this is overcomplicating it - but if I go down that track, is it worth trying to implement it as an RDFJSReasoner class, which can reason over any RDF.Store and perform the indexing optimization here if the input store is an instance of N3Store?
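A rough sketch of that shape (hypothetical; `N3Store` and the two strategy functions are placeholders for illustration, not real APIs):

```javascript
// Stand-in for N3.Store, used only for the instanceof check below.
class N3Store {}

// Placeholder strategies; a real implementation would run the reasoning
// over internal indexes vs. the generic RDF/JS match() interface.
function reasonOverIndexes(store, rules) {
  return `fast path over internal indexes (${rules.length} rules)`;
}

function reasonViaMatch(store, rules) {
  return `generic path via match() (${rules.length} rules)`;
}

// Reason over any RDF/JS-style store, but switch to an optimized path
// when the store happens to be an N3 Store.
class RDFJSReasoner {
  constructor(store) { this.store = store; }
  reason(rules) {
    return this.store instanceof N3Store
      ? reasonOverIndexes(this.store, rules)
      : reasonViaMatch(this.store, rules);
  }
}
```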
I've pushed the RDFJS-agnostic implementation to https://github.com/jesse-wright-reasoning-honours/rdfjs-reasoner/blob/master/reasoner.ts. This is tied to neither N3 nor Comunica.
Happy to clean up and move this repo to the rdfjs/ organisation, in addition to publishing it on NPM, if you think it is appropriate.
It might be worth getting some input on its API first, though. In particular: should it be a class with methods like .load to load rules, .set to set the store to apply reasoning to, etc.? And should we be working towards an interface specification for @rdfjs/types at the same time?
@rubensworks @RubenVerborgh
I've pushed the RDFJS-agnostic implementation to https://github.com/jesse-wright-reasoning-honours/rdfjs-reasoner/blob/master/reasoner.ts. This is tied to neither N3 nor Comunica.
Sweet, how's perf? Should we still do #296? I'm happy to have it within N3.js—or not if you think it's best independent. But probably best to not have to maintain 2.
how's perf
@RubenVerborgh It ranges between 2x and 4x slower in the RDFJS version:
Reasoning over TimBL profile and FOAF
loading foaf ontology: 13.229ms
loading tim berners lee profile card: 7.711ms
Reasoning: 161.886ms
Running Deep Taxonomy Benchmark
Reasoning: test-dl-10.n3: 0.404ms
Reasoning: test-dl-100.n3: 3.328ms
Reasoning: test-dl-1000.n3: 57.767ms
Reasoning: test-dl-10000.n3: 298.111ms
Reasoning: test-dl-100000.n3: 2.628s
Reasoning: test-dl-1000000.n3: Out of Memory
Unfortunately, there isn't much logic that can be shared either, due to the 'pre-compilation' of index lookups that I'm doing in #296.
I think having a RDF/JS reasoner that doesn't depend on N3.js would be worth the 2x - 4x loss, given the absolute numbers I'm seeing. Don't get me wrong, N3.js is awesome, but enriching the RDF/JS ecosystem with such a wonderful component would be amazing.
Same—I'd strategically suggest to write it on N3.js now. The burden of proof is on us to show that it can all work fast in the browser. Once we have that, we can generalize. Will follow up on the reasoning PR.
I would also be in favor of focusing on the N3-specific reasoner for now.
I guess once we have an idea of how to tackle https://github.com/rdfjs/N3.js/issues/289 (and standardize this at RDF/JS-level), we can port this to an RDF/JS-level reasoner, without any loss in performance.
I would also be in favor of focusing on the N3-specific reasoner for now.
Agreed - in further support of this, I am now getting the logic for reasoning in Comunica closer to that of the generic RDFJS algorithm. The primary bottleneck came down to the unnecessary creation of many iterators in the process, which can be resolved by re-organising some code and creating some custom iterators (for reference, these are the variants of iterator usage I have been testing - though I doubt they are of much use to anyone else at the present moment). So an RDFJS implementation separate from Comunica may become somewhat redundant.
@jeswr I really love https://github.com/jeswr/N3.js/blob/feat/N3Reasoner/src/N3Reasoner.js and maybe even more your N3Script https://github.com/jeswr/N3.js/blob/feat/N3Reasoner/test/util.js
The performance is one of the best I have ever seen:
$ node --max-old-space-size=8192 N3Reasoner-perf.js
Reasoning over TimBL profile and FOAF
loading foaf ontology: 20.782ms
loading tim berners lee profile card: 16.937ms
Reasoning: 50.257ms
Running Deep Taxonomy Benchmark
Reasoning: test-dl-10.n3: 0.265ms
Reasoning: test-dl-100.n3: 1.089ms
Reasoning: test-dl-1000.n3: 17.434ms
Reasoning: test-dl-10000.n3: 109.719ms
Reasoning: test-dl-100000.n3: 1.040s
Reasoning: test-dl-1000000.n3: 12.971s
Thank you so much and Keep Up The Good Work!
Also a very nice performance on a smartphone https://twitter.com/josderoo/status/1525209755471462402, albeit there was not enough memory to run the deep taxonomy of depth 1000000 (subsumption with 3 million classes).
Also have a look at the extended deep taxonomy test, which takes the depth as the number of elements in the classes. It is closer to reality, e.g. an ontology with 300000 classes (like SNOMED-CT) and 100000 individuals belonging to those classes (like patients having observations, diseases, treatments, ...). The modified code is like:
$ git diff
diff --git a/test/util.js b/test/util.js
index f9759a9..7e29899 100644
--- a/test/util.js
+++ b/test/util.js
@@ -233,13 +233,15 @@ function load(filename, store) {
function generateDeepTaxonomy(size) {
const store = new N3.Store();
- store.addQuads([
- new Quad(
- new NamedNode('http://eulersharp.sourceforge.net/2009/12dtb/test#ind'),
- new NamedNode('http://www.w3.org/1999/02/22-rdf-syntax-ns#type'),
- new NamedNode('http://eulersharp.sourceforge.net/2009/12dtb/test#N0'),
- ),
- ]);
+ for (let i = 0; i < size; i++) {
+ store.addQuads([
+ new Quad(
+ new NamedNode(`http://eulersharp.sourceforge.net/2009/12dtb/test#ind${i}`),
+ new NamedNode('http://www.w3.org/1999/02/22-rdf-syntax-ns#type'),
+ new NamedNode('http://eulersharp.sourceforge.net/2009/12dtb/test#N0'),
+ ),
+ ]);
+ }
store.addQuads([
new Quad(
The current results are:
$ node --max-old-space-size=8192 perf/N3Reasoner-perf.js
Reasoning over TimBL profile and FOAF
loading foaf ontology: 18.728ms
loading tim berners lee profile card: 15.987ms
Reasoning: 57.744ms
Running Deep Taxonomy Benchmark
Reasoning: test-dl-10.n3: 1.16ms
Reasoning: test-dl-100.n3: 133.68ms
Reasoning: test-dl-1000.n3: 13.335s
<--- Last few GCs --->
[3711:0x696f850] 358474 ms: Mark-sweep 7625.1 (8239.9) -> 7614.2 (8230.2) MB, 6144.4 / 0.0 ms (average mu = 0.140, current mu = 0.048) allocation failure GC in old space requested
[3711:0x696f850] 364806 ms: Mark-sweep 7630.4 (8239.9) -> 7622.2 (8230.4) MB, 6018.9 / 0.0 ms (average mu = 0.096, current mu = 0.049) allocation failure GC in old space requested
<--- JS stacktrace --->
FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
1: 0xb09980 node::Abort() [node]
2: 0xa1c235 node::FatalError(char const*, char const*) [node]
3: 0xcf77be v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, bool) [node]
4: 0xcf7b37 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [node]
5: 0xeaf3d5 [node]
6: 0xeafeb6 [node]
7: 0xebe3de [node]
8: 0xebee20 v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [node]
9: 0xec1d9e v8::internal::Heap::AllocateRawWithRetryOrFailSlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [node]
10: 0xe83012 v8::internal::Factory::AllocateRaw(int, v8::internal::AllocationType, v8::internal::AllocationAlignment) [node]
11: 0xe7d92c v8::internal::FactoryBase<v8::internal::Factory>::AllocateRawArray(int, v8::internal::AllocationType) [node]
12: 0xe7da05 v8::internal::FactoryBase<v8::internal::Factory>::NewFixedArrayWithFiller(v8::internal::Handle<v8::internal::Map>, int, v8::internal::Handle<v8::internal::Oddball>, v8::internal::AllocationType) [node]
13: 0x10db57d v8::internal::Handle<v8::internal::NumberDictionary> v8::internal::HashTable<v8::internal::NumberDictionary, v8::internal::NumberDictionaryShape>::New<v8::internal::Isolate>(v8::internal::Isolate*, int, v8::internal::AllocationType, v8::internal::MinimumCapacity) [node]
14: 0x10db7b3 v8::internal::Handle<v8::internal::NumberDictionary> v8::internal::HashTable<v8::internal::NumberDictionary, v8::internal::NumberDictionaryShape>::EnsureCapacity<v8::internal::Isolate>(v8::internal::Isolate*, v8::internal::Handle<v8::internal::NumberDictionary>, int, v8::internal::AllocationType) [node]
15: 0x10dbdf4 v8::internal::Handle<v8::internal::NumberDictionary> v8::internal::Dictionary<v8::internal::NumberDictionary, v8::internal::NumberDictionaryShape>::Add<v8::internal::Isolate>(v8::internal::Isolate*, v8::internal::Handle<v8::internal::NumberDictionary>, unsigned int, v8::internal::Handle<v8::internal::Object>, v8::internal::PropertyDetails, v8::internal::InternalIndex*) [node]
16: 0x1003d48 [node]
17: 0x108a015 v8::internal::JSObject::AddDataElement(v8::internal::Handle<v8::internal::JSObject>, unsigned int, v8::internal::Handle<v8::internal::Object>, v8::internal::PropertyAttributes) [node]
18: 0x10cdd6e v8::internal::Object::AddDataProperty(v8::internal::LookupIterator*, v8::internal::Handle<v8::internal::Object>, v8::internal::PropertyAttributes, v8::Maybe<v8::internal::ShouldThrow>, v8::internal::StoreOrigin) [node]
19: 0x10d2303 v8::internal::Object::SetProperty(v8::internal::LookupIterator*, v8::internal::Handle<v8::internal::Object>, v8::internal::StoreOrigin, v8::Maybe<v8::internal::ShouldThrow>) [node]
20: 0x1205826 v8::internal::Runtime::SetObjectProperty(v8::internal::Isolate*, v8::internal::Handle<v8::internal::Object>, v8::internal::Handle<v8::internal::Object>, v8::internal::Handle<v8::internal::Object>, v8::internal::StoreOrigin, v8::Maybe<v8::internal::ShouldThrow>) [node]
21: 0x1206da9 v8::internal::Runtime_SetKeyedProperty(int, unsigned long*, v8::internal::Isolate*) [node]
22: 0x15f0a99 [node]
Aborted
So from depth 10000 on it doesn't work anymore.
For EYE and depth 100000 we currently get:
$ swipl -x edtpe.pvm -- --nope http://josd.github.io/eye/reasoning/edt/test-facts.n3 --query http://josd.github.io/eye/reasoning/edt/test-query.n3
eye --wcache http://josd.github.io/eye/reasoning .. --no-distinct-input http://josd.github.io/eye/reasoning/edt/test-edt.n3 --nope http://josd.github.io/eye/reasoning/edt/test-facts.n3 --query http://josd.github.io/eye/reasoning/edt/test-query.n3
EYE v22.0424.1245 josd
SWI-Prolog version 8.5.11-23-g5f6851705
starting 378 [msec cputime] 376 [msec walltime]
#Processed by EYE v22.0424.1245 josd
#eye --wcache http://josd.github.io/eye/reasoning .. --no-distinct-input http://josd.github.io/eye/reasoning/edt/test-edt.n3 --nope http://josd.github.io/eye/reasoning/edt/test-facts.n3 --query http://josd.github.io/eye/reasoning/edt/test-query.n3
GET http://josd.github.io/eye/reasoning/edt/test-facts.n3 FROM ../edt/test-facts.n3 SC=100000
GET http://josd.github.io/eye/reasoning/edt/test-query.n3 FROM ../edt/test-query.n3 SC=1
networking 2245 [msec cputime] 2840 [msec walltime]
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
@prefix : <http://eulersharp.sourceforge.net/2009/12dtb/test#>.
:i100000 a :N100000.
reasoning 482 [msec cputime] 481 [msec walltime]
#2022-05-14T21:57:24.111Z in=400001 out=1 ent=0 step=0 brake=1 inf=33861137 sec=3.105 inf/sec=10905358
#ENDS
2022-05-14T21:57:24.111Z in=400001 out=1 ent=0 step=0 brake=1 inf=33861137 sec=3.105 inf/sec=10905358
For EYE
Is it correct to assume that EYE doesn't suffer from the scaling problem that the N3Reasoner does because it is performing backwards chaining in this case? I definitely think backwards chaining should be added here at some point, meaning that backwards-chaining reasoning can be done on calls like has and match.
So from depth 10000 on it doesn't work anymore.
This is likely because the array for buffering elements is getting too long by this point. One strategy I am thinking should be implemented to overcome this is to instead have a single s, p, o index in each .next part of a rule, containing all of the data that is yet to be put through that rule. I'd imagine we can also re-think the logic here to reduce/remove buffering.
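A sketch of what such a pending-fact index could look like (illustrative only, not the actual implementation): a nested subject → predicate → object map that also deduplicates for free, rather than an ever-growing buffer array.

```javascript
// Pending facts for a rule, stored as a nested s -> p -> o index instead
// of a flat array. Adding a duplicate fact is a no-op, and drain() walks
// and empties the index so each fact is put through the rule exactly once.
class PendingIndex {
  constructor() {
    this.index = new Map();
    this.size = 0;
  }
  add(s, p, o) {
    let predicates = this.index.get(s);
    if (!predicates) this.index.set(s, predicates = new Map());
    let objects = predicates.get(p);
    if (!objects) predicates.set(p, objects = new Set());
    if (objects.has(o)) return false; // duplicate - nothing buffered twice
    objects.add(o);
    this.size++;
    return true;
  }
  *drain() {
    for (const [s, predicates] of this.index)
      for (const [p, objects] of predicates)
        for (const o of objects) yield [s, p, o];
    this.index.clear();
    this.size = 0;
  }
}
```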
For EYE
Is it correct to assume that EYE doesn't suffer from the scaling problem that the N3Reasoner does because it is performing backwards chaining in this case? I definitely think backwards chaining should be added here at some point, meaning that backwards-chaining reasoning can be done on calls like has and match.
Very much indeed! The broader picture is that we need metachaining like https://github.com/josd/eye/tree/master/reasoning/socrates-metachain-f2b or https://github.com/josd/eye/tree/master/reasoning/socrates-metachain-b2f as
There is no principle to tell whether to
use top-down or bottom-up reasoning.
(last quote from https://josd.github.io/quote.html and thanks to Werner Heisenberg)
Great work!
Note - the code is nasty at the moment and needs some love and refactoring, but it does work!
@josd @pbonte - alternative to #293 now implementing a better forward chaining algorithm.
Somewhat surprisingly, it now takes longer to do RDFS rules over the union of timbl's profile card and the foaf ontology (which is not deep at all; you only need one round of reasoning).
It is far more competitive on the Deep Taxonomy Benchmark now @josd - this is the output of running the Reasoning-perf.js file in this branch. I have not benchmarked test-dl-1000000.n3, as I could not download this from sourceforge.