shexjs / shex.js

shex.js javascript package
MIT License

Validation process appears to exhaust all available memory #31

Open: lucaswerkmeister opened this issue 6 years ago

lucaswerkmeister commented 6 years ago

I’ve reduced my shape expression to this fairly short snippet:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX schema: <http://schema.org/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX ex: <http://example.com/>

ex:A {
  schema:description rdf:langString+;
  wdt:P106 @ex:B
}

ex:B {
  schema:description rdf:langString+;
  schema:name rdf:langString+;
  rdfs:label rdf:langString+
}

Let’s say you’ve saved this as /tmp/ex.shex. If you then validate Q19296, a fairly small Wikidata example item, against it using the command

bin/validate \
  --data 'http://www.wikidata.org/entity/Q19296' \
  --shex '/tmp/ex.shex' \
  --node 'http://www.wikidata.org/entity/Q19296' \
  --shape 'http://example.com/A'

then Node.js will fairly quickly (in under ten seconds on my system) crash with a fatal out-of-memory error. (You probably don’t want to try this in a browser: for me, that hung the whole system for a while.)

<--- Last few GCs --->

[28338:0x5573831c1570]     5324 ms: Scavenge 1383.4 (1410.8) -> 1383.3 (1412.3) MB, 3.9 / 0.0 ms  (average mu = 0.201, current mu = 0.137) allocation failure
[28338:0x5573831c1570]     5329 ms: Scavenge 1384.6 (1412.3) -> 1384.5 (1413.3) MB, 4.1 / 0.0 ms  (average mu = 0.201, current mu = 0.137) allocation failure
[28338:0x5573831c1570]     5335 ms: Scavenge 1386.8 (1414.5) -> 1386.8 (1416.0) MB, 3.7 / 0.0 ms  (average mu = 0.201, current mu = 0.137) allocation failure

<--- JS stacktrace --->

==== JS stack trace =========================================

Security context: 0x2b39af19e589 <JSObject>
    0: builtin exit frame: concat(this=0x20b22af022d1 <JSArray[149730]>,0x20b22af02279 <JSArray[155]>,0x20b22af022d1 <JSArray[149730]>)

    1: /* anonymous */ [0x1d8016002239] [/home/lucas/git/shex.js/lib/regex/threaded-val-nerr.js:~192] [pc=0x12bbaa51b858](this=0x78964086519 <JSGlobal Object>,nextThreads=0x20b22af022d1 <JSArray[149730]>,exprThread=0x19da8e58ced9 <Object map = 0x5c64157d91>)
  ...

FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
 1: node::Abort() [node]
 2: 0x557381b65c1f [node]
 3: v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, bool) [node]
 4: v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [node]
 5: 0x55738208f3b3 [node]
 6: 0x55738208f505 [node]
 7: v8::internal::Heap::PerformGarbageCollection(v8::internal::GarbageCollector, v8::GCCallbackFlags) [node]
 8: v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [node]
 9: v8::internal::Heap::AllocateRawWithRetry(int, v8::internal::AllocationSpace, v8::internal::AllocationAlignment) [node]
10: v8::internal::Factory::AllocateRawArray(int, v8::internal::PretenureFlag) [node]
11: v8::internal::Factory::NewFixedArrayWithFiller(v8::internal::Heap::RootListIndex, int, v8::internal::Object*, v8::internal::PretenureFlag) [node]
12: v8::internal::Factory::NewJSArrayStorage(v8::internal::Handle<v8::internal::JSArray>, int, int, v8::internal::ArrayStorageAllocationMode) [node]
13: v8::internal::Factory::NewJSArray(v8::internal::ElementsKind, int, int, v8::internal::ArrayStorageAllocationMode, v8::internal::PretenureFlag) [node]
14: v8::internal::ElementsAccessor::Concat(v8::internal::Isolate*, v8::internal::Arguments*, unsigned int, unsigned int) [node]
15: 0x557381d88649 [node]
16: 0x557381d8f78f [node]
17: 0x12bbaa109efd
Aborted (core dumped)
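For readers puzzling over the stack trace: the top JS frame is an anonymous function in threaded-val-nerr.js with parameters (nextThreads, exprThread) that calls Array.prototype.concat on a 149,730-element array. That signature looks like a reduce callback, but this is an inference from the trace, not from the shex.js source. If it is one, every concat copies the whole accumulator into a fresh array. The sketch below (hypothetical function names, not shex.js code) contrasts that pattern with appending in place. It only reduces copying and GC churn; it does not by itself explain why there are so many threads.

```javascript
// Hypothetical illustration (not shex.js code): two ways to build the
// next generation of "threads" from the current ones.

// Pattern A: reduce + concat. Each step allocates a brand-new array and
// copies every element accumulated so far, so total copying is O(n^2)
// and every discarded intermediate array is garbage for the GC.
function nextThreadsWithConcat(threads, expand) {
  return threads.reduce(
    (nextThreads, exprThread) => nextThreads.concat(expand(exprThread)),
    []
  );
}

// Pattern B: one accumulator mutated with push. Elements are appended in
// place (amortized O(1) each). An inner loop is used instead of
// push(...items), because spreading a very large array into arguments
// can throw a RangeError.
function nextThreadsWithPush(threads, expand) {
  const nextThreads = [];
  for (const exprThread of threads) {
    for (const t of expand(exprThread)) {
      nextThreads.push(t);
    }
  }
  return nextThreads;
}

// Both produce the same result; only the allocation pattern differs.
const threads = Array.from({ length: 1000 }, (_, i) => i);
const expand = (t) => [t, t + 0.5]; // each thread forks into two
const a = nextThreadsWithConcat(threads, expand);
const b = nextThreadsWithPush(threads, expand);
console.log(a.length, b.length); // 2000 2000
```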

Commenting out any one of the lines in the ShEx code makes the validation pass, so this doesn’t seem to be a problem with any particular part of the shape. Maybe shex.js is just getting overwhelmed by the sheer number of labels and descriptions?

Any ideas what could be done here? :/
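One general way a thread-based matcher can run out of memory: if it forks a new thread for every alternative way to match an input item and never merges threads that end up in the same state, the thread list can grow exponentially with the number of matching items. The toy below uses the deliberately ambiguous pattern (a|a)* to show the difference that deduplication makes. It is not a model of ShEx semantics or of shex.js internals, and all names are hypothetical. Threads in a real ShEx validator also carry bookkeeping (which triples matched which constraint), so they can only be merged when that bookkeeping is not needed for the result.

```javascript
// Toy NFA for the ambiguous pattern (a|a)*: from the single loop state,
// an "a" can be consumed by either of two alternatives, and both lead
// back to the same state. NOT a model of ShEx semantics.
const LOOP = "loop";

// Naive step: every thread forks once per matching alternative, and
// duplicate threads are kept.
function stepNaive(threads, token) {
  const next = [];
  for (const state of threads) {
    if (state === LOOP && token === "a") {
      next.push(LOOP); // took alternative 1
      next.push(LOOP); // took alternative 2
    }
  }
  return next;
}

// Deduplicating step (Thompson-style): threads are merged by state, so
// the list can never exceed the number of NFA states.
function stepDedup(threads, token) {
  const next = new Set();
  for (const state of threads) {
    if (state === LOOP && token === "a") {
      next.add(LOOP); // alternative 1
      next.add(LOOP); // alternative 2 lands on the same state: merged
    }
  }
  return [...next];
}

// Feed the input through one step function and report the live thread count.
function run(step, input) {
  let threads = [LOOP];
  for (const token of input) {
    threads = step(threads, token);
  }
  return threads.length;
}

const input = "a".repeat(16);
console.log(run(stepNaive, input)); // 65536 (2^16)
console.log(run(stepDedup, input)); // 1
```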

lucaswerkmeister commented 6 years ago

So I’ve spent some more time trying to arrive at shape expressions that don’t crash shex.js, and I found something odd. Given the data and schema in this gist, validating http://www.wikidata.org/entity/Q42 against http://www.wikidata.org/entity/Q5 succeeds, but only because I’ve commented out two wdt:P40 (“child”) links; with those lines uncommented, shex.js crashes. Is shex.js perhaps validating the same nodes against the same shapes again and again, because of the circular references between child items (P40) and parent items (P22, P25)?
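On the circular-reference question: one common way to keep recursive shape checks from looping is to cache results per (node, shape) pair and to treat a pair that is already being checked as conforming. The sketch below is hypothetical and is not shex.js’s API: the graph is a plain array of [subject, predicate, object] triples, and each shape is a list of predicates that must have at least one value, all of which must conform to a referenced shape. It ignores cardinalities, node constraints and error reporting. It also simplifies one thing: results computed while relying on a pending assumption are cached even if that assumption later fails, which a full implementation would have to undo.

```javascript
// Hypothetical sketch: cycle-safe, memoized checking of node/shape pairs.
function makeValidator(graph, schema) {
  const results = new Map(); // "node|shape" -> true | false | "pending"
  let calls = 0; // how many pairs were actually evaluated

  function validate(node, shape) {
    const key = `${node}|${shape}`;
    const cached = results.get(key);
    if (cached === "pending") return true; // cycle: assume it conforms
    if (cached !== undefined) return cached; // already decided

    calls++;
    results.set(key, "pending");
    let ok = true;
    for (const { predicate, valueShape } of schema[shape]) {
      // All objects of (node, predicate, ?o).
      const objects = graph
        .filter(([s, p]) => s === node && p === predicate)
        .map(([, , o]) => o);
      // Require at least one value, and every value must conform.
      if (objects.length === 0 || !objects.every((o) => validate(o, valueShape))) {
        ok = false;
        break;
      }
    }
    results.set(key, ok);
    return ok;
  }

  return { validate, callCount: () => calls };
}

// A parent/child cycle like P40 ("child") vs. P22 ("father").
const graph = [
  ["Q42", "P40", "Qkid"],
  ["Qkid", "P22", "Q42"],
];
const schema = {
  Parent: [{ predicate: "P40", valueShape: "Child" }],
  Child: [{ predicate: "P22", valueShape: "Parent" }],
};

const v = makeValidator(graph, schema);
console.log(v.validate("Q42", "Parent")); // true
console.log(v.callCount()); // 2
```

Without the "pending" check, validate("Q42", "Parent") would recurse through Qkid and back to Q42 forever.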

ericprud commented 6 years ago

I tried reproducing this in shex-simple but I don't think I've got the right shapemap. I didn't see http://www.wikidata.org/entity/Q19296 in the gist data.

lucaswerkmeister commented 6 years ago

Sorry, the example in the gist is completely independent from the one in the original bug report. The minimal schema from the issue description (ex:A, ex:B) causes an error when you let shex.js download the data for Q19296, but not when you use the data file from the gist (probably because the gist doesn’t contain label or description triples). The data file from the gist, on the other hand, produces an error only with the schema from the gist, and only if the wdt:P40 lines are uncommented.

thadguidry commented 5 years ago

This is a lot easier to see if you use the latest Chrome and profile Performance while validating. After profiling, the source code view highlights function call timings in ms.

[Screenshot: shex-simple js]

Each validated object uses memory for itself plus the object graph it retains (measured in bytes), and how much depends on the object’s properties. Your mileage may vary, but take a look at the Chrome Memory Allocation timeline and at the Containment view and Retained Size column.

[Screenshot: Chrome Memory Allocation timeline]
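A similar, rougher measurement can be taken directly in Node.js, e.g. around a bin/validate run. The helper below is a hypothetical sketch built on process.memoryUsage() and v8.getHeapStatistics(); the heap_size_limit it prints is the V8 heap ceiling that a “near heap limit” out-of-memory error is hit against. To get the Containment and Retained Size views for a Node.js process, v8.writeHeapSnapshot() (Node.js 11.13 and later) writes a .heapsnapshot file that can be loaded in the Chrome DevTools Memory panel.

```javascript
const v8 = require("v8");

// Hypothetical helper: run fn and report how much heapUsed changed.
// The number is approximate, since the GC may run at any point.
function measureHeap(label, fn) {
  const before = process.memoryUsage().heapUsed;
  const result = fn();
  const after = process.memoryUsage().heapUsed;
  const deltaMB = (after - before) / 1024 / 1024;
  console.log(`${label}: heapUsed changed by ${deltaMB.toFixed(1)} MB`);
  return result;
}

// The configured heap ceiling for this process, in MB.
const limitMB = v8.getHeapStatistics().heap_size_limit / 1024 / 1024;
console.log(`heap_size_limit: ${limitMB.toFixed(0)} MB`);

// Example workload: keep one million small objects alive.
const kept = measureHeap("allocate", () =>
  Array.from({ length: 1e6 }, (_, i) => ({ i }))
);
console.log(kept.length); // 1000000
```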

ericprud commented 5 years ago

Tx, @thadguidry! I'd never ventured that far to the right in the Chrome debugger. I guess the issue is that I need to throw stuff away, in particular error reports, once they're represented in the UI.
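A minimal sketch of that idea, with hypothetical names (not shex.js’s API): hand each error report to a consumer such as the UI as soon as it is produced, and retain only a running count plus a small sample, so memory no longer grows with the number of errors.

```javascript
// Hypothetical error sink: forwards every error, keeps only a few.
class BoundedErrorSink {
  constructor(maxKept, onError) {
    this.maxKept = maxKept;
    this.onError = onError; // called once per error, e.g. to render it
    this.kept = [];
    this.total = 0;
  }

  report(error) {
    this.total++;
    this.onError(error); // the consumer decides what to display or drop
    if (this.kept.length < this.maxKept) {
      this.kept.push(error); // retain only the first few for a summary
    }
  }

  summary() {
    return { total: this.total, sample: this.kept };
  }
}

let rendered = 0;
const sink = new BoundedErrorSink(3, () => {
  rendered++;
});
for (let i = 0; i < 10000; i++) {
  sink.report({ message: "error " + i });
}
const { total, sample } = sink.summary();
console.log(total, sample.length, rendered); // 10000 3 10000
```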