nickdrummond / star-wars-ontology

An OWL ontology describing events, characters and places in the Star Wars Universe
https://nickdrummond.github.io/star-wars-ontology/

Slow/failing queries on Railway #40

Open nickdrummond opened 1 year ago

nickdrummond commented 1 year ago

Performing this is slow/crashes

https://star-wars-ontology.up.railway.app/dlquery/?expression=Living_thing+and+%28memberOf+value+Jedi_Order%29&minus=hadRole+some+Jedi&syntax=man&query=instances

Same memory on the instance as Heroku (500M). Is this a VM memory size optimisation problem?

-Xmx500M is too much for the instance and crashes the app (which then restarts).
-Xmx300M prevents the heap exception, but is slower than Heroku.
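A JVM needs memory beyond the heap (metaspace, thread stacks, code cache), so a heap limit equal to the instance limit leaves no room for the rest. A minimal sketch of budgeting the heap, assuming a fixed share is held back for non-heap use; the 40% default is an illustrative assumption picked to match the 300M that worked here, not a measured value:

```python
def suggest_xmx(container_mb: int, non_heap_percent: int = 40) -> str:
    """Return a -Xmx flag that leaves headroom for non-heap JVM memory.

    non_heap_percent is an illustrative assumption, not a measured value.
    """
    if not 0 <= non_heap_percent < 100:
        raise ValueError("non_heap_percent must be between 0 and 99")
    # Integer arithmetic avoids floating-point rounding surprises.
    heap_mb = container_mb * (100 - non_heap_percent) // 100
    return f"-Xmx{heap_mb}M"


print(suggest_xmx(500))
```

The right share depends on the app and would need measuring on the instance itself.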

nickdrummond commented 1 year ago

Probably as good as we're going to get in the 500M total memory

memory

nickdrummond commented 1 year ago

Updated to the paid plan and it's a lot faster - more memory and more cores

nickdrummond commented 1 year ago

Slow queries anyway, regardless of Railway:

instances of "Event and (locatedIn value Outer_Rim)": 309 results in 113953ms

And

instances of "Event and (locatedIn some (hasTerrain some Mountains))": 86 results in 133233ms 

And

instances of "locationOf some (included some (participant some {Ezra_Bridger, Spectres}))": 84 results in 61896ms 
instances of "included some (participant some {Ezra_Bridger, Spectres})": 131 results in 121621ms 
instances of "Planet and locationOf some (included some (participant value Ahsoka_Tano))": 21 results in 69080ms

nickdrummond commented 1 year ago

instances of "Event and (locatedIn value Outer_Rim)"

| module | results | time (ms) | t/r |
|---|---|---|---|
| events | 400 | 123805 | 309 |
| imperial-era | 214 | 42355 | 198 |
| rebels | 139 | 20745 | 149 |
| republic-era | 102 | 13777 | 135 |
| new-republic-era | 88 | 9805 | 111 |
| clonewars | 88 | 11064 | 126 |
| book of boba | 51 | 4422 | 87 |
| kenobi | 27 | 2404 | 89 |
| trilogy | 25 | 2211 | 88 |
| mandalorian | 19 | 1716 | 90 |
| bad-batch | 17 | 1778 | 104 |
| prequels | 15 | 1390 | 93 |
| sequels | 11 | 1066 | 97 |
| resistance | 10 | 897 | 90 |
| rogue_one | 6 | 441 | 74 |
| solo | 5 | 501 | 100 |

nickdrummond commented 1 year ago

No particular outlier on the modules - seems to be a non-linear slow down related to the number of results.
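One way to put a number on "non-linear" is to fit a straight line to log(time) against log(results). This is a rough sketch using the per-module figures above. The slope is only an indicative exponent, since larger modules also carry more axioms, so result count and ontology size are confounded:

```python
import math

# (module, results, time in ms), copied from the per-module figures above.
TIMINGS = [
    ("events", 400, 123805), ("imperial-era", 214, 42355),
    ("rebels", 139, 20745), ("republic-era", 102, 13777),
    ("new-republic-era", 88, 9805), ("clonewars", 88, 11064),
    ("book of boba", 51, 4422), ("kenobi", 27, 2404),
    ("trilogy", 25, 2211), ("mandalorian", 19, 1716),
    ("bad-batch", 17, 1778), ("prequels", 15, 1390),
    ("sequels", 11, 1066), ("resistance", 10, 897),
    ("rogue_one", 6, 441), ("solo", 5, 501),
]


def scaling_exponent(rows):
    """Least-squares slope of log(time) against log(results)."""
    xs = [math.log(results) for _, results, _ in rows]
    ys = [math.log(ms) for _, _, ms in rows]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


k = scaling_exponent(TIMINGS)
print(f"time grows roughly as results^{k:.2f}")
```

A slope of 1 would mean a constant cost per result; anything above 1 means each extra result costs more than the last.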

Just out of interest, is there a similar speed when we run the "A-box transform"?

nickdrummond commented 1 year ago

The A-box version classifies faster, and the query is faster too, but it still takes 75426 ms (423 results). Not bad, but still not great.

nickdrummond commented 1 year ago

Querying for mid-rim or core worlds events is a lot quicker (but far fewer results too).
Just querying for the 810 events is instant.
Event and (locatedIn some Planet) = 539 results / 274605 ms

Overall, we're using a LOT of memory:
250M when loaded
1.2G when events classified
1.8G when events query answered

nickdrummond commented 1 year ago

When did this start becoming a problem? Is it a gradual increase?

Event and (locatedIn value Outer_Rim)

| date | rev | r | t | t/r |
|---|---|---|---|---|
| 18/03/23 | b788c3e4 | 400 | 119334 | 298 |
| 19/06/22 | 3d4c0cd4 | 293 | 60875 | 208 |
| 23/02/22 | 2ac29d2f | 242 | 36377 | 150 |
| 04/01/22 | 3243b9c0 | 213 | 21414 | 101 |
| 26/11/21 | 0a36f372 | 172 | 10876 | 63 |

nickdrummond commented 1 year ago

Taking the transitivity off locatedIn gives instant results (but only 2 - obviously)

Taking the range off locatedIn - well, I gave up waiting.

Can we separate off the transitive part which is only useful for the location hierarchy?

Give the event a different property

Event and (in some (somewhereIn value Outer_Rim))
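To make the intended split concrete, here is a minimal sketch of what that query means, assuming events point at a single place through a plain link (`in`) and only the place hierarchy is followed transitively (`somewhereIn`). The event names and Mos_Eisley are hypothetical, the parent links are assumptions, and a one-parent-per-place dictionary is a simplification of the real hierarchy:

```python
# Place hierarchy: the only relation treated as transitive (somewhereIn).
# Simplification: one parent per place. Links here are assumptions.
PLACE_PARENT = {
    "Mos_Eisley": "Tatooine",  # hypothetical individual
    "Tatooine": "Outer_Rim",
    "Coruscant": "Core_Worlds",
}

# Events point at one place through a plain, non-transitive link (in).
EVENT_IN = {
    "Example_Event_A": "Mos_Eisley",  # hypothetical events
    "Example_Event_B": "Tatooine",
    "Example_Event_C": "Coruscant",
}


def somewhere_in(place, region):
    """True if region is reachable from place by following parent links."""
    seen = set()
    current = PLACE_PARENT.get(place)
    while current is not None and current not in seen:
        if current == region:
            return True
        seen.add(current)  # guards against accidental cycles
        current = PLACE_PARENT.get(current)
    return False


def events_in(region):
    """Mimics: Event and (in some (somewhereIn value region))."""
    return sorted(e for e, place in EVENT_IN.items() if somewhere_in(place, region))


print(events_in("Outer_Rim"))
```

Note that an event placed directly in the region itself would not match this form of the query unless `somewhereIn` were also reflexive.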

nickdrummond commented 1 year ago

locatedIn value Outer_Rim = 729 results in 129719 ms

So it's the size of the locatedIn hierarchy?

nickdrummond commented 1 year ago

Separating event location off has a small effect: 102702 ms

nickdrummond commented 1 year ago

But only 53 results so obviously a mistake

nickdrummond commented 1 year ago

A rough search and replace of "locatedIn" with "in" for event locations.

Event and (in some (locatedIn value Outer_Rim))

379 results in 267887 ms - miles worse

nickdrummond commented 1 year ago

Removing range of locatedIn makes no difference. Adding a domain (Place or Event) - taking out livedIn subprop and removing all objects and roles locatedIn to get it to classify. No good

nickdrummond commented 1 year ago

Could it be an impact of the expressivity of the ontology? There are a few cardinality restrictions and nominals around, which we know Pellet is not a fan of (although I don't think the number has significantly increased over time).

Tried extracting the EL++ profile of the "A-box transformed" ontologies. This actually takes longer to classify, and the same time to query.

nickdrummond commented 1 year ago

Checked HermiT as it's been a while, but classification is extremely slow and I got bored waiting for query results

nickdrummond commented 1 year ago

SPARQL queries can be used, but the results are incomplete:

357 results

nickdrummond commented 1 year ago

Moving transitivity to a parent property of locatedIn and querying against this instead makes no significant difference.

nickdrummond commented 1 year ago

Can we narrow down by Event type? Or is this also just related to the number of answers?

nickdrummond commented 1 year ago

on "A-box"

Fight is 44s for 89 results
Mission is 32s for 48 results
Release is 26s for 38 results
Communication is 10s for 50 results
Transfer is 3.4s for 21 results
Death is 16s for 19 results

Hmm, spot check is interesting but maybe code this for all types
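A minimal sketch of what "coding this for all types" could look like. `run_query` and `make_reasoner` are hypothetical callables standing in for whatever wraps the real reasoner, and the query template is an assumption about the expression being tested:

```python
import time


def time_queries(event_types, run_query, make_reasoner=None):
    """Time run_query(reasoner, expression) once per event type.

    run_query and make_reasoner are hypothetical stand-ins. When
    make_reasoner is given, every query gets a fresh reasoner so that
    earlier queries cannot skew later timings.
    """
    rows = []
    for event_type in event_types:
        reasoner = make_reasoner() if make_reasoner is not None else None
        # Assumed query template, one expression per event type.
        expression = f"{event_type} and (locatedIn value Outer_Rim)"
        start = time.perf_counter()
        results = run_query(reasoner, expression)
        elapsed_ms = (time.perf_counter() - start) * 1000
        rows.append((event_type, len(results), elapsed_ms))
    return rows


def fake_query(reasoner, expression):
    """Stub used only to show the harness running end to end."""
    return ["Example_Event"] if expression.startswith("Fight") else []


for name, count, ms in time_queries(["Fight", "Nothing"], fake_query):
    print(f"{name} = {count} in {ms:.0f}ms")
```

Creating a fresh reasoner per query costs a reclassification each time, so it trades total run time for cleaner per-query numbers.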

nickdrummond commented 1 year ago

So this does not reset the reasoner on each query so the timings may be worse as memory gets tight/other computations already made etc

Pellet has already precomputed the "standard" protege inferences.

Loaded in 2715ms
Classified in 8719ms

Briefing = 4 in 912ms
War = 0 in 91991ms
Distraction = 0 in 2ms
Burial = 0 in 2ms
Birth = 1 in 144ms
Meeting = 28 in 3637ms
Race = 1 in 101505ms
Argument = 2 in 222ms
Hiding = 1 in 94062ms
TransferOfOwnership = 16 in 2413ms
Healing = 3 in 93028ms
Observation = 0 in 92803ms
Conversation = 14 in 1818ms
Transfer = 21 in 90665ms
Ejection = 1 in 157ms
Cubikahd = 0 in 104466ms
Battle = 33 in 120660ms
Sabacc = 2 in 289ms
Heist = 9 in 1532ms
Crash = 6 in 123432ms
Saving = 0 in 94222ms
Defence = 2 in 213ms
Trial = 4 in 509ms
Attack = 32 in 90195ms
Wedding = 0 in 91953ms
Arrival = 6 in 664ms
Fight = 89 in 90701ms
Evacuation = 3 in 90584ms
Trap = 8 in 1084ms
Interrogation = 1 in 90409ms
Mission = 48 in 90222ms
Nothing = 0 in 1ms
Repair = 1 in 143ms
Learning = 1 in 90793ms
Execution = 5 in 555ms
Escape = 19 in 2409ms
Killing = 18 in 2378ms
Deception = 0 in 7ms
Training = 8 in 89850ms
Rescue = 19 in 2415ms
Speech = 0 in 97888ms
Recruitment = 3 in 328ms
Journey = 2 in 93144ms
Capture = 13 in 1584ms
Holo-darts = 0 in 9ms
Sabotage = 0 in 90059ms
Departure = 1 in 90933ms
Duel = 16 in 2232ms
Dejarik = 0 in 90262ms
Stealing = 10 in 90133ms
Death = 19 in 90164ms
Trading = 3 in 293ms
Chase = 6 in 91678ms
Murder = 13 in 90168ms
Confrontation = 21 in 90479ms
Surrender = 0 in 98733ms
Game = 2 in 106105ms
Defection = 0 in 93909ms
Removing = 0 in 95926ms
Job = 7 in 896ms
Torture = 3 in 97406ms
Communication = 50 in 6035ms
Release = 38 in 92081ms
Doublecross = 2 in 90125ms
Search = 14 in 105563ms

It's clear the times are not related to the number of results. In many cases, the opposite.
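A quick way to see this is to parse the timings and split them at a threshold. The sketch below uses a sample of the lines above; the 60000 ms cut-off is an arbitrary assumption chosen only to separate the two groups:

```python
import re

# A sample of the per-type timings listed above.
LOG = """Briefing = 4 in 912ms
War = 0 in 91991ms
Meeting = 28 in 3637ms
Fight = 89 in 90701ms
Communication = 50 in 6035ms
Interrogation = 1 in 90409ms
Nothing = 0 in 1ms
Race = 1 in 101505ms"""

LINE = re.compile(r"^(\S+) = (\d+) in (\d+)ms$")


def parse(log):
    """Return (type, results, ms) tuples for every line that matches."""
    rows = []
    for line in log.splitlines():
        match = LINE.match(line.strip())
        if match:
            name, count, ms = match.groups()
            rows.append((name, int(count), int(ms)))
    return rows


def split_by_time(rows, threshold_ms=60000):
    """Partition rows into (fast, slow) around an arbitrary threshold."""
    fast = [row for row in rows if row[2] < threshold_ms]
    slow = [row for row in rows if row[2] >= threshold_ms]
    return fast, slow


fast, slow = split_by_time(parse(LOG))
print("fast:", [(name, count) for name, count, _ in fast])
print("slow:", [(name, count) for name, count, _ in slow])
```

In this sample the slow group includes types with 0 or 1 results, while the fast group includes types with 28 and 50 results.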

Next steps?

nickdrummond commented 1 year ago

Disjoints between events do not improve times

nickdrummond commented 1 year ago

Race is the worst performance. But deleting all Race instances does not make any difference to the query "Race and..."

nickdrummond commented 1 year ago

Tried "grounding" all vehicles so they are attached to the "location tree" by asserting they are all in the Galaxy. Doesn't improve anything. Actually slows down classification and query time.

nickdrummond commented 1 year ago

Even Event and (locatedIn value Tatooine) takes 15s

nickdrummond commented 1 year ago

Going back to transitive parent of locatedIn.

Removed livedIn -> locatedIn just as a precaution

If we make locatedIn functional too... Some inconsistencies (classifying star-wars.owl):
Anaxes (Planet and not(Planet))
Tipoca_City
Archeon Nebula
others...

Why?

Removed the various disjoints on Places and we get some weird type inferences:

Level_1313 is a District and an Underground_Portal

Coruscant = Core_Worlds = Coruscant_Underworld = Galaxy = The_Works (because Trace_Martez Workshop in Level_1313 and some UnderworldPortal in The_Works)

Coruscant_Underworld -> City AND District

Coruscant_Underworld locatedIn Self

Coruscant is a Planet and a City (inferred City because it has Districts which are defined as being locatedIn Cities)

Outer_Rim = Mid_Rim (because Hutt Space is in both)

Anaxes = PM-1203 (because Fort Anaxes is on both)
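These equalities are what a functional property forces: if one subject has two values, the values must be the same individual. A minimal sketch of that merging step alone, using union-find; it ignores transitivity and everything else the reasoner does, and the underscore spellings of Hutt_Space and Fort_Anaxes are assumed:

```python
from collections import defaultdict

# locatedIn assertions taken from the two cases above (spellings assumed).
ASSERTIONS = [
    ("Hutt_Space", "Outer_Rim"),
    ("Hutt_Space", "Mid_Rim"),
    ("Fort_Anaxes", "Anaxes"),
    ("Fort_Anaxes", "PM-1203"),
]


def merged_by_functional(assertions):
    """Group objects that a functional property forces to be equal."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    first_object = {}
    for subject, obj in assertions:
        find(obj)  # register the object
        if subject in first_object:
            # Second value for the same subject: merge it with the first.
            parent[find(obj)] = find(first_object[subject])
        else:
            first_object[subject] = obj

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    return sorted(sorted(group) for group in groups.values() if len(group) > 1)


print(merged_by_functional(ASSERTIONS))
```

If any merged pair is also declared different, or the two belong to disjoint classes, the merge turns into an inconsistency rather than an odd equality.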

Endurance is unsatisfiable (Search_for_Kilian has location Endurance and Vanqor)
Ghost (Training_Ezra_as_a_Jedi)
Mustafar (Rescue_of_Kanan - locatedIn Sovereign)
Razor_Crest (Transporting Eggs)
Steadfast (Destroying_Steadfast_nav_tower)
Galaxy ??Planet Anaxes (Attack_on_the_assembly_plant, Mission_to_sabotage_Trenchs_strategy)

Fixed all the other dual located events: Event and (locatedIn min 2 Place)
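Outside the reasoner, the same check can be approximated by counting asserted locations per event. A minimal sketch, with the caveat that this counts distinct names only: the DL query needs the places to be provably different individuals, so the two will not always agree. Example_Event and its location are hypothetical:

```python
from collections import defaultdict

# (event, place) locatedIn assertions.
ASSERTIONS = [
    ("Search_for_Kilian", "Endurance"),
    ("Search_for_Kilian", "Vanqor"),
    ("Example_Event", "Tatooine"),  # hypothetical single-location event
]


def multiply_located(assertions, minimum=2):
    """Return events with at least `minimum` distinct asserted locations."""
    locations = defaultdict(set)
    for event, place in assertions:
        locations[event].add(place)
    return {
        event: sorted(places)
        for event, places in locations.items()
        if len(places) >= minimum
    }


print(multiply_located(ASSERTIONS))
```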

Maybe have property near for refactor

None of this has made any difference.

More ideas?

allDifferent for all locations

allDifferent for all events

nickdrummond commented 1 year ago

Asserting different individuals makes things worse. Removing the existing allDifferent for Living_Things actually speeds things up. On the "A-box transform", Events in the Outer Rim take 55s.

nickdrummond commented 1 year ago

Taking out the disjoints makes it even faster (classification and query)

Pre-computing inferences:

------------------------------ Executing DL Query ------------------------------
Computed results for Instances in 36611 ms

Perhaps we need a "closure.owl" with disjoints and differents in? Good for building and some queries but not all.

nickdrummond commented 1 year ago

A-box transform with all disjoint/different removed:

instances of "Event and (locatedIn value Outer_Rim)": 426 results in 36611ms

And

instances of "Event and (locatedIn some (hasTerrain some Mountains))": 96 results in 43951ms

And

instances of "locationOf some (included some (participant some {Ezra_Bridger, Spectres}))": 84 results in 13971ms
instances of "included some (participant some {Ezra_Bridger, Spectres})": 216 results in 53586ms
instances of "Planet and locationOf some (included some (participant value Ahsoka_Tano))": 23 results in 10357ms

nickdrummond commented 1 year ago

Let's push a bit further. Let's take events and remove some popular properties.
With all the above = 6.4s classify
Events in Outer_Rim = 40.1s (423 results)

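| Property | usage | classify | query | results |
|---|---|---|---|---|
| participant | 5286 | 2.6s | 28.6s | 423 |
| of | 3296 | 3.7s | 32.6s | 422 |
| hadRole | 1644 | 3.3s | 35.3s | 423 |
| after | 1190 | 4.4s | 39s | 423 |
| homeworldOf | 136 | 4.8s | 40.1s | 423 |
| | | s | s | 423 |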

not homeworld, subs of locatedIn, locationOf, during,

But, if we get rid of a load of them we still get 421 results in 0.9s and 17s

It's just another overall scale thing - every set of axioms we take away by removing a property helps a bit - there's no magic axiom or construction causing us problems.

Do we need to rethink the modules?

nickdrummond commented 1 year ago

Taking out all property axioms (apart from transitive) including domain and range doesn't make any significant difference. Getting rid of all equiv classes - no difference. Classification has reduced to 4s though

Removing Event -> includes Self and Event -> locatedIn some Place

Reduces query time to about 30s

nickdrummond commented 1 year ago

Just looking for all unnamed locations (ie those specified as types) eg Abduction_of_Alora -> locatedIn some Shuttle

nickdrummond commented 1 year ago

Transformed almost all Event -> locatedIn some X into named individuals. With ABox and disjoints/different removed this classifies in 6s and queries in 30s.
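A minimal sketch of that transformation, assuming each `locatedIn some X` restriction is replaced by a fresh named individual of type X. The naming scheme is hypothetical, not the one used in the ontology:

```python
def name_anonymous_locations(restrictions):
    """Turn (event, place_class) existentials into named-individual assertions.

    Returns two lists: type assertions (individual, class) and
    locatedIn assertions (event, individual).
    """
    types = []
    located_in = []
    for event, place_class in restrictions:
        # Hypothetical naming scheme: one fresh individual per restriction.
        individual = f"{place_class}_of_{event}"
        types.append((individual, place_class))
        located_in.append((event, individual))
    return types, located_in


types, located_in = name_anonymous_locations([("Abduction_of_Alora", "Shuttle")])
print(types)
print(located_in)
```

The named form still entails the original restriction, but it is a stronger statement because it commits to one specific individual.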

nickdrummond commented 1 year ago

Get rid of all qualified cardinality restrictions in events as this halves the query time immediately.

nickdrummond commented 1 year ago

Also corrected all {x, y, z} and (disguisedAs...)

With no other messing:
Ontologies processed in 7020 ms by Pellet
Computed results for Instances in 90093 ms

With Abox transform:
Ontologies processed in 7787 ms by Pellet
Computed results for Instances in 68542 ms

With disj+diff removed also:
Ontologies processed in 2406 ms by Pellet
Computed results for Instances in 30914 ms

nickdrummond commented 1 year ago

Reified several more properties into Events (creationOf, stunningOf etc)

Computed results for Instances in 79092 ms

nickdrummond commented 1 year ago

Removed all oneOf for disguisedAs as an experiment:
Computed results for Instances in 74561 ms

And removed all other nominals (roughly):

Ontologies processed in 8827 ms by Pellet
Computed results for Instances in 70928 ms

Hmm