Open nickdrummond opened 1 year ago
Probably as good as we're going to get in the 500M total memory
Updated to the paid plan and it's a lot faster - more memory and more cores
Slow queries anyway, regardless of Railway:
instances of "Event and (locatedIn value Outer_Rim)": 309 results in 113953ms
And
instances of "Event and (locatedIn some (hasTerrain some Mountains))": 86 results in 133233ms
And
instances of "locationOf some (included some (participant some {Ezra_Bridger, Spectres}))": 84 results in 61896ms
instances of "included some (participant some {Ezra_Bridger, Spectres})": 131 results in 121621ms
instances of "Planet and locationOf some (included some (participant value Ahsoka_Tano))": 21 results in 69080ms
instances of "Event and (locatedIn value Outer_Rim)"
module | results | time (ms) | t/r (ms) |
---|---|---|---|
events | 400 | 123805 | 309 |
imperial-era | 214 | 42355 | 198 |
rebels | 139 | 20745 | 149 |
republic-era | 102 | 13777 | 135 |
new-republic-era | 88 | 9805 | 111 |
clonewars | 88 | 11064 | 126 |
book of boba | 51 | 4422 | 87 |
kenobi | 27 | 2404 | 89 |
trilogy | 25 | 2211 | 88 |
mandalorian | 19 | 1716 | 90 |
bad-batch | 17 | 1778 | 104 |
prequels | 15 | 1390 | 93 |
sequels | 11 | 1066 | 97 |
resistance | 10 | 897 | 90 |
rogue_one | 6 | 441 | 74 |
solo | 5 | 501 | 100 |
No particular outlier among the modules - it seems to be a non-linear slowdown related to the number of results.
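A quick way to check the "no outlier" reading is to recompute the time per result from the table above and order it by result count. A minimal Python sketch, using a handful of rows copied from the table (the function and variable names are mine, not from the project):

```python
# (module, results, time in ms) copied from the table above
rows = [
    ("events", 400, 123805),
    ("imperial-era", 214, 42355),
    ("rebels", 139, 20745),
    ("republic-era", 102, 13777),
    ("kenobi", 27, 2404),
    ("solo", 5, 501),
]

def ms_per_result(rows):
    """Return (module, results, ms per result), largest result set first."""
    out = [(name, n, t / n) for name, n, t in rows]
    return sorted(out, key=lambda r: r[1], reverse=True)

for name, n, per in ms_per_result(rows):
    print(f"{name:15} {n:4d} results  {per:6.1f} ms/result")
```

If the cost per result were constant, the last column would be flat; here it climbs with the size of the result set.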
Just out of interest, is there a similar speed when you run the "A-box transform"?
The A-box version classifies faster, but the query still takes 75426 ms (423 results). Not bad, but still not great.
Querying for mid-rim or core worlds events is a lot quicker (but far fewer results too). Just querying for the 810 events is instant.
Event and (locatedIn some Planet) = 539 results / 274605 ms
Overall, we're using a LOT of memory:
250M when loaded
1.2G when events classified
1.8G when events query answered
When did this start becoming a problem? Is it a gradual increase?
Event and (locatedIn value Outer_Rim)
date | rev | results | time (ms) | t/r (ms) |
---|---|---|---|---|
18/03/23 | b788c3e4 | 400 | 119334 | 298 |
19/06/22 | 3d4c0cd4 | 293 | 60875 | 208 |
23/02/22 | 2ac29d2f | 242 | 36377 | 150 |
04/01/22 | 3243b9c0 | 213 | 21414 | 101 |
26/11/21 | 0a36f372 | 172 | 10876 | 63 |
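To put a rough number on how fast query time has grown relative to the result count across these revisions, one option is the log-log slope between the oldest and newest rows. A slope of 1 would mean linear growth. This is only a two-point estimate and assumes time follows a power law in the number of results, which may not actually hold.

```python
import math

# (date, results, time in ms) from the table above, newest first
history = [
    ("18/03/23", 400, 119334),
    ("19/06/22", 293, 60875),
    ("23/02/22", 242, 36377),
    ("04/01/22", 213, 21414),
    ("26/11/21", 172, 10876),
]

def growth_exponent(newer, older):
    """Slope of log(time) against log(results) between two revisions."""
    _, r1, t1 = newer
    _, r0, t0 = older
    return math.log(t1 / t0) / math.log(r1 / r0)

k = growth_exponent(history[0], history[-1])
print(f"time ~ results^{k:.2f}")
```

The result count and the overall size of the ontology grew together over this period, so the slope does not separate the two effects.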
Taking the transitivity off locatedIn gives instant results (but only 2 - obviously)
Taking the range off locatedIn - well, I gave up waiting.
Can we separate off the transitive part which is only useful for the location hierarchy?
Give the event a different property:
Event and (in some (somewhereIn value Outer_Rim))
locatedIn value Outer_Rim = 729 results in 129719 ms
So is it the size of the locatedIn hierarchy?
Separating event location off has a small effect: 102702 ms
But only 53 results so obviously a mistake
A rough search and replace of "locatedIn" with "in" for event locations.
Event and (in some (locatedIn value Outer_Rim))
379 results in 267887 ms - miles worse
Removing range of locatedIn makes no difference. Adding a domain (Place or Event) - taking out livedIn subprop and removing all objects and roles locatedIn to get it to classify. No good
Could it be an impact of the expressivity of the ontology? There are a few cardinality restrictions and nominals around, which we know Pellet is not a fan of (although I don't think the number has significantly increased over time).
Tried extracting the EL++ profile of the "A-box transformed" ontologies. This actually takes longer to classify, and the same time to query.
Checked HermiT as it's been a while, but classification is extremely slow and I got bored waiting for query results
SPARQL queries can be used, but the results are incomplete:
357 results
Moving transitivity to a parent property of locatedIn and querying against this instead makes no significant difference.
Can we narrow down by Event type? Or is this also just related to the number of answers?
on "A-box"
Fight is 44s for 89 results
Mission is 32s for 48 results
Release is 26s for 38 results
Communication is 10s for 50 results
Transfer is 3.4s for 21 results
Death is 16s for 19 results
Hmm, spot check is interesting but maybe code this for all types
This does not reset the reasoner on each query, so the timings may be worse as memory gets tight, other computations have already been made, etc.
Pellet has already precomputed the "standard" protege inferences.
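The loop over all event types could look something like the sketch below. It is in Python for brevity and is not the code actually used here: `run_query` is a hypothetical callable standing in for whatever sends a class expression to the reasoner and returns the matching individuals, and the expression template is my assumption about the query shape. The same reasoner is reused for every query, so later timings include whatever state earlier queries left behind.

```python
import time

def time_queries(run_query, event_types, location="Outer_Rim"):
    """Time one DL query per event type, reusing the same reasoner.

    run_query is a hypothetical function: expression -> collection of results.
    """
    timings = []
    for event_type in event_types:
        # Assumed query shape; adjust to the expression actually being tested
        expression = f"{event_type} and (locatedIn value {location})"
        start = time.perf_counter()
        results = run_query(expression)
        elapsed_ms = round((time.perf_counter() - start) * 1000)
        timings.append((event_type, len(results), elapsed_ms))
    return timings

def fake_run_query(expression):
    """Stand-in so the sketch runs without a reasoner; always returns nothing."""
    return []

for name, count, ms in time_queries(fake_run_query, ["Fight", "Mission"]):
    print(f"{name} = {count} in {ms}ms")
```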
Loaded in 2715ms
Classified in 8719ms
Briefing = 4 in 912ms
War = 0 in 91991ms
Distraction = 0 in 2ms
Burial = 0 in 2ms
Birth = 1 in 144ms
Meeting = 28 in 3637ms
Race = 1 in 101505ms
Argument = 2 in 222ms
Hiding = 1 in 94062ms
TransferOfOwnership = 16 in 2413ms
Healing = 3 in 93028ms
Observation = 0 in 92803ms
Conversation = 14 in 1818ms
Transfer = 21 in 90665ms
Ejection = 1 in 157ms
Cubikahd = 0 in 104466ms
Battle = 33 in 120660ms
Sabacc = 2 in 289ms
Heist = 9 in 1532ms
Crash = 6 in 123432ms
Saving = 0 in 94222ms
Defence = 2 in 213ms
Trial = 4 in 509ms
Attack = 32 in 90195ms
Wedding = 0 in 91953ms
Arrival = 6 in 664ms
Fight = 89 in 90701ms
Evacuation = 3 in 90584ms
Trap = 8 in 1084ms
Interrogation = 1 in 90409ms
Mission = 48 in 90222ms
Nothing = 0 in 1ms
Repair = 1 in 143ms
Learning = 1 in 90793ms
Execution = 5 in 555ms
Escape = 19 in 2409ms
Killing = 18 in 2378ms
Deception = 0 in 7ms
Training = 8 in 89850ms
Rescue = 19 in 2415ms
Speech = 0 in 97888ms
Recruitment = 3 in 328ms
Journey = 2 in 93144ms
Capture = 13 in 1584ms
Holo-darts = 0 in 9ms
Sabotage = 0 in 90059ms
Departure = 1 in 90933ms
Duel = 16 in 2232ms
Dejarik = 0 in 90262ms
Stealing = 10 in 90133ms
Death = 19 in 90164ms
Trading = 3 in 293ms
Chase = 6 in 91678ms
Murder = 13 in 90168ms
Confrontation = 21 in 90479ms
Surrender = 0 in 98733ms
Game = 2 in 106105ms
Defection = 0 in 93909ms
Removing = 0 in 95926ms
Job = 7 in 896ms
Torture = 3 in 97406ms
Communication = 50 in 6035ms
Release = 38 in 92081ms
Doublecross = 2 in 90125ms
Search = 14 in 105563ms
It's clear the times are not related to the number of results. In many cases, the opposite.
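One way to see this in the output above is to parse the `Name = N in Tms` entries and bucket them by time rather than by result count. A minimal sketch with a few entries copied from the log; the 10-second threshold is an arbitrary choice of mine:

```python
import re

log = (
    "Briefing = 4 in 912ms War = 0 in 91991ms Meeting = 28 in 3637ms "
    "Race = 1 in 101505ms Communication = 50 in 6035ms "
    "Fight = 89 in 90701ms Sabotage = 0 in 90059ms"
)

# Event type names may contain hyphens (e.g. Holo-darts)
ENTRY = re.compile(r"([\w-]+) = (\d+) in (\d+)ms")

def split_by_time(text, threshold_ms=10_000):
    """Return (fast, slow) lists of (name, results, ms)."""
    fast, slow = [], []
    for name, results, ms in ENTRY.findall(text):
        entry = (name, int(results), int(ms))
        (slow if entry[2] >= threshold_ms else fast).append(entry)
    return fast, slow

fast, slow = split_by_time(log)
print("fast:", fast)
print("slow:", slow)
```

In this sample the slow bucket contains both War (0 results) and Fight (89 results) at roughly 90s each, while Communication (50 results) lands in the fast bucket.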
Next steps?
Disjoints between events do not improve times
Race is the worst performance. But deleting all Race instances does not make any difference to the query "Race and..."
Tried "grounding" all vehicles so they are attached to the "location tree" by asserting they are all in the Galaxy. Doesn't improve anything. Actually slows down classification and query time.
Even Event and (locatedIn value Tatooine) takes 15s
Going back to transitive parent of locatedIn.
Removed livedIn -> locatedIn just as a precaution
If we make locatedIn functional too... Some inconsistencies (classifying star-wars.owl):
Anaxes (Planet and not(Planet))
Tipoca_City
Archeon Nebula
others...
Why?
Removed the various disjoints on Places and we get some weird type inferences:
Level_1313 is a District and an Underground_Portal
Coruscant = Core_Worlds = Coruscant_Underworld = Galaxy = The_Works (because Trace_Martez Workshop in Level_1313 and some UnderworldPortal in The_Works)
Coruscant_Underworld -> City AND District
Coruscant_Underworld locatedIn Self
Coruscant is a Planet and a City (inferred City because it has Districts which are defined as being locatedIn Cities)
Outer_Rim = Mid_Rim (because Hutt Space is in both)
Anaxes = PM-1203 (because Fort Anaxes is on both)
Endurance is unsatisfiable (Search_for_Kilian has location Endurance and Vanqor)
Ghost (Training_Ezra_as_a_Jedi)
Mustafar (Rescue_of_Kanan - locatedIn Sovereign)
Razor_Crest (Transporting Eggs)
Steadfast (Destroying_Steadfast_nav_tower)
Galaxy ??Planet Anaxes (Attack_on_the_assembly_plant, Mission_to_sabotage_Trenchs_strategy)
Fixed all the other dual located events: Event and (locatedIn min 2 Place)
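A similar check can be approximated outside the reasoner by counting asserted locatedIn values per event. This is a sketch under the assumption that the assertions are available as (event, place) pairs; it only sees asserted values, not inferred ones, so it is a cheaper and weaker check than the DL query. The pairs below come from the cases listed above.

```python
from collections import defaultdict

# (event, place) pairs standing in for asserted locatedIn values
located_in = [
    ("Search_for_Kilian", "Endurance"),
    ("Search_for_Kilian", "Vanqor"),
    ("Rescue_of_Kanan", "Sovereign"),
]

def multiply_located(pairs):
    """Return {event: sorted places} for events asserted in 2+ distinct places."""
    places = defaultdict(set)
    for event, place in pairs:
        places[event].add(place)
    return {e: sorted(p) for e, p in places.items() if len(p) >= 2}

print(multiply_located(located_in))
```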
Maybe have a property near for the refactor
None of this has made any difference.
More ideas?
allDifferent for all locations
allDifferent for all events
Declaring individuals different makes things worse. Removing the existing allDifferent for Living_Things actually speeds things up. On the "A-box transform", Events in the Outer Rim take 55s.
Taking out the disjoints makes it even faster (classification and query)
Pre-computing inferences:
------------------------------ Executing DL Query ------------------------------
Computed results for Instances in 36611 ms
Perhaps we need a "closure.owl" with disjoints and differents in? Good for building and some queries but not all.
A-box transform with all disjoint/different removed:
instances of "Event and (locatedIn value Outer_Rim)": 426 results in 36611ms
And
instances of "Event and (locatedIn some (hasTerrain some Mountains))": 96 results in 43951ms
And
instances of "locationOf some (included some (participant some {Ezra_Bridger, Spectres}))": 84 results in 13971ms
instances of "included some (participant some {Ezra_Bridger, Spectres})": 216 results in 53586ms
instances of "Planet and locationOf some (included some (participant value Ahsoka_Tano))": 23 results in 10357ms
Let's push a bit further. Let's take events and remove some popular properties.
With all the above = 6.4s classify
Events in Outer_Rim = 40.1s (423 results)
Property | usage | classify | query | results |
---|---|---|---|---|
participant | 5286 | 2.6s | 28.6s | 423 |
of | 3296 | 3.7s | 32.6s | 422 |
hadRole | 1644 | 3.3s | 35.3s | 423 |
after | 1190 | 4.4s | 39s | 423 |
homeworldOf | 136 | 4.8s | 40.1s | 423 |
s | s | 423 |
not homeworld, subs of locatedIn, locationOf, during,
But if we get rid of a load of them, we still get 421 results, in 0.9s (classify) and 17s (query)
It's just another overall scale thing - every set of axioms we take away by removing a property helps a bit - there's no magic axiom or construction causing us problems.
Do we need to rethink the modules?
Taking out all property axioms (apart from transitive) including domain and range doesn't make any significant difference. Getting rid of all equiv classes - no difference. Classification has reduced to 4s though
Removing Event -> includes Self and Event -> locatedIn some Place reduces query time to about 30s
Just looking for all unnamed locations (ie those specified as types) eg Abduction_of_Alora -> locatedIn some Shuttle
Transformed almost all Event -> locatedIn some X into named individuals.
With ABox and disjoints/different removed this classifies in 6s and queries in 30s.
Get rid of all qualified cardinality restrictions in events as this halves the query time immediately.
Also corrected all {x, y, z} and (disguisedAs...)
With no other messing:
Ontologies processed in 7020 ms by Pellet
Computed results for Instances in 90093 ms
With Abox transform:
Ontologies processed in 7787 ms by Pellet
Computed results for Instances in 68542 ms
With disj+diff removed also:
Ontologies processed in 2406 ms by Pellet
Computed results for Instances in 30914 ms
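For comparison, the three query timings above can be expressed relative to the unmodified run. A small sketch; the labels are my shorthand for the three configurations:

```python
# Query times in ms from the three runs above
runs = {
    "baseline": 90093,
    "A-box transform": 68542,
    "A-box transform, disjoints/differents removed": 30914,
}

def speedups(runs, baseline="baseline"):
    """Return how many times faster each run is than the baseline."""
    base = runs[baseline]
    return {name: base / ms for name, ms in runs.items()}

for name, factor in speedups(runs).items():
    print(f"{name}: {factor:.2f}x")
```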
Reified several more properties into Events (creationOf, stunningOf etc)
Computed results for Instances in 79092 ms
Removed all oneOf for disguisedAs as an experiment:
Computed results for Instances in 74561 ms
And removed all other nominals (roughly):
Ontologies processed in 8827 ms by Pellet
Computed results for Instances in 70928 ms
Hmm
Performing this is slow/crashes
https://star-wars-ontology.up.railway.app/dlquery/?expression=Living_thing+and+%28memberOf+value+Jedi_Order%29&minus=hadRole+some+Jedi&syntax=man&query=instances
Same memory on the instance as Heroku = 500M. Is this a VM memory size optimisation problem?
-Xmx500M is too much for the instance and crashes the app (which then restarts)
-Xmx300M prevents the heap exception, but is slower than Heroku