sdn-sense / siterm


Model serialize/parse speed up (migrate to json-ld?) #218

Open juztas opened 1 year ago

juztas commented 1 year ago

Currently SiteRM<->Orchestrator depends on the turtle format for model parsing and serialization. I just noticed that once the model becomes big (e.g. enabling all VLANs on a switch makes the model ~2 MB in size), the load time is way too long:

Load runtime (initial turtle parse): 2.550 s

| Format   | Serialize (s) | Parse (s) | base64 encode (s) | base64 decode (s) |
|----------|---------------|-----------|-------------------|-------------------|
| turtle   | 68.148        | 2.653     | 0.0075            | 0.0120            |
| xml      | 1.092         | 4.955     | 0.0115            | 0.0204            |
| json-ld  | 1.843         | 3.298     | 0.0122            | 0.0217            |
| ntriples | 0.325         | 2.275     | 0.0151            | 0.0285            |
| n3       | 71.707        | 2.732     | 0.0065            | 0.0128            |
| trig     | 70.987        | 2.794     | 0.0063            | 0.0127            |

Serialization dominates: turtle, n3 and trig each take ~70 s, while xml, json-ld and ntriples finish in under 2 s. Parse times stay within 2-5 s for every format, and the base64 encode/decode cost is negligible.

Test Script:

import time
from DTNRMLibs.MainUtilities import encodebase64, decodebase64
from rdflib import Graph

# Load the reference model from turtle once and time it.
modelIn = 'model-input.ttl'
ts1 = time.time()
currentGraph = Graph()
currentGraph.parse(modelIn, format='turtle')
ts2 = time.time()

print('Load Runtime: %s' % float(ts2 - ts1))

for key in ['turtle', 'xml', 'json-ld', 'ntriples', 'n3', 'trig']:
    # Time serialization once and reuse the output for the file dump
    # and the base64 round-trip.
    ts1 = time.time()
    vals = currentGraph.serialize(format=key)
    ts2 = time.time()
    with open('test-%s' % key, 'w', encoding='utf-8') as fd:
        fd.write(vals)
    # Time parsing the dump back into a fresh graph.
    ngraph = Graph()
    ts3 = time.time()
    ngraph.parse('test-%s' % key, format=key)
    ts4 = time.time()
    # Time the base64 encode/decode used when shipping the model over the API.
    ts5 = time.time()
    tmpEnc = encodebase64(vals)
    ts6 = time.time()
    decodebase64(tmpEnc)
    ts7 = time.time()
    print('Serialize %s: %s Parse from %s: %s' % (key, float(ts2 - ts1), key, float(ts4 - ts3)))
    print('%s Encoding: %s Decoding: %s' % (key, float(ts6 - ts5), float(ts7 - ts6)))

Should we move all internals to json-ld (which is also easy to represent in a GUI)? Also, the API for fetching the model allows specifying a model type, but currently only turtle is supported; we should allow 'turtle', 'xml', and 'json-ld'.
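As a minimal sketch of what the format-selectable API could look like, assuming we keep rdflib underneath (the getModel function and the allow-list are hypothetical, not SiteRM's current API; rdflib ships serializer plugins for all three formats, with json-ld built in since rdflib 6.0):

from rdflib import Graph

# Hypothetical allow-list; the current API returns turtle only.
SUPPORTED_FORMATS = {'turtle', 'xml', 'json-ld'}

def getModel(currentGraph, rdfFormat='turtle'):
    """Serialize the in-memory model in the format requested by the client."""
    if rdfFormat not in SUPPORTED_FORMATS:
        raise ValueError('Unsupported model format: %s' % rdfFormat)
    return currentGraph.serialize(format=rdfFormat)

g = Graph()
g.parse('model-input.ttl', format='turtle')
print(len(getModel(g, 'json-ld')))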

juztas commented 1 year ago

Why did we notice this? The Arista switch in LA had this configuration:

interface Port-Channel501
   description Port Channel to Caltech
   mtu 9214
   switchport trunk allowed vlan 1-3872,3874-4094
   switchport mode trunk
!

This adds all VLANs and their labels from 1 to 3872 into the model, which is why the model gets so big. So for this, there are two fixes:
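Just to illustrate the scale (this is not one of the fixes): a toy rdflib sketch with an invented namespace and triple shape, standing in for SiteRM's actual schema, shows how a single trunk port's allowed-VLAN range already produces thousands of triples:

from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDFS

SITE = Namespace('urn:example:site:')  # invented for the example

g = Graph()
for vlan in range(1, 3873):  # 'switchport trunk allowed vlan 1-3872'
    node = SITE['port-channel501:vlan+%d' % vlan]
    g.add((node, RDFS.label, Literal('vlan-%d' % vlan)))
    g.add((node, SITE.vlanid, Literal(vlan)))

print(len(g))  # 7744 triples from one port, before any other model content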

juztas commented 8 months ago

This only happens with a very, very big model. Since the current version works, I would leave this as a future feature and implement it via a configuration parameter if we see one site or another being slow in model generation.
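If it does come back, a minimal sketch of such a configuration-driven limit (the vlan_model_range parameter name is hypothetical, just to record the idea):

# Hypothetical site/switch configuration knob limiting which VLANs get
# expanded into the model; everything outside the range is skipped.
def filterModelVlans(vlans, conf):
    low, high = conf.get('vlan_model_range', (1, 4094))  # default: keep all
    return [vlan for vlan in vlans if low <= vlan <= high]

print(len(filterModelVlans(range(1, 4095), {'vlan_model_range': (3600, 3872)})))  # 273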

Closing for now.