Open chicco785 opened 3 years ago
I would DEFINITELY NOT store the context in the database - the context is not an attribute of the entity. It's more like a map that you must use to expand the aliases (value of entity type + attribute names). These expansions are the real names of the attributes (and real value of the entity type). And that (the longnames) is what you need to store in the database.
If this isn't 100% clear, let's have a chat and I'll explain this in detail.
Additionally, it seems you are using an outdated NGSI-LD spec. We changed the format for attributes with datasetId, quite some time ago. We now use an array, instead of various fields with '#x', e.g.:
"P1": [
{
"type": "Property",
"value": 1
},
{
"type": "Property",
"value": 2,
"datasetId": "urn:x1"
},
{
"type": "Property",
"value": 3,
"datasetId": "urn:x2"
}
]
[ Don't want to see you wasting time on implementing obsolete stuff ... :) ]
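A minimal sketch of how a consumer might read the array form shown above, assuming the attribute has already been parsed into Python objects. The helper name `select_instance` is hypothetical and not part of any NGSI-LD library; it only illustrates that the instance without a `datasetId` acts as the default one.

```python
from typing import Any, Optional


def select_instance(attribute: Any, dataset_id: Optional[str] = None) -> Optional[dict]:
    """Return the attribute instance with the given datasetId.

    With dataset_id=None, the instance that carries no datasetId is returned.
    A single (non-array) attribute is treated as a one-element array.
    """
    instances = attribute if isinstance(attribute, list) else [attribute]
    for instance in instances:
        if instance.get("datasetId") == dataset_id:
            return instance
    return None


# The "P1" attribute from the example above.
p1 = [
    {"type": "Property", "value": 1},
    {"type": "Property", "value": 2, "datasetId": "urn:x1"},
    {"type": "Property", "value": 3, "datasetId": "urn:x2"},
]

default_instance = select_instance(p1)
x2_instance = select_instance(p1, "urn:x2")
```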
@kzangeli thanks for your feedback, always appreciated :-)
So I've been trying to make friends with NGSI-LD but I think he doesn't like me. No honestly, I think it's going to be a while before I can call myself even a moderately knowledgeable LD chap. But what I gather from the spec is that the way you interpret a piece of JSON like the example above depends on the context it's tied to. So we could think of this "interpretation" process as a function
interpret : Context ⨉ Attribute ---> Meaning
For example, say I've got a context `ctx` that defines a `speed` attribute as a speedometer reading in km/h, so
interpret ( ctx, { speed: 20 } ) = you're doing 20 km per hour
Later on, someone decides to change the units and publishes a new context `ctx'` where `speed` is in mph, so now
interpret ( ctx', { speed: 20 } ) = you're doing 20 miles per hour
If we only stored the `speed` attribute without a context, how would we interpret `speed: 20`? Was the car doing 20 km/h or rather 20 mph = 32.1869 km/h? Also I think the two attributes should actually be considered different even if they sit in the same entity, i.e. we have `ctx.speed` and `ctx'.speed`, at least if I understand RDF mechanics correctly. I'm no RDF expert either, so take my words with a pinch of salt.
So if I understand your suggestion, instead of storing an attribute named `speed` we should rather store two attributes in this case: `expansion_of(ctx.speed)` and `expansion_of(ctx'.speed)`. That would solve the semantics problem I think, well as long as it isn't possible for two attributes defined in different contexts to resolve to the same long name. But I don't think that's the case? e.g. if `ctx = http://foo/ctx/1.0` and `ctx' = http://foo/ctx/2.0` then
expansion_of(ctx.speed) = http://foo/ctx/1.0/speed
expansion_of(ctx'.speed) = http://foo/ctx/2.0/speed
Is it?
Also, given an attribute "long" name, we should always be able to retrieve the context in which it was defined, is that so? e.g.
context_of(http://wada/wada/x) = http://wada/wada/
??
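To make the `expansion_of` idea concrete, here is a toy sketch. It assumes a context is just a flat dictionary from alias to long name, which is a big simplification of real JSON-LD contexts (no prefixes, no `@vocab`, no nested contexts), and the IRIs are the hypothetical ones from the question above. In this simplified model the long name comes from the term definition inside the context, not from the URL the context is served at.

```python
def expand(context: dict, alias: str) -> str:
    """Expand an alias using the context's term definitions.

    Simplification: an alias the context does not define is returned unchanged.
    """
    return context.get(alias, alias)


# Two versions of a context that map the same alias to different long names.
ctx_v1 = {"speed": "http://foo/ctx/1.0/speed"}
ctx_v2 = {"speed": "http://foo/ctx/2.0/speed"}

# A third, unrelated context that maps a different alias to an existing long name.
ctx_other = {"velocity": "http://foo/ctx/1.0/speed"}

long_v1 = expand(ctx_v1, "speed")
long_v2 = expand(ctx_v2, "speed")
long_other = expand(ctx_other, "velocity")
```

In this toy model `long_v1` and `long_v2` differ only because the two dictionaries say so, and `long_other` equals `long_v1` even though it comes from another context, so a long name alone does not identify the context it came from.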
Additionally, it seems you are using an outdated NGSI-LD spec. We changed the format for attributes with datasetId, quite some time ago. We now use an array, instead of various fields with '#x', e.g.:
to be honest i am quite lost, finding the right specs does not seem to be straightforward; this is the document google gives me back when I search for the specs: ETSI GS CIM 009 V1.1.1 (dated Jan 2019)
is this backward compatible with NGSIv2? probably not.
Anyhow, for the way we store data today i.e. flat (and that will not change because otherwise queries will require joins, and performance will be shit) this may be irrelevant. Each attribute at level 0 is translated into a db field of a given type based on the "value"; basically, we will store all the attribute bloat in an array of objects.
the way we store data today i.e. flat
yep spot on. We need to think about this carefully, I don't think the way we store data at the moment is NGSI-LD friendly :-)
it's timeseries friendly and backward compatible to support ngsi-v2, if that's not good enough, we don't care.
cool. also, if we stored "long" names, we'd be making a lot of people unhappy I reckon since it would be a bit of a mission to e.g. write a query in grafana to pull data out of an entity table...
I would DEFINITELY NOT store the context in the database - the context is not an attribute of the entity. It's more like a map that you must use to expand the aliases (value of entity type + attribute names). These expansions are the real names of the attributes (and real value of the entity type). And that (the longnames) is what you need to store in the database.
If this isn't 100% clear, let's have a chat and I'll explain this in detail.
long names are going to be sql query unfriendly, and i am not sure i understand why this is actually needed... we don't plan to do much with the context: neither expanding the names nor any other operation. the point is only to be able to return the context associated with an entity instance, since this is needed. if not in the database, where will you store it?
to make the rationale clear: when not running aggregations, exploding the information complexity used to represent data may not have an impact, but on timeseries, assuming you want to compute aggregates over temporal intervals, it does have quite an impact.
this is the rationale for which we already have some clear limitations today: if attribute x is of type number today, it must be a number again tomorrow, or managing aggregations and so on will be impossible. so far this has been accepted by our users, and I think is reasonable to not change this approach moving from ngsi-v2 to ngsi-ld, especially considering the use cases we have been dealing with so far.
I think is reasonable to not change this approach moving from ngsi-v2 to ngsi-ld
Well, we might have to actually. Queries anyone? I think you mentioned this already earlier, but here's the nasty scenario. I'll build on the `speed` attribute example from my earlier comment. To process `speed` we need to understand what it is, well to some extent at least. Say we've got this series
(ctx, { speed: 20 }, t1), (ctx, { speed: 30 }, t2), (ctx, { speed: 20 }, t3), (ctx', { speed: 20 }, t4)
How would we compute the average speed?! It turns out that's the wrong question since we have two series actually:
(ctx, { speed: 20 }, t1), (ctx, { speed: 30 }, t2), (ctx, { speed: 20 }, t3)
(ctx', { speed: 20 }, t4)
The average speed for `ctx.speed` is (20 + 30 + 20) / 3 = 23.33 km/h whereas `ctx'.speed`'s average is 20 mph. Notice the units! Adding up values from `t1` through `t4` would be like adding apples and oranges, nonsense. Oh dear. Lots to think about I guess...
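The two-series argument can be sketched in a few lines. This is only an illustration: the tuples mirror the (context, value, time) triples above, the context labels are plain strings, and `average_by_context` is a hypothetical helper rather than anything QuantumLeap provides.

```python
from collections import defaultdict
from statistics import mean

# (context, speed, time point) triples from the example above.
series = [
    ("ctx", 20, "t1"),
    ("ctx", 30, "t2"),
    ("ctx", 20, "t3"),
    ("ctx'", 20, "t4"),
]


def average_by_context(points):
    """Average the values separately for each context they were recorded under."""
    groups = defaultdict(list)
    for context, speed, _time in points:
        groups[context].append(speed)
    return {context: mean(values) for context, values in groups.items()}


averages = average_by_context(series)
```

Averaging all four values in one go would instead mix km/h and mph readings, which is exactly the apples-and-oranges problem.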
So, let's set up an audio conference and straighten things out a little. Seems necessary :)
hahahaha, yea, good idea :-)
Before that, just some food for thought. If you could please forget about storing the context in the DB, and instead storing the attribute name (not the alias - the expanded name, which is the real name of the attribute), you will see how suddenly all your problems go away.
Except one:
In Orion-LD/mongo, I replace all dots (.) in an attribute name with an equals sign (=), as the dot is used as a separator in the query language.
E.g. GET /entities?q=A.b==12
Meaning: give me all entities that have an attribute named A (whatever that is expanded to using the current context), that has a sub-attribute called 'b' (expanded ...) with a value of 12.
So, the attribute names cannot contain any dots in the DB.
The fix is straightforward:
Creation/Update?
Query?
Voila. Problem solved!
Footnote: '=' is a forbidden character in an attribute name - I had to pick some forbidden char to use as a replacement for the dot.
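A minimal sketch of the dot substitution described above. This is not Orion-LD's actual code; the function names are made up, and the round trip relies on the stated assumption that '=' never appears in a valid attribute name. The example name is a hypothetical expanded name.

```python
def encode_attr_name(expanded_name: str) -> str:
    """Replace dots so the name can be stored without clashing with the dot separator."""
    return expanded_name.replace(".", "=")


def decode_attr_name(stored_name: str) -> str:
    """Restore the dots; safe only because '=' is forbidden in attribute names."""
    return stored_name.replace("=", ".")


# Hypothetical expanded attribute name.
expanded = "http://example.org/ngsi-ld/latest/parking/name"
stored = encode_attr_name(expanded)
restored = decode_attr_name(stored)
```

The same encoding would have to be applied to the expanded names in a query before looking them up, otherwise stored and queried names would never match.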
Here you can find the specs: https://www.etsi.org/committee/cim The latest NGSI-LD API spec is v1.4.1
Just remembered, I once wrote a short markdown about the context: https://github.com/FIWARE/context.Orion-LD/blob/develop/doc/manuals-ld/the-context.md
If you could please forget about storing the context in the DB, and instead storing the attribute name (not the alias - the expanded name, which is the real name of the attribute),
yep, like I said earlier, all things being equal, this is an excellent suggestion, but...
you will see how suddenly all your problems go away.
I wish! Like @chicco785 pointed out, reconciling our internal storage model w/ the requirements of a full-blown NGSI-LD implementation isn't straightforward and we might have to make some compromises :-)
Here you can find the specs... ... I once wrote a short markdown about the context:
excellent, thanks for the pointers, much appreciated!
I think is reasonable to not change this approach moving from ngsi-v2 to ngsi-ld
Well, we might have to actually. Queries anyone? I think you mentioned this already earlier, but here's the nasty scenario. I'll build on the `speed` attribute example from my earlier comment. To process `speed` we need to understand what it is, well to some extent at least. Say we've got this series
(ctx, { speed: 20 }, t1), (ctx, { speed: 30 }, t2), (ctx, { speed: 20 }, t3), (ctx', { speed: 20 }, t4)
How would we compute the average speed?! It turns out that's the wrong question since we have two series actually:
(ctx, { speed: 20 }, t1), (ctx, { speed: 30 }, t2), (ctx, { speed: 20 }, t3)
(ctx', { speed: 20 }, t4)
The average speed for `ctx.speed` is (20 + 30 + 20) / 3 = 23.33 km/h whereas `ctx'.speed`'s average is 20 mph. Notice the units! Adding up values from `t1` through `t4` would be like adding apples and oranges, nonsense. Oh dear. Lots to think about I guess...
It stays reasonable not to change :)
This is already a limitation today: you can have metadata in NGSI v2 that specifies the unitCode, for example, so entry 1 could be in km/h and entry 2 in mph. Today we expect units to be made uniform, if required, before injection into QL. I don't see why this should change, given the overhead on injection and/or querying. While we can go on for hours thinking about whatever complex corner case, pragmatically, we support what we need concretely. Multi unit? Not needed as of today. Easy backward compatibility with NGSIv2? Needed.
ok, there's a lot I don't know about your implementation ... :) Might be an option to URL-encode attribute names inside the DB? Anyhoo, if you need my help, just call. I'll be happy to help out.
Anyhoo, if you need my help, just call.
awesome, thanks for offering!!
While we can go on for hours thinking about whatever complex corner case, pragmatically, we support what we need concretely
Oh dear, I've just realised I haven't explained properly what I have in mind, sorry I made a bit of a mess. My example wasn't so much about units (perhaps a corner case, but surely a welcome addition to the spec IMHO) but rather semantics. That is, the function
interpret : Context ⨉ Attribute ---> Meaning
I used earlier as a simple conceptual model to analyse the problem. If you agree the interpretation of an attribute depends on the context, it follows that to be able to interpret the attribute meaningfully in a time series, for each time point and attribute you also need to know the context that attribute came from. In other words a time series for an attribute `x` of an entity `e` becomes
(ctx, {x: a }, t1), (ctx, { x: b }, t2), (ctx, { x: c }, t3), (ctx', { x: d }, t4), ...
Notice how at time point `t4` the context changed, so in actual fact (if I understand the way RDF works, not 100% sure!) `e.x` in `ctx` is not the same as `e.x` in `ctx'`. Now suppose we don't store the entirety of the context evolution over time. How we store stuff is irrelevant to my argument: we could take @kzangeli's suggestion and make it work for us or do something different. Without enough info about the context, even the most basic query of all would fail to return meaningful results. For example, if a client asks for `e.x` between `t1` and `t4`, what values should we return? Surely it can't be the sequence (t1, a), (t2, b), (t3, c), (t4, d), can it? If the attributes are different there are two value sequences: a, b, c and d, but how could we even tell without knowing how the context changed over time? Also even if we know how the context changed over time, we'd still need the client to specify which `x` they are referring to: is it `ctx.x` or `ctx'.x`?
About `speed` in `ctx` and `speed` in `ctx'` - those are two different attributes - never mind the unitCode.
Two different attributes (as two different expanded names).
@chicco785 @c0c0n3
1. QL will persist only the last context (leveraging the metadata table) for an entityType for a given fiwareService, so this means that each time the context changes, the old one is overwritten.
2. QL will persist only the last context for an entityId for a given fiwareService, so this means that each time the context changes, the old one is overwritten, but different entityIds can have different contexts (this is not particularly brilliant performance-wise, because it will increase the number of queries needed to retrieve information).
3. QL will persist the context for each entry, so this means that you can track the evolution of the context over time, but of course you may end up messing things up if you return the context on aggregated queries.
I am not sure how you are planning to implement 1 and 2. But we can easily go with point 3 as we have stored instanceId in https://github.com/orchestracities/ngsi-timeseries-api/issues/533
I have gone through https://ngsi-ld-tutorials.readthedocs.io/en/latest/working-with-%40context.html for `@context`. As per my understanding, I would like to suggest some points; please correct me if I have interpreted anything wrongly:
I would like to contribute on this issue. Please suggest if I can go in this direction and raise PR for the same.
We can also use `@context` i.e., `https://uri.etsi.org/ngsi-ld/v1/ngsi-ld-core-context-v1.3.jsonld` with the context provided which is implicitly included,
@kzangeli sorry for not coming back earlier, I have been jeopardised by other priorities :/
i took some time today to think about how to handle the whole thing, and I will be happy to get your feedback and, if you have time, to schedule a quick chat.
As previously mentioned by @c0c0n3:
You also mentioned not storing the context, but to an extent we need to "store" it.
Today when an entity is injected, we use the attribute name to generate the column name to store the attribute values in a flat relational db format.
This means that the following payload:
{
"id": "urn:ngsi-ld:OffStreetParking:Downtown1",
"type": "OffStreetParking",
"name": {
"type": "Property",
"value": "Downtown One"
},
"availableSpotNumber": {
"type": "Property",
"value": 121,
"observedAt": "2017-07-29T12:05:02Z",
"reliability": {
"type": "Property",
"value": 0.7
},
"providedBy": {
"type": "Relationship",
"object": "urn:ngsi-ld:Camera:C1"
}
},
"totalSpotNumber": {
"type": "Property",
"value": 200
},
"location": {
"type": "GeoProperty",
"value": {
"type": "Point",
"coordinates": [-8.5, 41.2]
}
},
"@context": [
"http://example.org/ngsi-ld/latest/parking.jsonld",
"https://uri.etsi.org/ngsi-ld/v1/ngsi-ld-core-context-v1.5.jsonld"
]
}
(today) is stored as:

| entity_id | name | availableSpotNumber | totalSpotNumber | location | timeIndex |
|---|---|---|---|---|---|
| urn:ngsi-ld:OffStreetParking:Downtown1 | Downtown One | 121 | 200 | {"type": "GeoProperty","value": {"type": "Point","coordinates": [-8.5, 41.2]}} | 2017-07-29T12:05:02Z |
Besides these, we store some metadata:
| table_name | entity_attrs |
|---|---|
| "etoffstreetparking" | {"totalspotnumber":["totalSpotNumber","Integer"],"entity_type":["type","Text"],"time_index":["time_index","DateTime"],"name":["name","Text"],"location":["location","geo:json"],"entity_id":["id","Text"],"availablespotnumber":["availableSpotNumber","Integer"]} |
the metadata today are used to tell us, for a given attribute in the table, what's the original name in NGSIv2 and the type, e.g.: the `availablespotnumber` column maps to the `availableSpotNumber` NGSI-V2 attribute, whose NGSI-V2 type is `Integer`.
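As an illustration of how that metadata gets used, here is a small lookup sketch. The JSON is the `entity_attrs` value shown above; the helper `describe_column` is hypothetical and only demonstrates the column-to-attribute mapping, not QuantumLeap's actual code.

```python
import json

# The entity_attrs metadata for the "etoffstreetparking" table, as shown above.
entity_attrs = json.loads("""
{
  "totalspotnumber": ["totalSpotNumber", "Integer"],
  "entity_type": ["type", "Text"],
  "time_index": ["time_index", "DateTime"],
  "name": ["name", "Text"],
  "location": ["location", "geo:json"],
  "entity_id": ["id", "Text"],
  "availablespotnumber": ["availableSpotNumber", "Integer"]
}
""")


def describe_column(column: str) -> tuple:
    """Return (original NGSI attribute name, NGSI type) for a table column."""
    original_name, ngsi_type = entity_attrs[column]
    return original_name, ngsi_type


name_and_type = describe_column("availablespotnumber")
```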
Now, building on this, for NGSI-LD and aiming at backward compatibility, we could have something like:
{
"totalspotnumber" : [
"totalSpotNumber",
"Integer",
"http://example.org/ngsi-ld/latest/parking/totalSpotNumber"
],
"name" : [
"name",
"Text",
"https://uri.etsi.org/ngsi-ld/name"
],
...,
"location" : [
"location",
"geo:json",
"https://uri.etsi.org/ngsi-ld/location"
],
...
}
this also means that if at some point we have:
{
"id": "urn:ngsi-ld:OffStreetParking:Downtown1",
"type": "OffStreetParking",
"name": {
"type": "Property",
"value": "Downtown One"
},
"http://example.org/ngsi-ld/latest/parking/name": {
"type": "Property",
"value": "Downtown One - Parking"
},
"@context": [
"http://example.org/ngsi-ld/latest/parking.jsonld",
"https://uri.etsi.org/ngsi-ld/v1/ngsi-ld-core-context-v1.5.jsonld"
]
}
we will add new metadata, with a new column name, e.g.:
{
"totalspotnumber" : [
"totalSpotNumber",
"Integer",
"http://example.org/ngsi-ld/latest/parking/totalSpotNumber"
],
"name" : [
"name",
"Text",
"https://uri.etsi.org/ngsi-ld/name"
],
"name-2" : [
"name",
"Text",
"http://example.org/ngsi-ld/latest/parking/name"
],
...,
"location" : [
"location",
"geo:json",
"https://uri.etsi.org/ngsi-ld/location"
],
...
}
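A rough sketch of how the column naming above could work, assuming the expanded name has already been computed elsewhere. The function `register_attribute` and the `-2`, `-3` suffix scheme are assumptions extrapolated from the `name-2` example, not an agreed design.

```python
def register_attribute(metadata: dict, short_name: str, ngsi_type: str, expanded_name: str) -> str:
    """Return the column for an attribute, adding a new metadata entry if needed.

    Columns are keyed by the lowercased short name; when the same short name
    shows up with a different expanded name, a numeric suffix is appended.
    """
    base = short_name.lower()
    column = base
    suffix = 1
    while column in metadata:
        if metadata[column][2] == expanded_name:
            return column  # same attribute as before, reuse its column
        suffix += 1
        column = f"{base}-{suffix}"
    metadata[column] = [short_name, ngsi_type, expanded_name]
    return column


metadata = {}
first = register_attribute(metadata, "name", "Text", "https://uri.etsi.org/ngsi-ld/name")
second = register_attribute(metadata, "name", "Text", "http://example.org/ngsi-ld/latest/parking/name")
again = register_attribute(metadata, "name", "Text", "http://example.org/ngsi-ld/latest/parking/name")
```

With this scheme existing NGSIv2-style columns keep their names, and only a clash between two different expanded names creates an extra column.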
Does this sound reasonable, and semantically correct? As a second step, we have to think about "metadata" handling, and we could either:
Let's meet and talk. My Skype handle: kzangeli
Guys, it seems to me the solution suggested by @chicco785 is the only sensible thing to do at this stage. It won't cater for several NGSI-LD features but in my opinion it'll work in the majority of cases in practice, plus it's backward compatible w/ NGSI v2 which is a boon to the majority of our users I reckon.
Here are some things we won't be able to do easily:
`availableSpotNumber.providedBy` in the example entity above.
There might be more things we won't be able to handle, but at the end of the day if these are just corner cases, do we really want to waste a lot of dev cycles on them? To me we could just say we're almost NGSI-LD compliant and call it a day. Not sure how many NGSI-LD implementations out there can actually claim full compliance anyway. Is it?
Is your feature request related to a problem? Please describe.
Compared to NGSIv2, NGSI-LD introduces a special field `@context`, that provides a linked-data inspired description of the attributes used in the payload (cf #398), e.g.
This new attribute should be stored as well in the timeseries backend.
Describe the solution you'd like
Considering the current data model, there could be two options (deciding which one is best requires some expertise on NGSI-LD - advice from @jason-fox @kzangeli is welcome!):
Describe alternatives you've considered N/A
Additional context N/A