grafana / iot-sitewise-datasource

IoT Sitewise
Apache License 2.0
18 stars 9 forks source link

Disassociated streams: Hash entryId to fix bug with property aliases longer than 64 characters #239

Closed idastambuk closed 11 months ago

idastambuk commented 11 months ago

What this PR does / why we need it: When implementing disassociated data streams, I set the entryID to be equal to the propertyAlias. Normally for associated streams it is set to assetID. This isn't possible for disassociated streams since they don't have an asset connected to them. However, I didn't account for property aliases that are longer than 64 characters. One possible solution is to hash the property aliases. As far as I can see, they don't need to be in human readable format, since entryId is used on our BE to:

  1. send along with queries to distinguish them and
  2. access the results of those queries, be they success or errors, and append them to our frames.

The names of the frames/fields don't come from entryIDs but from the property aliases from the query, which are readable.

Another solution would be to just shorten the string, but it is possible to have 2 property aliases that are almost the same apart from the end. This would create two equal entryIDs, so I didn't go with this approach.

Which issue(s) this PR fixes:

Fixes https://github.com/grafana/iot-sitewise-datasource/issues/237

Special notes for your reviewer:

jhoulihan commented 11 months ago

Greetings, I firstly admit, I don't fully understand the depth of technical issues. That aside, I was triggered immediately by the assumption of xx not being human readable.

My use case is that our policy is to develop grafana pages in dev, migrate to test, migrate to prod.
The comment maybe 1000% unrelated, but from my context, using UUID's (eg sitewise models and assets) / hashes (as proposed ) being non-human readable completely breaks the workflow given that AWS/AMG dev account, AWS/AMG test account, AWS/AMG prod account produces different UUID's and I'm assuming hashes presenting a similar problem.

Datastreams are the facts, the measurements and the truth.

Models and assets that contain datastreams are just metadata , human constructs and context of the truth (datastreams)

Please consider to preserve the datastream legibility because they are the data asset and priority.

idastambuk commented 11 months ago

Greetings, I firstly admit, I don't fully understand the depth of technical issues. That aside, I was triggered immediately by the assumption of xx not being human readable.

My use case is that our policy is to develop grafana pages in dev, migrate to test, migrate to prod. The comment maybe 1000% unrelated, but from my context, using UUID's (eg sitewise models and assets) / hashes (as proposed ) being non-human readable completely breaks the workflow given that AWS/AMG dev account, AWS/AMG test account, AWS/AMG prod account produces different UUID's and I'm assuming hashes presenting a similar problem.

Datastreams are the facts, the measurements and the truth.

Models and assets that contain datastreams are just metadata , human constructs and context of the truth (datastreams)

Please consider to preserve the datastream legibility because they are the data asset and priority.

Hi @jhoulihan can you elaborate on "preserving the datastream legibility"? Can you explain in which cases it would be important to preserve the human readable format? Hashes and UUID's are not stored in Grafana in panels querying disassociated streams. The only thing that is needed is the propertyAlias which we use to query Sitewise. EntryId here, as explained in the description, is a temporary string that loses relevance for Grafana as soon as we process the data frame. I admit I don't know the context of how AMG migrates dashboards, but of course this temporary hash could be relevant for the Sitewise service and logs itself, or AMG's migrations themselves. I would like to understand the use case in order to evaluate a possible alternative solution. Thanks!

TheEvilDev commented 11 months ago

Greetings, I firstly admit, I don't fully understand the depth of technical issues. That aside, I was triggered immediately by the assumption of xx not being human readable. My use case is that our policy is to develop grafana pages in dev, migrate to test, migrate to prod. The comment maybe 1000% unrelated, but from my context, using UUID's (eg sitewise models and assets) / hashes (as proposed ) being non-human readable completely breaks the workflow given that AWS/AMG dev account, AWS/AMG test account, AWS/AMG prod account produces different UUID's and I'm assuming hashes presenting a similar problem. Datastreams are the facts, the measurements and the truth. Models and assets that contain datastreams are just metadata , human constructs and context of the truth (datastreams) Please consider to preserve the datastream legibility because they are the data asset and priority.

Hi @jhoulihan can you elaborate on "preserving the datastream legibility"? Can you explain in which cases it would be important to preserve the human readable format? Hashes and UUID's are not stored in Grafana in panels querying disassociated streams. The only thing that is needed is the propertyAlias which we use to query Sitewise. EntryId here, as explained in the description, is a temporary string that loses relevance for Grafana as soon as we process the data frame. I admit I don't know the context of how AMG migrates dashboards, but of course this temporary hash could be relevant for the Sitewise service and logs itself, or AMG's migrations themselves. I would like to understand the use case in order to evaluate a possible alternative solution. Thanks!

It's entirely possible for propertyAlias to exceed this 64 character limit, and whether it should be readable or not, the propertyAlias is entirely in the customer's control here, so it might not be the appropriate field to set this entryId to.

I admit, that I'm not really sure what the intended purpose of the entry id is, does it need to be able to be mapped back to the query somehow? Does it need to be the same given the same inputs (aka hash)? What did we use to do with this back in past versions where this wasn't an issue?

From my point of view, I don't see a need to keep this field human readable.

jhoulihan commented 11 months ago

Thanls Ida, For example.. I can simply paste in the property alias/sitewise datastream that is disassociated from any asset image

This is legible to promote from dev/test/prod

Using sitewise asset approach to create a query, UUID's are generated that break being able to migrate grafana dashboards from dev/test/prod as the environments are housed in different AWS accounts - therefore UUID's differ between accounts and unable to identify what datastream the UUID is associated with , without significant development effort. image

I'm treating the data ingestion, storage and presentation of datastreams as an enterprise solution. In that context, it means that I need to consider and support the sitewise/grafana stack with people who are not software engineers who can decipher UUIDs, hashes and similar constructs. They need a simple instruction to copy/paste the json from one account to another (ie dev/test/prod) and only have to find/replace the UUID of the datasource for each account (eventhough that can be improved on - but leave that to another day)

I hope this helps ? Thanks again Ida

TheEvilDev commented 11 months ago

@jhoulihan wouldn't hashing the alias solve the data migration problem for dev/test/prod type scenarios though? Since the same property alias/query will always generate the same hash? It's not the same issue as UUID's which have no semantic meaning other than it's position in the database.

jhoulihan commented 11 months ago

@jhoulihan wouldn't hashing the alias solve the data migration problem for dev/test/prod type scenarios though? Since the same property alias/query will always generate the same hash? It's not the same issue as UUID's which have no semantic meaning other than it's position in the database.

Thanks TED, you nailed my context and issue of the lack of semantics using UUID for Sitewise under the hood... which I am interpreting the hashing solution as being - same same. Like I mentioned firstly, I'm not 1000% completely aware of the technical aspects to the solution, if it works, awesome. I'm raising my flag as an end user that simply needs to display disassociated datastreams from sitewise, with a legible datastream in the json.

idastambuk commented 11 months ago

Hi @jhoulihan thanks for clarifying. In the context of unassociated streams that are queried only by propertyId, the hash not being legible will not be an issue, because these hashes are not stored in Grafana in panels querying disassociated streams. The only thing that is stored is the propertyAlias which we use to query Sitewise, which makes the json as readable as the propertyAlias (and which is the responsibility of the user). For such panels migration will be trivial, as you mention above.

As for propertyIDs being stored as UUIDs in Grafana panel JSON, this is the case with data streams that are queried with propertyId, and I'm not sure anything can be done about it, except for the user to add an alias. We do support querying streams with just an alias. This way the user's panel doesn't have to reference any UUIDs.

I hope that clears it up! In conclusion, as the hashes never reach panel JSON, I think we're good to proceed with merging this.