reTHINK-project / specs

You'll find here the full detailed specification of reTHINK Framework
Apache License 2.0

Catalogue CGUID #12

Closed pchainho closed 7 years ago

pchainho commented 7 years ago

As reported at https://github.com/reTHINK-project/dev-catalogue/issues/50, the catalogue cguid is causing problems in the runtime and, at this point, should not be used.

Either we remove it from the Catalogue Model, in which case we lose the ability to identify the same object stored in different catalogue instances, or we have to figure out another solution.

We should take into account that we may have two situations:

I suspect this kind of solution would require a provisioning tool that, imo, fits into the WP4 Governance functionality, but I would like to hear other opinions.

sbecot commented 7 years ago

We may need both pieces of information. The first is the code identifier set by the developer: e.g. VertxProtoStub 1.0 has a single code for any deployment. It is probably useless to load it again if we already have a VertxProtoStub 1.0 in the runtime.

The second identifies the deployment, because a deployment has its own specific configuration, and it is also needed for tracing purposes: this VertxProtoStub 1.0 is deployed in the PT service, with this description.json (i.e. parameters). Then I can download only the description.json of Orange to reach a msgnode that needs VertxProtoStub 1.0 but with a different configuration. To guard against malicious code, a checksum can be verified.

The 2 CGUIDs would be derived from:

hysmart.rethink.ptinovacao.pt/VertxProtoStub/1.0
call.rethink.orange-labs.fr/VertxProtoStub/1.0

They are unique and compatible.

For dataschema objects: using the Connection dataschema, I must be able to use the Orange Connection object and the PT Connection object, both deployed in different catalogues if that is not already the case:

hysmart.rethink.ptinovacao.pt/Connection/1.0
call.rethink.orange-labs.fr/Connection/1.0

In the case of dataschemas, however, compatibility can also be verified by introspection of the objects. This way you have uniqueness, compatibility, and a simple way of generating ids.
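
To make the two identifiers concrete, here is a minimal Python sketch. It assumes the identifier always has the three-part form domain/name/version shown above; the names `Deployment`, `code_id`, `deployment_id` and `needs_code_download` are made up for illustration and are not part of the reTHINK catalogue.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deployment:
    domain: str   # catalogue domain, e.g. "hysmart.rethink.ptinovacao.pt"
    name: str     # object name chosen by the developer
    version: str  # developer-assigned version

    @property
    def code_id(self) -> str:
        # Identical for every deployment of the same code.
        return f"{self.name}/{self.version}"

    @property
    def deployment_id(self) -> str:
        # Differs per catalogue, so it can carry a specific configuration.
        return f"{self.domain}/{self.name}/{self.version}"


def parse_deployment(identifier: str) -> Deployment:
    # Assumes exactly three "/"-separated parts; raises ValueError otherwise.
    domain, name, version = identifier.split("/")
    return Deployment(domain, name, version)


def needs_code_download(loaded_code_ids: set[str], deployment: Deployment) -> bool:
    # The code is fetched only if no deployment of the same code is loaded yet.
    return deployment.code_id not in loaded_code_ids


pt = parse_deployment("hysmart.rethink.ptinovacao.pt/VertxProtoStub/1.0")
orange = parse_deployment("call.rethink.orange-labs.fr/VertxProtoStub/1.0")
loaded = {pt.code_id}  # the PT stub is already in the runtime
```

With the PT stub loaded, `needs_code_download(loaded, orange)` is false, so only the Orange description.json would have to be fetched.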

Endebert commented 7 years ago

In my opinion, the cguid as currently implemented fulfills its purpose and should stay as it is.

However, similar to what @sbecot mentioned, I would suggest adding a new (mandatory) parameter for a source code hash to the sourcePackage object:

  1. hashes can be generated by the Catalogue Database
  2. a hash can be used as the sourcePackage "name", making it globally comparable
  3. sourcePackageURLs would become dynamic, since they are based on the sourcePackage "name", which in turn would ensure that the cguid changes if the source code changes (which currently is not the case)
  4. version parameter in descriptor is not needed anymore, could be removed or moved into sourcePackage
  5. it allows people to double-check the validity of the source code before its execution
  6. a hash is basically a cguid for the sourcePackage, i.e. "GUID per object provided by the developer when there is nothing specific eg the configuration"
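
As an illustration of points 2 and 5, the following Python sketch derives a sourcePackage "name" from the source code and checks it before execution. SHA-256 is an assumption, since the thread does not name a hash function, and `source_package_name` and `is_valid` are hypothetical helpers, not catalogue API.

```python
import hashlib


def source_package_name(source_code: bytes) -> str:
    # The hex digest doubles as a globally comparable sourcePackage "name".
    # SHA-256 is an assumption; the proposal does not fix the algorithm.
    return hashlib.sha256(source_code).hexdigest()


def is_valid(source_code: bytes, expected_name: str) -> bool:
    # Re-hash the downloaded code and compare it with the advertised name
    # before executing it.
    return source_package_name(source_code) == expected_name


original = b"console.log('VertxProtoStub 1.0');"
tampered = b"console.log('something else');"
name = source_package_name(original)
```

Any change to the source code yields a different name, so `is_valid(tampered, name)` fails while `is_valid(original, name)` succeeds.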

rjflp commented 7 years ago

  4. version parameter in descriptor is not needed anymore, could be removed or moved into sourcePackage

Isn't a version number useful for making sure we have the latest version? If it is indeed a number, it can be ordered.

Endebert commented 7 years ago

@rjflp I agree, but in most use cases you want to know whether the version being used is the one you already have. For that purpose, the sourcePackage hash is better suited, since it is based on the source code. This is especially useful if you are testing code changes and don't change the version number every time; otherwise you might keep using the locally stored version instead of the one you want to test.

Only if the hash doesn't match the locally stored version do you need to check whether the one you have locally is newer than the remote one. Since it is the version of the source code and not of the descriptor, I think it's best to move it into the sourcePackage part.
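
The check order described here could look roughly like the Python sketch below. It assumes dotted numeric versions such as 1.3.2, and it assumes that a newer local copy is kept while everything else is fetched; the thread does not spell out either rule, and `Action`, `parse_version` and `decide` are illustrative names.

```python
from enum import Enum


class Action(Enum):
    USE_LOCAL = "use local copy"
    FETCH_REMOTE = "fetch remote copy"


def parse_version(version: str) -> tuple[int, ...]:
    # "1.10" -> (1, 10), so versions order numerically, not as strings.
    return tuple(int(part) for part in version.split("."))


def decide(local_hash: str, local_version: str,
           remote_hash: str, remote_version: str) -> Action:
    # Step 1: identical hashes mean identical code, whatever the version says.
    if local_hash == remote_hash:
        return Action.USE_LOCAL
    # Step 2: hashes differ, so only now is the version compared.
    # Assumption: a strictly newer local copy is kept.
    if parse_version(local_version) > parse_version(remote_version):
        return Action.USE_LOCAL
    # Same or older version with different code: take the remote copy.
    return Action.FETCH_REMOTE
```

The testing scenario above maps to the last branch: the version is unchanged but the hash differs, so the remote code is fetched instead of the stale local copy.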

However, another solution could be to have an ID for the sourcePackage that is a combination of version and hash, e.g. something like 1.3.2_d41d8cd98f00b204e9800998ecf8427e and put that into the descriptor.
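
A combined ID of this shape can be built and split again as in the Python sketch below. MD5 is used only because the example digest has 32 hex digits (it is in fact the MD5 of empty input); MD5 is not collision resistant, so a real catalogue would probably prefer something like SHA-256. `SourcePackageId`, `make_id` and `parse_id` are hypothetical names.

```python
import hashlib
from typing import NamedTuple


class SourcePackageId(NamedTuple):
    version: str
    digest: str

    def __str__(self) -> str:
        # Rendered as "<version>_<digest>", as in the example above.
        return f"{self.version}_{self.digest}"


def make_id(version: str, source_code: bytes) -> SourcePackageId:
    # MD5 is an assumption chosen to match the 32-hex-digit example.
    return SourcePackageId(version, hashlib.md5(source_code).hexdigest())


def parse_id(text: str) -> SourcePackageId:
    # Split on the last "_" so the version part may itself contain "_".
    version, digest = text.rsplit("_", 1)
    return SourcePackageId(version, digest)


example = make_id("1.3.2", b"")
```

Here `str(example)` gives 1.3.2_d41d8cd98f00b204e9800998ecf8427e, and `parse_id` recovers both parts, so the version and the hash stay individually comparable.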

rjflp commented 7 years ago

However, another solution could be to have an ID for the sourcePackage that is a combination of version and hash, e.g. something like 1.3.2_d41d8cd98f00b204e9800998ecf8427e and put that into the descriptor.

Wouldn't having separate fields be easier? Is it costly to have both?

Endebert commented 7 years ago

Wouldn't having separate fields be easier? Is it costly to have both?

No, cost is not an issue. It's more about intuitiveness. The version tag in the descriptor is misleading, as it seems to be the version of the descriptor (i.e. the version of the configuration) instead of the version of the source code.

I feel it makes more sense to have the version tag in the sourcePackage, as it contains the code and all other information about it. And since we are always linking to or providing the sourcePackage with the descriptor, I think it's best to keep the information there.

For example: Let's say you create a descriptor with a configuration and a link to an external sourcePackage. Now, you might not have direct control over the sourcePackage, and the sourcePackage might get updated. So currently, the version you specify in the descriptor may no longer match the code it points to.
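
A small Python sketch of that scenario, with the version kept in the sourcePackage rather than in the descriptor. The field names, the example.org URLs and the in-memory `packages` dict standing in for a remote catalogue are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SourcePackage:
    version: str      # the version lives next to the code it describes
    source_code: str


@dataclass
class Descriptor:
    configuration: dict
    source_package_url: str  # link to a package the author may not control


def code_version(descriptor: Descriptor,
                 fetch: Callable[[str], SourcePackage]) -> str:
    # The version is read from the linked package, so it cannot go stale.
    return fetch(descriptor.source_package_url).version


# Hypothetical catalogue content, keyed by sourcePackage URL.
URL = "https://catalogue.example.org/sourcepackage/VertxProtoStub"
packages = {URL: SourcePackage("1.0", "/* original code */")}
descriptor = Descriptor({"url": "wss://msgnode.example.org"}, URL)

version_before = code_version(descriptor, packages.__getitem__)
packages[URL] = SourcePackage("1.1", "/* updated code */")  # external update
version_after = code_version(descriptor, packages.__getitem__)
```

The descriptor is never edited, yet the reported version follows the package from 1.0 to 1.1, which a version field stored in the descriptor would not do.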

emmelmann-fokus commented 7 years ago

Closing since this issue is solved by implementation updates.