odpi / egeria

Egeria core
https://egeria-project.org
Apache License 2.0
809 stars 261 forks source link

Asset Management notebook - OMRS-GRAPH-REPOSITORY-400-002 exceptions #2489

Closed planetf1 closed 4 years ago

planetf1 commented 4 years ago

I am running the notebooks (locally) - configure, start, asset-mnagement/building a data catalog

All repositories, config, servers were clean/stopped prior to the test

In the first case I use the inmemory repository - the default, and I see no OMRS Exceptions

I then use the local graph repository, and see a series of OMRS-GRAPH-REPOSITORY-400-002 exceptions similar to:

Thu Jan 23 16:49:43 GMT 2020 cocoMDS6 Exception OMRS-AUDIT-9011 An incoming event of type processNewRelationshipEvent from cocoMDS1 (75c0bd0c-cb5f-4fc6-8505-53aad7c9cff2) generated an exception with message OMRS-GRAPH-REPOSITORY-400-002 There is an already an entity with GUID cdf80d0b-256b-4d3a-a479-0eeff9015fa4 so cannot honour request to create entity in saveRelationshipReferenceCopyToStore method of class org.odpi.openmetadata.adapters.repositoryservices.graphrepository.repositoryconnector.GraphOMRSMetadataStore to open metadata repository cocoMDS6

The full list (14) is at https://gist.github.com/b0d1cc57ad58f587bbf4b4f46dbbc006

this is how it plays out for that particular guid:

➜  ~ cat graph.log | grep cdf80d0b-256b-4d3a-a479-0eeff9015fa4
Thu Jan 23 16:49:37 GMT 2020 cocoMDS6 Event OMRS-AUDIT-8006 Processing incoming event of type NewEntityEvent for instance cdf80d0b-256b-4d3a-a479-0eeff9015fa4 from: OMRSEventOriginator{metadataCollectionId='91a53fb4-8c2d-4689-8c46-298d9730bfb6', serverName='cocoMDS2', serverType='Open Metadata Server', organizationName='Coco Pharmaceuticals'}
Thu Jan 23 16:49:37 GMT 2020 cocoMDS3 Event OMRS-AUDIT-8006 Processing incoming event of type NewEntityEvent for instance cdf80d0b-256b-4d3a-a479-0eeff9015fa4 from: OMRSEventOriginator{metadataCollectionId='91a53fb4-8c2d-4689-8c46-298d9730bfb6', serverName='cocoMDS2', serverType='Open Metadata Server', organizationName='Coco Pharmaceuticals'}
Thu Jan 23 16:49:37 GMT 2020 cocoMDS1 Event OMRS-AUDIT-8006 Processing incoming event of type NewEntityEvent for instance cdf80d0b-256b-4d3a-a479-0eeff9015fa4 from: OMRSEventOriginator{metadataCollectionId='91a53fb4-8c2d-4689-8c46-298d9730bfb6', serverName='cocoMDS2', serverType='Open Metadata Server', organizationName='Coco Pharmaceuticals'}
Thu Jan 23 16:49:37 GMT 2020 cocoMDS5 Event OMRS-AUDIT-8006 Processing incoming event of type NewEntityEvent for instance cdf80d0b-256b-4d3a-a479-0eeff9015fa4 from: OMRSEventOriginator{metadataCollectionId='91a53fb4-8c2d-4689-8c46-298d9730bfb6', serverName='cocoMDS2', serverType='Open Metadata Server', organizationName='Coco Pharmaceuticals'}
Thu Jan 23 16:49:37 GMT 2020 cocoMDS6 Event OMRS-AUDIT-8006 Processing incoming event of type NewEntityEvent for instance cdf80d0b-256b-4d3a-a479-0eeff9015fa4 from: OMRSEventOriginator{metadataCollectionId='91a53fb4-8c2d-4689-8c46-298d9730bfb6', serverName='cocoMDS2', serverType='Open Metadata Server', organizationName='Coco Pharmaceuticals'}
Thu Jan 23 16:49:43 GMT 2020 cocoMDS5 Exception OMRS-AUDIT-9011 An incoming event of type processNewRelationshipEvent from cocoMDS1 (75c0bd0c-cb5f-4fc6-8505-53aad7c9cff2) generated an exception with message OMRS-GRAPH-REPOSITORY-400-002 There is an already an entity with GUID cdf80d0b-256b-4d3a-a479-0eeff9015fa4 so cannot honour request to create entity in saveRelationshipReferenceCopyToStore method of class org.odpi.openmetadata.adapters.repositoryservices.graphrepository.repositoryconnector.GraphOMRSMetadataStore to open metadata repository cocoMDS5
Thu Jan 23 16:49:43 GMT 2020 cocoMDS2 Exception OMRS-AUDIT-9011 An incoming event of type processNewRelationshipEvent from cocoMDS1 (75c0bd0c-cb5f-4fc6-8505-53aad7c9cff2) generated an exception with message OMRS-GRAPH-REPOSITORY-400-002 There is an already an entity with GUID cdf80d0b-256b-4d3a-a479-0eeff9015fa4 so cannot honour request to create entity in saveRelationshipReferenceCopyToStore method of class org.odpi.openmetadata.adapters.repositoryservices.graphrepository.repositoryconnector.GraphOMRSMetadataStore to open metadata repository cocoMDS2
Thu Jan 23 16:49:43 GMT 2020 cocoMDS3 Exception OMRS-AUDIT-9011 An incoming event of type processNewRelationshipEvent from cocoMDS1 (75c0bd0c-cb5f-4fc6-8505-53aad7c9cff2) generated an exception with message OMRS-GRAPH-REPOSITORY-400-002 There is an already an entity with GUID cdf80d0b-256b-4d3a-a479-0eeff9015fa4 so cannot honour request to create entity in saveRelationshipReferenceCopyToStore method of class org.odpi.openmetadata.adapters.repositoryservices.graphrepository.repositoryconnector.GraphOMRSMetadataStore to open metadata repository cocoMDS3
Thu Jan 23 16:49:43 GMT 2020 cocoMDS6 Exception OMRS-AUDIT-9011 An incoming event of type processNewRelationshipEvent from cocoMDS1 (75c0bd0c-cb5f-4fc6-8505-53aad7c9cff2) generated an exception with message OMRS-GRAPH-REPOSITORY-400-002 There is an already an entity with GUID cdf80d0b-256b-4d3a-a479-0eeff9015fa4 so cannot honour request to create entity in saveRelationshipReferenceCopyToStore method of class org.odpi.openmetadata.adapters.repositoryservices.graphrepository.repositoryconnector.GraphOMRSMetadataStore to open metadata repository cocoMDS6

It's unclear why cocoMDS6 appears to process this incoming event from cocoMDS2 twice when the other servers don't. This does explain the exception -- since the entity already exists, so the graph repo is behaving correctly.

This is though happening for other entities ie:

➜  ~ cat graph.log | grep NewEntityEvent  | grep cocoMDS2 | grep cocoMDS6
Thu Jan 23 16:49:37 GMT 2020 cocoMDS6 Event OMRS-AUDIT-8006 Processing incoming event of type NewEntityEvent for instance cdf80d0b-256b-4d3a-a479-0eeff9015fa4 from: OMRSEventOriginator{metadataCollectionId='91a53fb4-8c2d-4689-8c46-298d9730bfb6', serverName='cocoMDS2', serverType='Open Metadata Server', organizationName='Coco Pharmaceuticals'}
Thu Jan 23 16:49:37 GMT 2020 cocoMDS6 Event OMRS-AUDIT-8006 Processing incoming event of type NewEntityEvent for instance cdf80d0b-256b-4d3a-a479-0eeff9015fa4 from: OMRSEventOriginator{metadataCollectionId='91a53fb4-8c2d-4689-8c46-298d9730bfb6', serverName='cocoMDS2', serverType='Open Metadata Server', organizationName='Coco Pharmaceuticals'}
Thu Jan 23 16:49:39 GMT 2020 cocoMDS6 Event OMRS-AUDIT-8006 Processing incoming event of type NewEntityEvent for instance 9de4bdc1-26ba-4437-8107-a09d3dbe2132 from: OMRSEventOriginator{metadataCollectionId='91a53fb4-8c2d-4689-8c46-298d9730bfb6', serverName='cocoMDS2', serverType='Open Metadata Server', organizationName='Coco Pharmaceuticals'}
Thu Jan 23 16:49:40 GMT 2020 cocoMDS6 Event OMRS-AUDIT-8006 Processing incoming event of type NewEntityEvent for instance 9de4bdc1-26ba-4437-8107-a09d3dbe2132 from: OMRSEventOriginator{metadataCollectionId='91a53fb4-8c2d-4689-8c46-298d9730bfb6', serverName='cocoMDS2', serverType='Open Metadata Server', organizationName='Coco Pharmaceuticals'}

Inmem shows the same - albeit with slightly different ordering due to timing, and it does not raise an exception

➜  ~ cat inmem.log | grep NewEntityEvent  | grep cocoMDS2 | grep cocoMDS6
Thu Jan 23 16:44:19 GMT 2020 cocoMDS6 Event OMRS-AUDIT-8006 Processing incoming event of type NewEntityEvent for instance 1b7dc720-d657-44f3-be34-2711a741ffda from: OMRSEventOriginator{metadataCollectionId='5e3c5242-4966-4dd0-aa42-41e7f5872216', serverName='cocoMDS2', serverType='Open Metadata Server', organizationName='Coco Pharmaceuticals'}
Thu Jan 23 16:44:19 GMT 2020 cocoMDS6 Event OMRS-AUDIT-8006 Processing incoming event of type NewEntityEvent for instance c3a864fd-e40c-4c94-b4cc-d0f33d5f6337 from: OMRSEventOriginator{metadataCollectionId='5e3c5242-4966-4dd0-aa42-41e7f5872216', serverName='cocoMDS2', serverType='Open Metadata Server', organizationName='Coco Pharmaceuticals'}
Thu Jan 23 16:44:19 GMT 2020 cocoMDS6 Event OMRS-AUDIT-8006 Processing incoming event of type NewEntityEvent for instance 1b7dc720-d657-44f3-be34-2711a741ffda from: OMRSEventOriginator{metadataCollectionId='5e3c5242-4966-4dd0-aa42-41e7f5872216', serverName='cocoMDS2', serverType='Open Metadata Server', organizationName='Coco Pharmaceuticals'}
Thu Jan 23 16:44:19 GMT 2020 cocoMDS6 Event OMRS-AUDIT-8006 Processing incoming event of type NewEntityEvent for instance c3a864fd-e40c-4c94-b4cc-d0f33d5f6337 from: OMRSEventOriginator{metadataCollectionId='5e3c5242-4966-4dd0-aa42-41e7f5872216', serverName='cocoMDS2', serverType='Open Metadata Server', organizationName='Coco Pharmaceuticals'}

So the questions are:

planetf1 commented 4 years ago

There's also a few other behaviour differences as can be seen from:

~ cat graph.log | grep OMRS-AUDIT | awk '{print $9}' | sort | uniq -c | sort -n
   5 OMRS-AUDIT-0002
   5 OMRS-AUDIT-0003
   5 OMRS-AUDIT-0032
   6 OMRS-AUDIT-0001
   6 OMRS-AUDIT-0004
   6 OMRS-AUDIT-0007
   6 OMRS-AUDIT-0050
   6 OMRS-AUDIT-0053
   8 OMRS-AUDIT-0060
   9 OMRS-AUDIT-0005
   9 OMRS-AUDIT-0026
   9 OMRS-AUDIT-0062
  10 OMRS-AUDIT-0303
  12 OMRS-AUDIT-0110
  14 OMRS-AUDIT-0019
  14 OMRS-AUDIT-0020
  14 OMRS-AUDIT-0029
  14 OMRS-AUDIT-0031
  14 OMRS-AUDIT-9011
  17 OMRS-AUDIT-0106
  29 OMRS-AUDIT-0015
  36 OMRS-AUDIT-0030
  57 OMRS-AUDIT-0112
  63 OMRS-AUDIT-0040
  63 OMRS-AUDIT-0041
 205 OMRS-AUDIT-8006
2395 OMRS-AUDIT-0301
4790 OMRS-AUDIT-0310
➜  ~ cat inmem.log | grep OMRS-AUDIT | awk '{print $9}' | sort | uniq -c | sort -n
   5 OMRS-AUDIT-0002
   5 OMRS-AUDIT-0003
   5 OMRS-AUDIT-0032
   6 OMRS-AUDIT-0001
   6 OMRS-AUDIT-0004
   6 OMRS-AUDIT-0007
   6 OMRS-AUDIT-0050
   6 OMRS-AUDIT-0053
   8 OMRS-AUDIT-0060
   9 OMRS-AUDIT-0005
   9 OMRS-AUDIT-0026
   9 OMRS-AUDIT-0062
  10 OMRS-AUDIT-0303
  12 OMRS-AUDIT-0110
  14 OMRS-AUDIT-0019
  14 OMRS-AUDIT-0020
  14 OMRS-AUDIT-0029
  14 OMRS-AUDIT-0031
  17 OMRS-AUDIT-0106
  29 OMRS-AUDIT-0015
  36 OMRS-AUDIT-0030
  63 OMRS-AUDIT-0040
  63 OMRS-AUDIT-0041
  64 OMRS-AUDIT-0112
 229 OMRS-AUDIT-8006
2395 OMRS-AUDIT-0301
4790 OMRS-AUDIT-0310

Specifically:

Processing events (like NewEntity) - but this is probably explained by the initial observation.

planetf1 commented 4 years ago

Another observation. very approximately the entity creations took 9s inmem and 60s using graph. That's probably expected (more complex, managing graph, persisting to disk, before tuning/scaling).It does feel significantly slower working through asset management using graph with most queries taking a few seconds (up to 5) rather than beeing < 1s.

mandy-chessell commented 4 years ago

The repository should not have a problem with seeing the same message twice. When the reposiotry is called, it is called with the saveEntityReferenceCopy or saveRelationshipReferenceCopy. It does not know if it is a create or update. If the event provides an instance that is either back level or equal to version in repository then should ignore.

planetf1 commented 4 years ago

Thanks for clarification - so it seems as if that message should not be an exception (assuming validation of version).

I'm intrigued as to why it is seen twice though - will need to look into the kafka queue and/or debug to see in case it's a consequence of another problem (not withstanding I completely agree it should be tolerated)

grahamwallis commented 4 years ago

The logic used in the graph repository saveEntityReferenceCopy() method is that, if it has an existing entity with the same GUID it checks if it is the master OR whether the reference copy it holds is from the same reopository as indicated in the event. i.e. it does the following:

            if (   metadataCollectionId.equals(entity.getMetadataCollectionId())
                || !vertexMetadataCollectionId.equals(entity.getMetadataCollectionId()) ) {

                /*
                 *  Error condition
                 *  Either the locsl repository is being asked to save a reference copy of something it already owns,
                 *  or it already has a proxy or reference copy of an entity from a repository other than the one that
                 *  submitted this reference copy.
                 */

                log.error("{} found an existing vertex from a different source, with metadataCollectionId {}", methodName, vertexMetadataCollectionId);
                g.tx().rollback();
                GraphOMRSErrorCode errorCode = GraphOMRSErrorCode.ENTITY_ALREADY_EXISTS;

It does not check for an older version - but it sounds like it should.

  1. Is it correct to check that the metadataCollectionId indicated in the event's instance is the same as the metadataCollectionId of an existing ref copy?

  2. Given the above (current logic) does this explain why you are seeing the result you get - i.e. is it true that the graph repo is the master of the entity in the event? Or is it true that the apparent master has changed - e.g. between this run of the environment and a previous one in which the ref copy was created based on an event from a different master? (I'd like at least one of these to be true so we can explain why you see what you see).

grahamwallis commented 4 years ago

Just receiving the same event twice would not cause the above logic to treat it as an error.

planetf1 commented 4 years ago

In this scenario all of the servers mentioned except cocoMDS4 have their own repository, and all are using the same repository type in the two example runs above.

In the first example I listed the originator was cocoMDS2 - but cocoMDS6 reported the exception.

In all cases the repository was completely cleaned & the full stack restarted before each run

On whether the event is exactly the same - I've not yet returned to capture the messages off kafka but can do this...

grahamwallis commented 4 years ago

I think I can see what the problem is here. Apologies - if I'm correct then it looks like a copy/paste problem deep in the guts of the metadata store that underpins the graph metadata collection. The reason we've not hit it before is I think that you are running a more interesting test scenario in which - I suspect - the relationship is created between entities that are mastered in different repositories. This should all work fine - and does in the in-memory repository which just saves the reference copy. The graph repository has more state validation - checking that if the proxies passed in the relationship match (by GUID) entities already known in the graph, then the home repository (metadataCollectionId) is the same in the graph as in the event. During this validation it trips over the copy/paste error - I will fix that now.

I will also check with Mandy whether we should relax the home metadataCollection checking being performed by the graph repository; or tighten up the in-memory repository; or neither.