Closed by planetf1 4 years ago
There are also a few other behaviour differences, as can be seen from:
➜ ~ cat graph.log | grep OMRS-AUDIT | awk '{print $9}' | sort | uniq -c | sort -n
5 OMRS-AUDIT-0002
5 OMRS-AUDIT-0003
5 OMRS-AUDIT-0032
6 OMRS-AUDIT-0001
6 OMRS-AUDIT-0004
6 OMRS-AUDIT-0007
6 OMRS-AUDIT-0050
6 OMRS-AUDIT-0053
8 OMRS-AUDIT-0060
9 OMRS-AUDIT-0005
9 OMRS-AUDIT-0026
9 OMRS-AUDIT-0062
10 OMRS-AUDIT-0303
12 OMRS-AUDIT-0110
14 OMRS-AUDIT-0019
14 OMRS-AUDIT-0020
14 OMRS-AUDIT-0029
14 OMRS-AUDIT-0031
14 OMRS-AUDIT-9011
17 OMRS-AUDIT-0106
29 OMRS-AUDIT-0015
36 OMRS-AUDIT-0030
57 OMRS-AUDIT-0112
63 OMRS-AUDIT-0040
63 OMRS-AUDIT-0041
205 OMRS-AUDIT-8006
2395 OMRS-AUDIT-0301
4790 OMRS-AUDIT-0310
➜ ~ cat inmem.log | grep OMRS-AUDIT | awk '{print $9}' | sort | uniq -c | sort -n
5 OMRS-AUDIT-0002
5 OMRS-AUDIT-0003
5 OMRS-AUDIT-0032
6 OMRS-AUDIT-0001
6 OMRS-AUDIT-0004
6 OMRS-AUDIT-0007
6 OMRS-AUDIT-0050
6 OMRS-AUDIT-0053
8 OMRS-AUDIT-0060
9 OMRS-AUDIT-0005
9 OMRS-AUDIT-0026
9 OMRS-AUDIT-0062
10 OMRS-AUDIT-0303
12 OMRS-AUDIT-0110
14 OMRS-AUDIT-0019
14 OMRS-AUDIT-0020
14 OMRS-AUDIT-0029
14 OMRS-AUDIT-0031
17 OMRS-AUDIT-0106
29 OMRS-AUDIT-0015
36 OMRS-AUDIT-0030
63 OMRS-AUDIT-0040
63 OMRS-AUDIT-0041
64 OMRS-AUDIT-0112
229 OMRS-AUDIT-8006
2395 OMRS-AUDIT-0301
4790 OMRS-AUDIT-0310
Specifically:
OMRS-AUDIT-0112 (64 inmem, 57 graph)
This is likely fine: it is the cohort re-registration, and the exact number will depend on timing, so no concern.
OMRS-AUDIT-8006 (229 inmem, 205 graph)
These are processed events (such as NewEntity), but the difference is probably explained by the initial observation.
Another observation: very approximately, the entity creations took 9s in-memory and 60s using graph. That's probably expected (more complex, managing a graph, persisting to disk, before any tuning/scaling). It does also feel significantly slower working through asset management using graph, with most queries taking a few seconds (up to 5) rather than being < 1s.
The repository should not have a problem with seeing the same message twice. When the repository is called, it is called via saveEntityReferenceCopy or saveRelationshipReferenceCopy; it does not know whether the call is a create or an update. If the event provides an instance that is back-level or equal to the version already in the repository, it should be ignored.
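As a minimal sketch of that contract (lookupEntity and storeReferenceCopy are hypothetical helper names; the usual EntityDetail accessors getGUID() and getVersion() are assumed), the guard would look roughly like:

// Minimal sketch of the contract described above, not the actual Egeria implementation.
public void saveEntityReferenceCopy(EntityDetail incoming) {
    EntityDetail stored = lookupEntity(incoming.getGUID());

    if (stored != null && incoming.getVersion() <= stored.getVersion()) {
        // Back-level or equal version: a duplicate or stale event, so ignore it quietly.
        return;
    }

    // Otherwise treat it as a create (no stored copy) or an update (newer version).
    storeReferenceCopy(incoming);
}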
Thanks for the clarification - so it seems that message should not result in an exception (assuming validation of the version).
I'm intrigued as to why it is seen twice, though - I will need to look into the Kafka queue and/or debug to see whether it's a consequence of another problem (notwithstanding that I completely agree it should be tolerated).
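One way to confirm whether the event is actually published twice (rather than just consumed twice) would be to read the cohort topic back and count identical payloads. A rough sketch using the plain Kafka consumer API; the broker address and TOPIC are placeholders for the actual broker and cohort OMRS topic name:

// Rough diagnostic sketch: count identical payloads on the cohort topic.
// Imports: org.apache.kafka.clients.consumer.*, java.time.Duration, java.util.*
Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("group.id", "omrs-dup-check");
props.put("auto.offset.reset", "earliest");
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

Map<String, Integer> counts = new HashMap<>();
try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
    consumer.subscribe(Collections.singletonList(TOPIC));
    // Single poll for brevity; a real check would keep polling until caught up.
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(10));
    for (ConsumerRecord<String, String> record : records) {
        counts.merge(record.value(), 1, Integer::sum);
    }
}
counts.forEach((payload, n) -> {
    if (n > 1) {
        System.out.println(n + "x duplicate payload: " + payload);
    }
});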
The logic used in the graph repository's saveEntityReferenceCopy() method is that, if it has an existing entity with the same GUID, it checks whether the local repository is the master of that entity, or whether the reference copy it holds has a different home repository than the one indicated in the event; either case is treated as an error. That is, it does the following:
if (metadataCollectionId.equals(entity.getMetadataCollectionId())
        || !vertexMetadataCollectionId.equals(entity.getMetadataCollectionId())) {
    /*
     * Error condition
     * Either the local repository is being asked to save a reference copy of something it already owns,
     * or it already has a proxy or reference copy of an entity from a repository other than the one that
     * submitted this reference copy.
     */
    log.error("{} found an existing vertex from a different source, with metadataCollectionId {}", methodName, vertexMetadataCollectionId);
    g.tx().rollback();
    GraphOMRSErrorCode errorCode = GraphOMRSErrorCode.ENTITY_ALREADY_EXISTS;
It does not check for an older version - but it sounds like it should.
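For illustration only, the existing check and a version comparison could be combined along these lines (illustrative names, not a proposed patch):

// Sketch of how a version comparison could sit alongside the existing
// metadataCollectionId checks. All names are illustrative.
enum RefCopyAction { REJECT_CONFLICT, IGNORE_STALE, STORE }

static RefCopyAction decide(String localMetadataCollectionId,
                            String vertexMetadataCollectionId,
                            long vertexVersion,
                            String incomingMetadataCollectionId,
                            long incomingVersion) {
    if (localMetadataCollectionId.equals(incomingMetadataCollectionId)
            || !vertexMetadataCollectionId.equals(incomingMetadataCollectionId)) {
        // Genuine conflict: the local repository masters this entity, or the stored
        // copy has a different home repository than the event claims.
        return RefCopyAction.REJECT_CONFLICT;
    }
    if (incomingVersion <= vertexVersion) {
        // Same home repository, but the event is back-level or a re-delivery: ignore.
        return RefCopyAction.IGNORE_STALE;
    }
    // Newer version from the same home repository: update the reference copy.
    return RefCopyAction.STORE;
}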
Is it correct to check that the metadataCollectionId indicated in the event's instance is the same as the metadataCollectionId of an existing ref copy?
Given the above (current) logic, does this explain the result you are seeing - i.e. is it true that the graph repo is the master of the entity in the event? Or is it true that the apparent master has changed, e.g. between this run of the environment and a previous one in which the ref copy was created based on an event from a different master? (I'd like at least one of these to be true so we can explain why you see what you see.)
Just receiving the same event twice would not cause the above logic to treat it as an error.
In this scenario all of the servers mentioned except cocoMDS4 have their own repository, and all are using the same repository type in the two example runs above.
In the first example I listed the originator was cocoMDS2 - but cocoMDS6 reported the exception.
In all cases the repository was completely cleaned & the full stack restarted before each run
On whether the event is exactly the same: I've not yet returned to capture the messages off Kafka, but I can do this...
I think I can see what the problem is here. Apologies - if I'm correct then it looks like a copy/paste problem deep in the guts of the metadata store that underpins the graph metadata collection. The reason we've not hit it before is I think that you are running a more interesting test scenario in which - I suspect - the relationship is created between entities that are mastered in different repositories. This should all work fine - and does in the in-memory repository which just saves the reference copy. The graph repository has more state validation - checking that if the proxies passed in the relationship match (by GUID) entities already known in the graph, then the home repository (metadataCollectionId) is the same in the graph as in the event. During this validation it trips over the copy/paste error - I will fix that now.
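To make the failure mode concrete, here is a sketch of the kind of per-proxy validation described above (illustrative names, not the actual graph repository code). A copy/paste slip that validates one proxy against the vertex looked up for the other would only surface when the two ends have different home repositories, which matches this test scenario:

// Illustrative per-proxy validation for saveRelationshipReferenceCopy-style logic.
// vertexHomeByGuid stands in for a lookup of the home metadataCollectionId of any
// vertex already stored in the graph for a given GUID.
static void validateProxy(String proxyGuid,
                          String proxyMetadataCollectionId,
                          Map<String, String> vertexHomeByGuid) {
    String storedHome = vertexHomeByGuid.get(proxyGuid);
    if (storedHome != null && !storedHome.equals(proxyMetadataCollectionId)) {
        throw new IllegalStateException("entity " + proxyGuid + " already known with home "
                + storedHome + ", but the event claims home " + proxyMetadataCollectionId);
    }
    // A copy/paste error here would amount to validating entityOneProxy against the
    // GUID or collection id of entityTwoProxy, which only fails when the two ends
    // are mastered in different repositories.
}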
I will also check with Mandy whether we should relax the home metadataCollection checking being performed by the graph repository; or tighten up the in-memory repository; or neither.
I am running the notebooks (locally): configure, start, asset-management/building a data catalog.
All repositories, config, servers were clean/stopped prior to the test
In the first case I use the in-memory repository (the default) and I see no OMRS exceptions.
I then use the local graph repository, and see a series of OMRS-GRAPH-REPOSITORY-400-002 exceptions similar to:
The full list (14) is at https://gist.github.com/b0d1cc57ad58f587bbf4b4f46dbbc006
This is how it plays out for that particular GUID:
It's unclear why cocoMDS6 appears to process this incoming event from cocoMDS2 twice when the other servers don't. This does explain the exception: since the entity already exists, the graph repo is behaving correctly in raising it.
It is, though, happening for other entities too, i.e.:
Inmem shows the same, albeit with slightly different ordering due to timing, and it does not raise an exception.
So the questions are: