odpi / egeria

Egeria core
https://egeria-project.org
Apache License 2.0

Asset Lineage OMAS causing exceptions #4283

Closed planetf1 closed 3 years ago

planetf1 commented 3 years ago

When running the CTS (with Asset Lineage OMAS not configured), I see a constant stream of exceptions relating to the Asset Lineage OMAS. These are initiated by the Asset Lineage OMAS:

Tue Dec 08 09:43:31 GMT 2020 cocoMDSx Event OMRS-AUDIT-8006 Processing incoming event of type RestoredRelationshipEvent for instance 354b6b3b-af37-4e7e-8376-eb30967f5d1e from: OMRSEventOriginator{metadataCollectionId='ba959e4a-7380-4a30-8dd6-1a6811663af2', serverName='SUT_Server', serverType='Metadata Repository Server', organizationName='null'}
Tue Dec 08 09:43:31 GMT 2020 cocoMDS2 Exception OMAG-REPOSITORY-HANDLER-0003 An unexpected error org.odpi.openmetadata.repositoryservices.ffdc.exception.EntityNotKnownException was returned to getRelationshipsByType by the metadata server during getRelationshipsByTypeGUID request for open metadata access service Asset Lineage OMAS on server cocoMDS2; message was OMRS-REPOSITORY-404-002 The entity identified with guid 1e64cec1-636b-4327-9fca-499b76bb5b71 passed on the getEntitySummary call is not known to the open metadata repository SUT_Server
Tue Dec 08 09:43:31 GMT 2020 cocoMDS2 Exception OMAG-REPOSITORY-HANDLER-0003 Supplementary information: log record id 951c4fae-def8-4a11-921c-b9d71873f257 org.odpi.openmetadata.repositoryservices.ffdc.exception.EntityNotKnownException returned message of OMRS-REPOSITORY-404-002 The entity identified with guid 1e64cec1-636b-4327-9fca-499b76bb5b71 passed on the getEntitySummary call is not known to the open metadata repository SUT_Server and stacktrace of
OCFCheckedExceptionBase{reportedHTTPCode=404, reportingClassName='org.odpi.openmetadata.repositoryservices.clients.LocalRepositoryServicesClient', reportingActionDescription='getRelationshipsForEntity', reportedErrorMessage='OMRS-REPOSITORY-404-002 The entity identified with guid 1e64cec1-636b-4327-9fca-499b76bb5b71 passed on the getEntitySummary call is not known to the open metadata repository SUT_Server', reportedErrorMessageId='OMRS-REPOSITORY-404-002', reportedErrorMessageParameters=[1e64cec1-636b-4327-9fca-499b76bb5b71, getEntitySummary, SUT_Server], reportedSystemAction='The system is unable to retrieve the properties for the requested entity because the supplied guid is not recognized.', reportedUserAction='The guid is supplied by the caller to the server.  It may have a logic problem that has corrupted the guid, or the entity has been deleted since the guid was retrieved.', reportedCaughtException=null, reportedCaughtExceptionClassName='null', relatedProperties=null}
        at org.odpi.openmetadata.repositoryservices.clients.MetadataCollectionServicesClient.detectAndThrowEntityNotKnownException(MetadataCollectionServicesClient.java:5495)
        at org.odpi.openmetadata.repositoryservices.clients.MetadataCollectionServicesClient.getRelationshipsForEntity(MetadataCollectionServicesClient.java:1320)
        at org.odpi.openmetadata.adapters.repositoryservices.rest.repositoryconnector.OMRSRESTMetadataCollection.getRelationshipsForEntity(OMRSRESTMetadataCollection.java:980)
        at org.odpi.openmetadata.repositoryservices.enterprise.repositoryconnector.executors.GetRelationshipsForEntityExecutor.issueRequestToRepository(GetRelationshipsForEntityExecutor.java:171)
        at org.odpi.openmetadata.repositoryservices.enterprise.repositoryconnector.control.ParallelFederationControl.executeCommand(ParallelFederationControl.java:56)
        at org.odpi.openmetadata.repositoryservices.enterprise.repositoryconnector.EnterpriseOMRSMetadataCollection.getRelationshipsForEntity(EnterpriseOMRSMetadataCollection.java:1134)
        at org.odpi.openmetadata.commonservices.repositoryhandler.RepositoryHandler.getRelationshipsByType(RepositoryHandler.java:3815)
        at org.odpi.openmetadata.commonservices.repositoryhandler.RepositoryHandler.getRelationshipsByType(RepositoryHandler.java:3736)
        at org.odpi.openmetadata.accessservices.assetlineage.handlers.GlossaryContextHandler.getRelationshipsByTypeGUID(GlossaryContextHandler.java:305)

The Asset Lineage OMAS itself records exceptions such as:

Tue Dec 08 09:47:08 GMT 2020 cocoMDS2 Exception OMAS-ASSET-LINEAGE-0005 An exception occurred while processing incoming event OMAG-REPOSITORY-HANDLER-500-001 An unexpected error org.odpi.openmetadata.repositoryservices.ffdc.exception.EntityNotKnownException was returned to getRelationshipsByType by the metadata server during getRelationshipsByTypeGUID request for open metadata access service Asset Lineage OMAS on server cocoMDS2; message was OMRS-REPOSITORY-404-002 The entity identified with guid remotef2-53ee-4861-be24-29a78b2c88dc passed on the getEntitySummary call is not known to the open metadata repository SUT_Server

Possibly similar to #2836

popa-raluca commented 3 years ago

Hi @planetf1, I'm not sure why Asset Lineage is configured to run with the CTS tests. From the Asset Lineage point of view, this is a valid exception: it fails when trying to build the context for a GlossaryTerm. When building the context, it retrieves all the semantic assignments and adds the context for each column involved. In the tests, it fails because it cannot find a column in the repository. I wouldn't change the exception handling in the Asset Lineage OMAS, since it is important for us to identify the cases where the context cannot be built.
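To make the failure mode described above concrete, here is a minimal Java sketch assuming a simplified in-memory model. The class `GlossaryContextSketch`, its `EntityNotFoundException`, and the maps standing in for the repository are all hypothetical; this is not Egeria's actual `GlossaryContextHandler` API. It follows each semantic assignment of a glossary term to its column and throws as soon as one column cannot be found, rather than returning a partial context.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch, not Egeria's GlossaryContextHandler: builds a glossary
 * term's context by following each semantic assignment to its column and
 * failing if any assigned column is not in the repository.
 */
final class GlossaryContextSketch {

    /** Hypothetical stand-in for a "entity not known" (404-style) lookup failure. */
    public static class EntityNotFoundException extends Exception {
        public EntityNotFoundException(String guid) {
            super("Entity not known to repository: " + guid);
        }
    }

    // Hypothetical in-memory "repository": column GUID -> column name.
    private final Map<String, String> columnsByGuid;
    // Hypothetical semantic assignments: term GUID -> GUIDs of assigned columns.
    private final Map<String, List<String>> assignmentsByTerm;

    GlossaryContextSketch(Map<String, String> columnsByGuid,
                          Map<String, List<String>> assignmentsByTerm) {
        this.columnsByGuid = columnsByGuid;
        this.assignmentsByTerm = assignmentsByTerm;
    }

    /** Returns the names of all columns assigned to the term, or throws if any is missing. */
    List<String> buildContext(String termGuid) throws EntityNotFoundException {
        List<String> context = new ArrayList<>();
        for (String columnGuid : assignmentsByTerm.getOrDefault(termGuid, List.of())) {
            String column = columnsByGuid.get(columnGuid);
            if (column == null) {
                // Report the failure instead of returning a partial context,
                // so cases where the context cannot be built stay visible.
                throw new EntityNotFoundException(columnGuid);
            }
            context.add(column);
        }
        return context;
    }
}
```

With this design a single missing column makes the whole context build fail, which matches the reasoning above for keeping the exception rather than silently skipping the column.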

planetf1 commented 3 years ago

OK, so the scenario was an unintended one. Although I said I was running the CTS, I had (incorrectly) run the Coco Pharma notebooks before the CTS, which led to the failures there. So the original premise of this issue was user error on my part.

Running the notebooks alone (Asset Lineage OMAS enabled), I don't see any problems; nor do I with the CTS alone (Asset Lineage OMAS not enabled).

So what was happening, by mistake, was that a cohort of 7 or 8 servers across 4 platforms was running a stressful test (it runs at high CPU for a few hours). The test also deliberately introduces some errors, such as invalid types.

The interesting question is whether Asset Lineage behaved correctly. I agree with logging the exception in general, but are these all valid, or could there be a loading/timing issue, especially in a cohort? This has occasionally been seen before: when the cataloguing data notebook is run, an asset is sometimes not immediately available on other cohort members, so the lookup returns an error, yet a few seconds later it succeeds. Could something similar be happening here? I don't know the ALS well enough to comment.
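One way to tell a genuinely missing entity from a propagation delay across the cohort is to retry the lookup with backoff before treating the miss as an error. The following is a minimal Java sketch under that assumption; `RetryingLookup.withRetry` is a hypothetical helper, not an Egeria API, and the attempt count and delays are illustrative only.

```java
import java.util.Optional;
import java.util.function.Supplier;

/**
 * Hypothetical helper, not part of Egeria: retries a lookup that may miss
 * because an entity has not yet propagated to this cohort member.
 */
final class RetryingLookup {

    private RetryingLookup() {}

    /**
     * Calls lookup up to maxAttempts times, doubling the wait between attempts.
     * Returns the first non-empty result, or Optional.empty() if every attempt misses.
     */
    static <T> Optional<T> withRetry(Supplier<Optional<T>> lookup,
                                     int maxAttempts,
                                     long initialDelayMillis) throws InterruptedException {
        long delay = initialDelayMillis;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Optional<T> result = lookup.get();
            if (result.isPresent()) {
                return result;
            }
            if (attempt < maxAttempts) {
                Thread.sleep(delay); // give replication across the cohort time to catch up
                delay *= 2;          // exponential backoff
            }
        }
        return Optional.empty();
    }
}
```

If the entity is still missing after the final attempt, the caller would log the exception as before; a hit on a later attempt would point to a timing issue rather than a genuinely missing column.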

github-actions[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 20 days if no further activity occurs. Thank you for your contributions.
