odpi / egeria-connector-xtdb

Pluggable repository for Egeria, using XTDB (formerly "Crux") as the back-end to natively support historical metadata.
https://odpi.github.io/egeria-docs/connectors/repository/xtdb/
Apache License 2.0

Exception thrown when trying to save classifications for entities not saved in the local repository #390

Closed alexandra-bucur closed 2 years ago

alexandra-bucur commented 2 years ago

I tried to do an initial load from IGC using Data Engine on a platform with the XTDB connector. The lineage was not fully processed because I got exceptions when some classifications (latest changes on anchors) were being saved in the local repository. It looks to me like the connector does not allow storing classifications for entities that are not stored in the local repo. For example, the initial load is a job-level one, so database schemas are not saved locally. There is an anchor from relational table to database schema, and the failure appears when a latest change classification needs to be put on the database schema. Egeria has allowed this behavior for a few months.

This is a similar stack trace for a data file that only exists as an entity proxy in the local repository:

Tue Aug 02 15:11:16 EEST 2022 omas-server Exception OMAG-REPOSITORY-HANDLER-0003 An unexpected error org.odpi.openmetadata.repositoryservices.ffdc.exception.EntityNotKnownException was returned to reclassifyEntity(LatestChange) by the metadata server during upsertExternalRelationship request for open metadata access service Data Engine OMAS on server omas-server; message was OMRS-XTDB-REPOSITORY-400-003 The attempt to retrieve an entity with GUID e_data_file@metadataCollection:entityGUID found only an entity proxy in repository null
Tue Aug 02 15:11:16 EEST 2022 omas-server Exception OMAG-REPOSITORY-HANDLER-0003 Supplementary information: log record id bfcf6c86-09ca-4b5f-976c-dd8de399c13e org.odpi.openmetadata.repositoryservices.ffdc.exception.EntityNotKnownException returned message of OMRS-XTDB-REPOSITORY-400-003 The attempt to retrieve an entity with GUID e_data_file@metadataCollection:entityGUID found only an entity proxy in repository null and stacktrace of 
OCFCheckedExceptionBase{reportedHTTPCode=400, reportingClassName='org.odpi.egeria.connectors.juxt.xtdb.txnfn.UpdateEntityClassification', reportingActionDescription=':egeria/updateEntityClassification', reportedErrorMessage='OMRS-XTDB-REPOSITORY-400-003 The attempt to retrieve an entity with GUID e_data_file@metadataCollection:entityGUID found only an entity proxy in repository null', reportedErrorMessageId='OMRS-XTDB-REPOSITORY-400-003', reportedErrorMessageParameters=[e_data_file@metadataCollection:entityGUID, null], reportedSystemAction='The system was unable to perform the entity retrieval.', reportedUserAction='Correct the caller's code to request an entity and retry the request.', reportedCaughtException=null, reportedCaughtExceptionClassName='null', relatedProperties=null}
    at org.odpi.egeria.connectors.juxt.xtdb.txnfn.TxnValidations.nonProxyEntity(TxnValidations.java:155)
    at org.odpi.egeria.connectors.juxt.xtdb.txnfn.UpdateEntityClassification.<init>(UpdateEntityClassification.java:67)
    at clojure.core$eval17526$fn__17527.invoke(NO_SOURCE_FILE:0)
    at clojure.lang.AFn.applyToHelper(AFn.java:178)
    at clojure.lang.AFn.applyTo(AFn.java:144)
    at clojure.core$apply.invokeStatic(core.clj:669)
    at clojure.core$apply.invoke(core.clj:662)
    at xtdb.tx$eval9087$fn__9089.invoke(tx.clj:247)
    at clojure.lang.MultiFn.invoke(MultiFn.java:239)
    at xtdb.tx.InFlightTx$fn__9139.invoke(tx.clj:314)
    at xtdb.tx.InFlightTx.index_tx_events(tx.clj:311)
    at xtdb.tx$__GT_tx_ingester$fn__9328.invoke(tx.clj:507)
    at xtdb.tx.subscribe$tx_handler$fn__15066.invoke(subscribe.clj:38)
    at clojure.lang.PersistentVector.reduce(PersistentVector.java:343)
    at clojure.core$reduce.invokeStatic(core.clj:6885)
    at clojure.core$reduce.invoke(core.clj:6868)
    at xtdb.tx.subscribe.NotifyingSubscriberHandler$fn__15144.invoke(subscribe.clj:98)
    at xtdb.tx.subscribe$completable_thread$fn__15060.invoke(subscribe.clj:19)
    at clojure.lang.AFn.run(AFn.java:22)
    at java.base/java.lang.Thread.run(Thread.java:834)

2022-08-02 15:11:16.866 ERROR 76962 --- [nPool-worker-25] o.o.o.a.d.s.s.DataEngineRESTServices     : Exception while adding lineage mapping LineageMapping(sourceAttribute=dataFileQualifiedName, targetAttribute=_(host)=processQualifiedName : PropertyServerException{reportedHTTPCode=500, reportingClassName='org.odpi.openmetadata.commonservices.repositoryhandler.RepositoryErrorHandler', reportingActionDescription='upsertExternalRelationship', reportedErrorMessage='OMAG-REPOSITORY-HANDLER-500-001 An unexpected error org.odpi.openmetadata.repositoryservices.ffdc.exception.EntityNotKnownException was returned to reclassifyEntity(LatestChange) by the metadata server during upsertExternalRelationship request for open metadata access service Data Engine OMAS on server omas-server; message was OMRS-XTDB-REPOSITORY-400-003 The attempt to retrieve an entity with GUID e_data_file@metadataCollection:entityGUID found only an entity proxy in repository null', reportedErrorMessageId='OMAG-REPOSITORY-HANDLER-500-001', reportedErrorMessageParameters=[OMRS-XTDB-REPOSITORY-400-003 The attempt to retrieve an entity with GUID e_data_file@metadataCollection:entityGUID found only an entity proxy in repository null, upsertExternalRelationship, Data Engine OMAS, omas-server, org.odpi.openmetadata.repositoryservices.ffdc.exception.EntityNotKnownException, reclassifyEntity(LatestChange)], reportedSystemAction='The system is unable to process the request because of an internal error.', reportedUserAction='Verify the sanity of the server.  This is probably a logic error.  If you can not work out what happened, ask the Egeria community for help.', reportedCaughtException=null, reportedCaughtExceptionClassName='null', relatedProperties=null}
2022-08-02 15:11:16.887 ERROR 76962 --- [nPool-worker-25] o.o.o.c.ffdc.RESTExceptionHandler        : Exception from addLineageMappings being packaged for return on REST call

org.odpi.openmetadata.frameworks.connectors.ffdc.PropertyServerException: OMAG-REPOSITORY-HANDLER-500-001 An unexpected error org.odpi.openmetadata.repositoryservices.ffdc.exception.EntityNotKnownException was returned to reclassifyEntity(LatestChange) by the metadata server during upsertExternalRelationship request for open metadata access service Data Engine OMAS on server omas-server; message was OMRS-XTDB-REPOSITORY-400-003 The attempt to retrieve an entity with GUID e_data_file@metadataCollection:entityGUID found only an entity proxy in repository null
    at org.odpi.openmetadata.commonservices.repositoryhandler.RepositoryErrorHandler.handleRepositoryError(RepositoryErrorHandler.java:759) ~[classes/:na]
    at org.odpi.openmetadata.commonservices.repositoryhandler.RepositoryHandler.reclassifyEntity(RepositoryHandler.java:1895) ~[classes/:na]
    at org.odpi.openmetadata.commonservices.generichandlers.OpenMetadataAPIGenericHandler.addLatestChangeToAnchor(OpenMetadataAPIGenericHandler.java:4304) ~[classes/:na]
    at org.odpi.openmetadata.commonservices.generichandlers.OpenMetadataAPIGenericHandler.linkElementToElement(OpenMetadataAPIGenericHandler.java:12946) ~[classes/:na]
    at org.odpi.openmetadata.commonservices.generichandlers.OpenMetadataAPIGenericHandler.linkElementToElement(OpenMetadataAPIGenericHandler.java:12602) ~[classes/:na]
    at org.odpi.openmetadata.accessservices.dataengine.server.handlers.DataEngineCommonHandler.upsertExternalRelationship(DataEngineCommonHandler.java:207) ~[classes/:na]
    at org.odpi.openmetadata.accessservices.dataengine.server.handlers.DataEngineSchemaTypeHandler.addLineageMappingRelationship(DataEngineSchemaTypeHandler.java:183) ~[classes/:na]
    at org.odpi.openmetadata.accessservices.dataengine.server.service.DataEngineRESTServices.lambda$addLineageMappings$0(DataEngineRESTServices.java:820) ~[classes/:na]
    at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183) ~[na:na]
    at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1655) ~[na:na]
    at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484) ~[na:na]
    at java.base/java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:290) ~[na:na]
    at java.base/java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:746) ~[na:na]
    at java.base/java.util.concurrent.ForkJoinTask.doExec$$$capture(ForkJoinTask.java:290) ~[na:na]
    at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java) ~[na:na]
    at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020) ~[na:na]
    at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656) ~[na:na]
    at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594) ~[na:na]
    at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183) ~[na:na]
cmgrote commented 2 years ago

Hi @alexandra-bucur, looks like I missed a spot to remove the proxy checking for classifications. Have pushed a change that should remove this validation. I'll re-test to ensure no breakages in CTS, but if you want to also re-test locally with the latest snapshot build (https://oss.sonatype.org/content/repositories/snapshots/org/odpi/egeria/egeria-connector-xtdb/) that'd be great.

(I suspect the CTS isn't currently testing this case, so the CTS test will mainly reveal whether the 1-line code change has broken anything else in the CTS.)

cmgrote commented 2 years ago

Fix should also be in the 3.10 release now, if you want to test with that instead. At the very least the error should have changed 🙈 Do let me know either way, please? 🙏

alexandra-bucur commented 2 years ago

Hi @cmgrote! Thank you for looking into this. I could not see that exception anymore, but unfortunately, I still have some issues:

Fri Aug 26 11:40:48 EEST 2022 omas-server Exception OMAG-REPOSITORY-HANDLER-0003 Supplementary information: log record id c72e5bb2-5883-49d0-b0ba-e877f9033795 org.odpi.openmetadata.repositoryservices.ffdc.exception.EntityNotKnownException returned message of OMRS-XTDB-REPOSITORY-404-001 The repository does not contain any entity with the GUID e_database_schema@metadataCollection:entityGUID metadataCollection:entityGUID  and stacktrace of 
OCFCheckedExceptionBase{reportedHTTPCode=404, reportingClassName='org.odpi.egeria.connectors.juxt.xtdb.txnfn.ClassifyEntity', reportingActionDescription=':egeria/classifyEntity', reportedErrorMessage='OMRS-XTDB-REPOSITORY-404-001 The repository does not contain any entity with the GUID e_database_schema@metadataCollection:entityGUID ', reportedErrorMessageId='OMRS-XTDB-REPOSITORY-404-001', reportedErrorMessageParameters=[e_database_schema@metadataCollection:entityGUID ], reportedSystemAction='The XTDB repository is unable to find any entity with the provided GUID.', reportedUserAction='Correct the caller's code to ensure the entity being requested is one contained in the local server.', reportedCaughtException=null, reportedCaughtExceptionClassName='null', relatedProperties=null}
    at org.odpi.egeria.connectors.juxt.xtdb.txnfn.ClassifyEntity.<init>(ClassifyEntity.java:73)
    at clojure.core$eval16497$fn__16498.invoke(NO_SOURCE_FILE:0)
    at clojure.lang.AFn.applyToHelper(AFn.java:216)
    at clojure.lang.AFn.applyTo(AFn.java:144)
    at clojure.core$apply.invokeStatic(core.clj:669)
    at clojure.core$apply.invoke(core.clj:662)
    at xtdb.tx$eval9087$fn__9089.invoke(tx.clj:247)
    at clojure.lang.MultiFn.invoke(MultiFn.java:239)
    at xtdb.tx.InFlightTx$fn__9139.invoke(tx.clj:314)
    at xtdb.tx.InFlightTx.index_tx_events(tx.clj:311)
    at xtdb.tx$__GT_tx_ingester$fn__9330.invoke(tx.clj:509)
    at xtdb.tx.subscribe$tx_handler$fn__15068.invoke(subscribe.clj:38)
    at clojure.lang.PersistentVector.reduce(PersistentVector.java:343)
    at clojure.core$reduce.invokeStatic(core.clj:6885)
    at clojure.core$reduce.invoke(core.clj:6868)
    at xtdb.tx.subscribe.NotifyingSubscriberHandler$fn__15146.invoke(subscribe.clj:98)
    at xtdb.tx.subscribe$completable_thread$fn__15062.invoke(subscribe.clj:19)
    at clojure.lang.AFn.run(AFn.java:22)
    at java.base/java.lang.Thread.run(Thread.java:834)

It's still related to the fact that we don't store the DB schemas in this scenario.

cmgrote commented 2 years ago

Ok, I think I understand. Is this an accurate description of the expected behavior?

  1. You're telling the repository to classify an entity.
  2. The repository doesn't (yet) know anything about the entity.
  3. The repository should therefore:
    1. Create a proxy for the entity.
    2. Classify that entity proxy.
  4. The repository should then return the resulting classified entity (ish).
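
The steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: `StoredEntity` and `InMemoryRepository` are hypothetical in-memory stand-ins, not the Egeria or XTDB connector classes, and the real connector does this work inside XTDB transaction functions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ClassifyProxySketch {

    /** Hypothetical stand-in for a stored entity (not Egeria's EntityProxy or EntityDetail). */
    static final class StoredEntity {
        final String guid;
        final boolean proxy;
        final List<String> classifications = new ArrayList<>();

        StoredEntity(String guid, boolean proxy) {
            this.guid = guid;
            this.proxy = proxy;
        }
    }

    /** Hypothetical in-memory repository keyed by GUID. */
    static final class InMemoryRepository {
        private final Map<String, StoredEntity> entities = new HashMap<>();

        /** Stores a full (non-proxy) entity, replacing anything held under the same GUID. */
        void saveFullEntity(String guid) {
            entities.put(guid, new StoredEntity(guid, false));
        }

        /**
         * Classifies the entity with the given GUID. If the repository knows nothing
         * about the entity yet, a proxy is created first and then classified,
         * instead of failing with an "entity not known" error.
         */
        StoredEntity classifyEntity(String guid, String classificationName) {
            StoredEntity entity = entities.computeIfAbsent(guid, g -> new StoredEntity(g, true));
            if (!entity.classifications.contains(classificationName)) {
                entity.classifications.add(classificationName);
            }
            return entity;
        }
    }

    public static void main(String[] args) {
        InMemoryRepository repo = new InMemoryRepository();
        // The database schema was never saved locally, so a proxy is created on demand.
        StoredEntity schema = repo.classifyEntity("e_database_schema", "LatestChange");
        System.out.println(schema.guid + " proxy=" + schema.proxy + " " + schema.classifications);
    }
}
```

The sketch silently ignores a repeated classification with the same name; how a real repository should treat that case is a separate question and is not settled here.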

This sounds logical, and now I see it's what seems to be implemented in the graph repository's metadata collection for the (newer) ::classifyEntity method that receives an entity proxy directly.

However, I think this has exposed a bit of an issue with our interface definitions (and by implication our process for maintaining / extending / revising these interfaces):

  1. The defined interface / expectations of this (newer) ::classifyEntity method:

    • It's implemented directly in the base class (OMRSMetadataCollection).
    • This makes it trivial for other repositories not to implement it at all (or even know that doing so is needed).
    • There is nothing in the documentation of the method to indicate what to do if the entity proxy it receives does not exist (i.e. that it expects an underlying repository to override this default implementation and actually store the entity proxy). In fact, to me it's even worse because there is an EntityNotKnownException explicitly defined for the method with a description that implies that if the entity proxy does not exist it should not be created.
    • Suggested mitigation: we should add much more detail to the JavaDoc of this newer method indicating its expected behavior for non-existent entity proxies, and that the provided default implementation is in reality only a partial solution. (In particular, emphasize strongly that we expect underlying repository implementations to override this method with additional functionality.)
  2. The (newer) ::classifyEntity expects the (older) method to convert what is actually an EntityProxy into an EntityDetail object.

    • These objects are peers in our object model rather than directly related (they both inherit from EntitySummary directly).
    • So strictly speaking, this is type conversion / coercion and not "normal" object subtype / supertype inheritance.
    • As such, it's something that repositories must explicitly write code to do; it's not handled by the objects themselves.
    • Once again, there is nothing in the JavaDocs of the interface to describe this expectation that such coercion code should be implemented.
    • Suggested mitigation: we should add more detail to the JavaDoc of this older method indicating it's expected to do such type coercion in cases where it is only operating on an entity proxy. (And probably further add detail to the JavaDocs that the method should work on both full entities and proxies.)
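
To illustrate the coercion point in the second item, here is a minimal sketch. The `Sketch*` classes are hypothetical stand-ins that only mirror the sibling relationship described above (both extend a common summary type); their fields are assumptions for the example, not the actual Egeria object model.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ProxyCoercionSketch {

    /** Hypothetical common parent, playing the role of the summary type. */
    static class SketchSummary {
        String guid;
        List<String> classifications = new ArrayList<>();
    }

    /** Hypothetical proxy: a sibling of SketchDetail, not a subtype of it. */
    static class SketchProxy extends SketchSummary {
        Map<String, String> uniqueProperties = new HashMap<>();
    }

    /** Hypothetical full entity: also extends the summary type directly. */
    static class SketchDetail extends SketchSummary {
        Map<String, String> properties = new HashMap<>();
    }

    /**
     * Explicit coercion from proxy to detail. A cast such as (SketchDetail) proxy
     * does not compile because the two classes are siblings, so the repository
     * has to copy the fields across itself.
     */
    static SketchDetail toDetail(SketchProxy proxy) {
        SketchDetail detail = new SketchDetail();
        detail.guid = proxy.guid;
        detail.classifications = new ArrayList<>(proxy.classifications);
        // Assumption for this sketch: the proxy's unique properties are the only
        // property values available, so they are all the detail can carry.
        detail.properties = new HashMap<>(proxy.uniqueProperties);
        return detail;
    }

    public static void main(String[] args) {
        SketchProxy proxy = new SketchProxy();
        proxy.guid = "e_database_schema";
        proxy.classifications.add("LatestChange");
        proxy.uniqueProperties.put("qualifiedName", "schema-1");

        SketchDetail detail = toDetail(proxy);
        System.out.println(detail.guid + " " + detail.classifications + " " + detail.properties);
    }
}
```

The copies are defensive on purpose, so that later changes to the returned object do not leak back into the stored proxy.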
cmgrote commented 2 years ago

In theory https://github.com/odpi/egeria-connector-xtdb/pull/400 should implement the new behavior (assuming you're using this newer ::classifyEntity method that takes the full EntityProxy object?)

Can you please re-test with the latest snapshot (https://oss.sonatype.org/content/repositories/snapshots/org/odpi/egeria/egeria-connector-xtdb/3.11-SNAPSHOT/egeria-connector-xtdb-3.11-20220827.223512-2-jar-with-dependencies.jar) and confirm either way? 🙏

alexandra-bucur commented 2 years ago

Hey @cmgrote! Thank you for your work! 🙏 It now behaves exactly like it does with the default repository. It still runs with some exceptions but they are related to entities already classified with LatestChange. This is subject to another investigation. It also doesn't seem to affect the lineage.