odpi / egeria-connector-hadoop-ecosystem

Hadoop ecosystem connectors for Egeria: repository proxy connector for Apache Atlas.
Apache License 2.0

Can't start the connector using the documentation #555

Closed rubtobar closed 1 year ago

rubtobar commented 1 year ago

Hi, I'm trying to set up an Egeria instance and connect to it using the Hadoop connector.

I'm using the documentation at: https://odpi.github.io/egeria-connector-hadoop-ecosystem/getting-started/index.html#/7/1

Currently I have deployed an Apache Atlas instance using Docker:

docker run -p 9026:9026 -p 9027:9027 -p 21000:21000 docker.io/planetf1/apache-atlas:latest

And an Apache Kafka instance using:

bin/zookeeper-server-start.sh config/zookeeper.properties
bin/kafka-server-start.sh config/server.properties

I followed the documentation except for the part that uses the truststore.p12.

The only difference is that I use http instead of https.

But when I launch the server using:

curl -k -X POST "http://localhost:8080/open-metadata/admin-services/users/admin/servers/atlas/instance"

I always get this error:

{"class":"SuccessMessageResponse","relatedHTTPCode":500,"exceptionClassName":"org.odpi.openmetadata.adminservices.ffdc.exception.OMAGConfigurationErrorException","exceptionCausedBy":"org.odpi.openmetadata.repositoryservices.ffdc.exception.OMRSConfigErrorException","actionDescription":"activateWithSuppliedConfig","exceptionErrorMessage":"OMAG-ADMIN-500-001 Method activateWithSuppliedConfig for OMAG server atlas returned an unexpected exception of org.odpi.openmetadata.repositoryservices.ffdc.exception.OMRSConfigErrorException with message OMRS-CONNECTOR-400-005 The connector to the local repository failed with a org.odpi.openmetadata.repositoryservices.ffdc.exception.OMRSLogicErrorException exception and the following error message: OMRS-REPOSITORY-400-025 Local metadata repository has not initialized correctly because it was unable to create its metadata collection","exceptionErrorMessageId":"OMAG-ADMIN-500-001","exceptionErrorMessageParameters":["atlas","activateWithSuppliedConfig","org.odpi.openmetadata.repositoryservices.ffdc.exception.OMRSConfigErrorException","OMRS-CONNECTOR-400-005 The connector to the local repository failed with a org.odpi.openmetadata.repositoryservices.ffdc.exception.OMRSLogicErrorException exception and the following error message: OMRS-REPOSITORY-400-025 Local metadata repository has not initialized correctly because it was unable to create its metadata collection"],"exceptionSystemAction":"The system is unable to work with the OMAG server. No change was made to the server's configuration document.","exceptionUserAction":"This is likely to be either a configuration, operational or logic error. Look for other errors. Validate the request. If you are stuck, raise an issue."}

I don't understand what "Local metadata repository has not initialized correctly because it was unable to create its metadata collection" means.
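The response wraps three messages inside one another: OMAG-ADMIN-500-001 reports OMRS-CONNECTOR-400-005, which in turn reports OMRS-REPOSITORY-400-025. As a minimal sketch, the nested IDs can be listed from a saved response with standard tools. The helper name is hypothetical, and the ID pattern is an assumption based only on the IDs visible in this response:

```shell
# Hypothetical helper: list the distinct message IDs (for example
# OMAG-ADMIN-500-001) found in a JSON error response read from standard
# input, in order of first appearance.
list_message_ids() {
    # -o prints each match on its own line; awk drops repeated IDs.
    grep -oE '[A-Z]+-[A-Z]+-[0-9]{3}-[0-9]{3}' | awk '!seen[$0]++'
}

# Usage, assuming the response was saved to a file named response.json:
#   list_message_ids < response.json
```

The last ID printed is the innermost message, which is usually the best starting point when searching the platform's console output.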

The other configuration POST calls do not return any error, just a {"class":"VoidResponse","relatedHTTPCode":200}, as the documentation says.

Could someone give me some hint about what is happening? I'm really stuck at this point.

Thank you very much in advance!

mandy-chessell commented 1 year ago

The REST API call that is failing is starting up the repository proxy server called "atlas" on the OMAG Server Platform. This call interprets the configuration document created by the previous administration calls and attempts to start up the services requested. The repository services failed to start the atlas repository connector. This acts as a client to Apache Atlas, translating Egeria requests into calls to Apache Atlas's APIs. The error message that you see is saying that this connector is unable to initialize correctly.

To find out more, you need to look at the audit log messages that appear in the console of the platform (stdout). This should show the exception and stack trace coming out of the repository connector.
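A minimal sketch of that step, assuming the platform's console output was redirected to a file (the helper name and the file name in the usage line are hypothetical):

```shell
# Hypothetical helper: print every "Caused by:" line from a captured console
# log, together with the first stack frame that follows it.
find_root_causes() {
    log_file="$1"
    grep -A 1 'Caused by:' "$log_file"
}

# Usage, assuming stdout was captured to platform-console.log:
#   find_root_causes platform-console.log
```

The last "Caused by:" entry in a Java stack trace is normally the original exception, so it is the line worth reading first.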

mandy-chessell commented 1 year ago

The REST API call that defines the configuration for the atlas connector is shown on this page: https://odpi.github.io/egeria-connector-hadoop-ecosystem/getting-started/index.html#/6/3/1. However, the page is a little misleading. The curl command is accurate, but the request body displayed at the bottom of the page omits the endpoint, userId and password. The complete body is:

{
    "class": "Connection",
    "connectorType": {
        "class": "ConnectorType",
        "connectorProviderClassName": "org.odpi.egeria.connectors.apache.atlas.repositoryconnector.ApacheAtlasOMRSRepositoryConnectorProvider"
    },
    "endpoint": {
        "class": "Endpoint",
        "address": "http://localhost:21000",
        "protocol": "http"
    },
    "userId": "admin",
    "clearPassword": "admin"
}

Put the correct URL for Apache Atlas in the "address" field.
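
As a rough sketch, the request body can be generated from the Atlas URL and credentials so that the endpoint and login details are never left out. The helper name is hypothetical and not part of Egeria; the platform URL in the usage comment is the one used earlier in this thread:

```shell
# Hypothetical helper (not part of Egeria): print a Connection request body for
# the Apache Atlas repository connector. Values are inserted verbatim, so they
# must not contain double quotes or backslashes.
atlas_connection_body() {
    atlas_url="$1"                  # for example http://localhost:21000
    atlas_user="$2"
    atlas_password="$3"
    protocol="${atlas_url%%://*}"   # text before "://", for example "http"
    provider="org.odpi.egeria.connectors.apache.atlas.repositoryconnector.ApacheAtlasOMRSRepositoryConnectorProvider"
    printf '{"class":"Connection","connectorType":{"class":"ConnectorType","connectorProviderClassName":"%s"},"endpoint":{"class":"Endpoint","address":"%s","protocol":"%s"},"userId":"%s","clearPassword":"%s"}\n' \
        "$provider" "$atlas_url" "$protocol" "$atlas_user" "$atlas_password"
}

# Usage (needs a running platform; "--data @-" makes curl read the body from stdin):
#   atlas_connection_body http://localhost:21000 admin admin |
#     curl -k -X POST -H "Content-Type: application/json" --data @- \
#     "http://localhost:8080/open-metadata/admin-services/users/admin/servers/atlas/local-repository/mode/repository-proxy/connection"
```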

rubtobar commented 1 year ago

Hi, Thanks for the help.

I was able to make it work, but I don't fully understand why it does not work using this configuration.

As the documentation says in the page you sent me: "The URL to which we post indicates that we will use the Egeria server chassis's built-in repository proxy capability to access the Apache Atlas repository connector."

https://localhost:9443/open-metadata/admin-services/users/admin/servers/atlas/local-repository/mode/repository-proxy/connection

To make it work I used this:

curl -k -X POST "http://localhost:8080/open-metadata/admin-services/users/admin/servers/atlas/local-repository/mode/in-memory-repository"

I understand that this uses the "local-repository", which the documentation says is just for testing.

With this change, am I no longer really using the Apache Atlas connector? Or is the local repository like an internal database where the data is saved?

Sorry if the question is too naive; I'm trying to figure it out using the documentation that I have.

mandy-chessell commented 1 year ago

@rubtobar as you guessed, the last command you issued changed the server from a repository proxy server that runs the atlas connector to a metadata access store that runs the in-memory repository - which I do not think is what you want.

Egeria is an integration technology. It has a platform on which you can configure and run many different servers. Egeria supports different types of servers. Each type of server focuses on integrating particular types of technology. Apache Atlas is a metadata repository and so it has a repository connector that runs in a type of server called a repository proxy. The instructions you were following explain how to set up a repository proxy server on your platform.

https://egeria-project.org/concepts/repository-proxy/

Its local repository is Apache Atlas and it connects to other servers via a cohort.

https://egeria-project.org/features/cohort-operation/overview

The repositories in the other servers are referred to as "remote repositories".

So if we go back to your original problem, we need to understand what is wrong with the setup of the atlas repository connector. This will probably be described in the audit log messages that Egeria's platform writes to the console. If you share these messages, and the curl command used to configure the atlas connector, then I might be able to spot what is wrong.

rubtobar-telefonica commented 1 year ago

Hi, thanks for the explanation.

Here is my log from when I try to start the Egeria server platform.

egeria-platform.log

The Apache Atlas connector is in the same directory as the server platform .jar.

The commands I used to configure the server are:

To start the server:

jdk-22/bin/./java -Dstrict.ssl=false -Dloader.path=. -Dserver.port=9456 -jar server-chassis-spring-4.2-20230529.083359-1.jar

To configure the server:

curl -k -X POST -H "Content-Type: application/json" --data '{"producer":{"bootstrap.servers":"my-kafka-url:9093", "sasl.jaas.config": "com.sun.security.auth.module.Krb5LoginModule required useTicketCache=true;", "sasl.mechanism": "GSSAPI", "security.protocol": "SASL_SSL", "sasl.kerberos.service.name":"kafka"},"consumer":{"bootstrap.servers":"my-kafka-url:9093", "sasl.jaas.config": "com.sun.security.auth.module.Krb5LoginModule required useTicketCache=true;", "sasl.mechanism": "GSSAPI", "security.protocol": "SASL_SSL", "sasl.kerberos.service.name":"kafka"} }' "http://localhost:9456/open-metadata/admin-services/users/admin/servers/atlas/event-bus?connectorProvider=org.odpi.openmetadata.adapters.eventbus.topic.kafka.KafkaOpenMetadataTopicProvider&topicURLRoot=egeria"

curl -k -X POST "http://localhost:9456/open-metadata/admin-services/users/admin/servers/atlas/cohorts/mycohort"

curl -k -X POST -H "Content-Type: application/json" --data '{"class":"Connection","connectorType":{"class":"ConnectorType","connectorProviderClassName":"org.odpi.egeria.connectors.apache.atlas.repositoryconnector.ApacheAtlasOMRSRepositoryConnectorProvider"},"endpoint":{"class":"Endpoint","address":"localhost:31443","protocol":"https"},"userId":"admin","clearPassword":"mypassword"}' "http://localhost:9456/open-metadata/admin-services/users/admin/servers/atlas/local-repository/mode/repository-proxy/connection"

To start the connector:

curl -k -X POST "http://localhost:9456/open-metadata/admin-services/users/admin/servers/atlas/instance"

As you can see, the Apache Atlas server is on the same machine, on port 31443. There's nothing in between (no firewalls or anything). The Kafka cluster I'm using uses Kerberos for login. This is another thing that is giving me problems, but this error also appears with a minimal Kafka instance deployed on the same machine for testing, with no security enabled. So I think the problem is between Egeria and the Apache Atlas server.

Thanks for your help :D
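
One way to narrow this down is to check that Atlas itself answers on the configured address before starting the connector. The sketch below only builds the URL. The path /api/atlas/admin/version is, as far as I know, part of the Apache Atlas admin REST API, but treat it as an assumption to verify against your Atlas release; the helper name is hypothetical:

```shell
# Hypothetical helper: build the URL of the Atlas version endpoint from the
# base address used in the connector's endpoint configuration.
# Assumption: Atlas serves its admin REST API under /api/atlas/admin.
atlas_version_url() {
    base_url="${1%/}"    # drop one trailing slash, if present
    printf '%s/api/atlas/admin/version\n' "$base_url"
}

# Usage (needs a running Atlas; credentials as in the connection body above):
#   curl -k -u "admin:mypassword" "$(atlas_version_url https://localhost:31443)"
```

If this call fails, the problem is in reaching Atlas; if it succeeds, the failure is more likely inside the connector or its classpath.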

rubtobar-telefonica commented 1 year ago

And this is the response I get from the server when executing the initialization command:


curl -k -X POST "http://localhost:9456/open-metadata/admin-services/users/admin/servers/atlas/instance"
{"class":"SuccessMessageResponse","relatedHTTPCode":500,"exceptionClassName":"org.odpi.openmetadata.adminservices.ffdc.exception.OMAGConfigurationErrorException","exceptionCausedBy":"org.odpi.openmetadata.repositoryservices.ffdc.exception.OMRSConfigErrorException","actionDescription":"activateWithSuppliedConfig","exceptionErrorMessage":"OMAG-ADMIN-500-001 Method activateWithSuppliedConfig for OMAG server atlas returned an unexpected exception of org.odpi.openmetadata.repositoryservices.ffdc.exception.OMRSConfigErrorException with message OMRS-CONNECTOR-400-005 The connector to the local repository failed with a org.odpi.openmetadata.repositoryservices.ffdc.exception.OMRSLogicErrorException exception and the following error message: OMRS-REPOSITORY-400-025 Local metadata repository has not initialized correctly because it was unable to create its metadata collection","exceptionErrorMessageId":"OMAG-ADMIN-500-001","exceptionErrorMessageParameters":["atlas","activateWithSuppliedConfig","org.odpi.openmetadata.repositoryservices.ffdc.exception.OMRSConfigErrorException","OMRS-CONNECTOR-400-005 The connector to the local repository failed with a org.odpi.openmetadata.repositoryservices.ffdc.exception.OMRSLogicErrorException exception and the following error message: OMRS-REPOSITORY-400-025 Local metadata repository has not initialized correctly because it was unable to create its metadata collection"],"exceptionSystemAction":"The system is unable to work with the OMAG server.  No change was made to the server's configuration document.","exceptionUserAction":"This is likely to be either a configuration, operational or logic error. Look for other errors.  Validate the request.  If you are stuck, raise an issue."}

mandy-chessell commented 1 year ago

The log says that the Atlas client cannot find a class:

Caused by: java.lang.NoClassDefFoundError: org/apache/commons/configuration/PropertiesConfiguration
    at java.base/java.lang.ClassLoader.defineClass1(Native Method)
    at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1018)
    at java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:150)
    at java.base/java.net.URLClassLoader.defineClass(URLClassLoader.java:524)
    at java.base/java.net.URLClassLoader$1.run(URLClassLoader.java:427)
    at java.base/java.net.URLClassLoader$1.run(URLClassLoader.java:421)
    at java.base/java.security.AccessController.doPrivileged(AccessController.java:714)
    at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:420)
    at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:593)
    at org.springframework.boot.loader.LaunchedURLClassLoader.loadClass(LaunchedURLClassLoader.java:151)
    at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:526)
    at org.apache.atlas.AtlasBaseClient.getClientProperties(AtlasBaseClient.java:335)
    at org.apache.atlas.AtlasBaseClient.initializeState(AtlasBaseClient.java:448)
    at org.apache.atlas.AtlasBaseClient.<init>(AtlasBaseClient.java:131)
    at org.apache.atlas.AtlasClientV2.<init>(AtlasClientV2.java:138)

It looks like you need to add the Apache Commons Configuration library (https://commons.apache.org/proper/commons-configuration/userguide/overview.html) to Egeria's libs in the Docker build.

Also ... I had a look at the documentation that you are using and I agree it is misleading in terms of terminology. We are in the process of consolidating our documentation on a new site (https://egeria-project.org/) and this got missed out ...
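To check whether any jar on the loader path already provides the missing class, something like the following sketch can help. The helper name is hypothetical, and it assumes unzip is installed. Note that the package org.apache.commons.configuration belongs to the 1.x line of Commons Configuration; the 2.x line uses org.apache.commons.configuration2, so a 2.x jar would not satisfy this class lookup.

```shell
# Hypothetical helper: print the name of every jar in a directory that contains
# the given class. The class is given in path form, for example
# org/apache/commons/configuration/PropertiesConfiguration
# Returns 0 if at least one jar contains it, 1 otherwise.
find_class_in_jars() {
    class_path="$1"
    jar_dir="${2:-.}"
    found=1
    for jar in "$jar_dir"/*.jar; do
        [ -f "$jar" ] || continue    # skip the unexpanded glob when no jars exist
        # A jar is a zip archive; list its entries and look for the class file.
        if unzip -l "$jar" 2>/dev/null | grep -qF "${class_path}.class"; then
            printf '%s\n' "$jar"
            found=0
        fi
    done
    return "$found"
}

# Usage, from the directory passed as -Dloader.path when starting the platform:
#   find_class_in_jars org/apache/commons/configuration/PropertiesConfiguration .
```

No output means none of the jars in that directory contains the class, which matches the NoClassDefFoundError in the log.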

dwolfson commented 1 year ago

Hi @rubtobar,

I can look at updating the container with the missing library. Are you on our Slack channel? That is often a faster way to communicate. I'd like to understand a bit more about the environment you are trying to deploy this in so that I can better help you. Are you just using Docker, or are you deploying in Kubernetes? If it is Kubernetes, then you could also load missing jar files as the containers are deployed.

dan

rubtobar-telefonica commented 1 year ago

Hi @dwolfson, I'm currently deploying the connector directly on a Linux server, launching the service with the commands shown in the documentation.

If the Docker deployment is the recommended one, I will try to do it that way.

In our environment we currently have bare-metal machines. I'm launching the connector on the same machine where Atlas is deployed. We also have a Kafka cluster spread across several of these bare-metal machines, and I would like to use this production Kafka instance for cohort communication.

Maybe I'm missing some JAR files, since @mandy-chessell points out that the service can't find some libraries.

I will use the Dockerfile as a reference to see which libraries I'm missing. If I can't find them, I will use the Docker image to deploy Egeria on the machine.

Rubén

mandy-chessell commented 1 year ago

Hello @rubtobar, @dwolfson and I tried to get this connector up and running last week and hit issues too. The Atlas client was missing a class that we have not been able to track down.

This connector has not been used for a couple of years and I am not surprised it is not working. In that time we have upgraded Java twice (now on 17). Atlas has also had a couple of releases.

Because of a limitation in Atlas, this repository connector is only able to retrieve metadata from Atlas. It cannot store new open metadata in Atlas. This limitation has understandably reduced its usefulness, which is why we stopped investing in it.

About a month ago, we started to create a new connector for Atlas to replace this one. The new connector is an integration connector. (See https://egeria-project.org/guides/developer/integration-connectors/overview/). When finished, the integration connector will be able to perform a 2-way exchange of metadata in and out of Atlas. We are also hoping to call the Atlas REST API directly to eliminate the Atlas client. (The Atlas client is based on Java 8 and pulls in a lot of Hadoop libraries.) This will simplify its setup.

So, which is better for you? My preference would be to accelerate the work on the new Atlas connector and try to get it finished for R4.2 (branching 17th July, shipped end of July). Would this fit in your schedule? What are your overall goals?

The alternative is that we could focus on getting this connector working for 4.2 and work on its replacement in 4.3.

rubtobar-telefonica commented 1 year ago

Hi @mandy-chessell @dwolfson,

I do agree that a new Atlas Connector would be the best option.

Our goal for now is to sync the metadata from Atlas into an IBM Watson Knowledge Catalog (WKC). In recent releases, IBM has added Egeria compatibility to WKC. https://www.ibm.com/docs/en/cloud-paks/cp-data/4.6.x?topic=administering-configuring-synchronization-external-repositories

mandy-chessell commented 1 year ago

Excellent :) So what types of metadata are most important to you? And what levels of control do you need?

What I am currently working on is synchronizing glossaries between Atlas and the repositories (like WKC) connected to the open metadata ecosystem. It is possible to set up the connector to synchronize all glossaries or specific named glossaries.

Next I was going to look at bringing Hive tables from Atlas into the open metadata ecosystem. This would include their tags (classifications), connection information and links to glossary terms. Is this useful?

What else? And do you care about Atlas lineage?

rubtobar-telefonica commented 1 year ago

Hi @mandy-chessell,

Yes, bringing all the Hive tables into the ecosystem is very useful for us.

I'm going to prepare a list to be more specific about the data that we would like to add to the open metadata ecosystem.

Lineage is important for us, yes. We are using the MANTA software integrated in WKC to manage lineage in the platform.

As soon as I have the list I'll post it here, if that would help :D

mandy-chessell commented 1 year ago

That would be brilliant - thank you :)

rubtobar-telefonica commented 1 year ago

Hi @mandy-chessell,

Currently we are using: Hive, Impala, NiFi, HDFS, HBase, Spark, Ozone, Avro and Kafka.

Would it be useful to give you our Atlas type system graph? It contains more concrete objects for each of these types.

Thanks!

rubtobar-telefonica commented 1 year ago

Hi @mandy-chessell,

Is there something we could test or do right now?

Is there a Slack channel to keep in contact about this?

Thanks! Rubén

mandy-chessell commented 1 year ago

Hello @rubtobar, I added an update to the Apache Atlas integration connector yesterday. This is for the 4.2 release we are working on this week. Would you like a call to see it demonstrated and to show you how to set it up? How about Thursday afternoon?

Are you on our Slack? I have just sent you an invite. There are a number of Egeria channels. For example, #egeria-announce for new features and #egeria-discussion for questions.

rubtobar-telefonica commented 1 year ago

Hi @mandy-chessell,

I have joined the Slack group. Thanks for the invite.

I'm in the CEST time zone.

Thursday between 14:00 and 16:00 CEST would be a good fit if it suits you.

We can keep in contact on Slack too.

Thanks!

github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 20 days if no further activity occurs. Thank you for your contributions.