Open yu-iskw opened 2 years ago
@OnkarVO7 I would love to take this issue. I don't completely understand the steps to implement a new connector. Can you tell me whether I am correct about the following steps? From my understanding, we have to take two steps. My other question is: can we use the `metadata ingest` command after implementing step two?
@yu-iskw Yes, you are on the right path. You can refer to this database connector PR for the other file changes that are required.
@OnkarVO7 Thank you for sharing the demo PR. I am going to look into it.
@OnkarVO7 I have a question about FQNs in OpenMetadata. As far as I know, an FQN in OM is basically composed of database, schema and table.
Take BigQuery, for instance.
Meanwhile, a GCP project can have multiple Spanner instances. A Spanner instance can have multiple databases. A Spanner database can have multiple tables. So, how should we compose the FQN of a Spanner table? Specifically, I am wondering how we can manage information about GCP project IDs for Spanner.
@yu-iskw For this case, the below can be ingested as you mentioned.
For the GCP project, you can create a tag in OM with the `project_id` and attach that tag to the database. Let me know if that makes sense.
cc: @pmbrull @harshach
@OnkarVO7 I got it. Let me check another option. At the moment, `GCSCredentials` accepts either a single GCP project or multiple GCP projects. If we restrict that attribute to a single GCP project, we can also take advantage of the service name in OM. What do you think?
`GCSCredentials` is prohibited)
@yu-iskw That might not be recommended here. The main goal of the `service_name` or `services` in OM is to let users enter any name themselves and add multiple services of the same source to OM, each with whatever configuration they please.
Because of this, we never edit the `service_name` on the ingestion side.
If we add logic for `GCP project id => Service name`, the above goal will be invalidated.
@OnkarVO7 I understand. Thanks!
@OnkarVO7 I am wondering how we can create a SQLAlchemy engine, because `python-spanner-sqlalchemy` requires a project ID, an instance ID and a database ID all together, as in `spanner+spanner:///projects/project-id/instances/instance-id/databases/database-id`. So, if we want to ingest metadata from multiple instances and the databases in them, we have to change the SQLAlchemy engine dynamically. Of course, we can collect the instances in a GCP project and the databases in a Spanner instance. However, I don't know a good way to pass that information dynamically to `get_connection`, because `get_connection` basically receives only a connection object like `BigQueryConnection`. So, we would have to dynamically change the Spanner instance and database information in the connection object. That can get a bit hacky. I would like to know how to properly deal with multiple connections in a connector.
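One possible way to avoid mutating a shared connection object is to derive a fresh, immutable copy per instance and database. The sketch below only illustrates that idea and is not how OpenMetadata implements it: `SpannerConnection`, `build_spanner_url` and `iter_connections` are hypothetical names, and the only detail taken from this thread is the URL format quoted above.

```python
from dataclasses import dataclass, replace
from typing import Dict, Iterator, List


@dataclass(frozen=True)
class SpannerConnection:
    """Hypothetical connection config; NOT an OpenMetadata class."""

    project_id: str
    instance_id: str = ""
    database_id: str = ""


def build_spanner_url(conn: SpannerConnection) -> str:
    """Build a URL in the format quoted in this thread."""
    if not (conn.project_id and conn.instance_id and conn.database_id):
        raise ValueError("project, instance and database IDs are all required")
    return (
        f"spanner+spanner:///projects/{conn.project_id}"
        f"/instances/{conn.instance_id}"
        f"/databases/{conn.database_id}"
    )


def iter_connections(
    base: SpannerConnection,
    databases_by_instance: Dict[str, List[str]],
) -> Iterator[SpannerConnection]:
    """Yield one connection copy per (instance, database) pair.

    `databases_by_instance` is assumed to come from a separate discovery
    step that is not shown here. `base` is never modified.
    """
    for instance_id, database_ids in databases_by_instance.items():
        for database_id in database_ids:
            yield replace(base, instance_id=instance_id, database_id=database_id)


# Usage: each resulting URL could then be handed to sqlalchemy.create_engine.
base = SpannerConnection(project_id="my-project")
discovered = {"instance-a": ["db1", "db2"], "instance-b": ["db3"]}
urls = [build_spanner_url(c) for c in iter_connections(base, discovered)]
```

Because every copy is a separate frozen object, a function that accepts a single connection object can be called once per copy without any shared state changing underneath it.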
@OnkarVO7 I made a pull request for the feature. I am still wondering how we should implement it. Let's discuss on the pull request.
Is your feature request related to a problem? Please describe. Since Cloud Spanner doesn't allow us to manage table-level and column-level metadata such as descriptions and labels, we have to manage them outside of Spanner.
Describe the solution you'd like It would be great to support a new connector for Google Cloud Spanner.
Describe alternatives you've considered I have no options on OpenMetadata.
Additional context NA