siddhi-io / siddhi

Stream Processing and Complex Event Processing Engine
http://siddhi.io
Apache License 2.0
1.52k stars 528 forks source link

CDC for Teradata #1167

Closed Sanket-Tantia closed 5 years ago

Sanket-Tantia commented 5 years ago

Description:

Can we perform CDC using siddhi app for any RDBMS database which has a JDBC driver like Teradata or are there any restrictions?

Affected Siddhi Version:

OS, DB, other environment details and versions:

Steps to reproduce:

Related Issues:

mohanvive commented 5 years ago

@Sanket-Tantia We haven't tested with DB like Teradata but technically it should work with any RDBMS database which has a JDBC driver.

Sanket-Tantia commented 5 years ago

Okay, thanks.

Sanket-Tantia commented 5 years ago

Hi, While connecting with other RDBMS databases like Oracle, Teradata, I am facing the below mentioned error: java.io.IOException: Required bundle manifest headers do not exist I did not face this issue with MySQL.

I have imported the correct jar files in the lib since I am using them to connect via my Java application. Can you please suggest, on how to fix this issue?

mohanvive commented 5 years ago

Seems like you have added a non-OSGI jar to the lib. Can you please follow the link [1]..

[1] https://docs.wso2.com/display/SP430/Adding+Third+Party+Non+OSGi+Libraries

Sanket-Tantia commented 5 years ago

Yes, that was the case. A small doubt, is wso2 sp using Debezium connector in the backend to preform cdc operations?

If yes, can you suggest an alternative?

LakshanSS commented 5 years ago

Hi @Sanket-Tantia We have 2 modes in CDC. Listening mode and Polling mode. By default, it uses the listening mode. In the listening mode we are using Debezium in the backend. But currently we support only MySQL in the Listening mode.

Polling mode doesn't use Debezium.

You can find how to use polling mode in the following doc. It also contains examples for polling mode. https://wso2-extensions.github.io/siddhi-io-cdc/api/latest/

Sanket-Tantia commented 5 years ago

Hi,

I was trying to configure JDBC data source for the wso2 sp server. I need to set a class path for the JDBC driver which is present in /lib so that it can access a config file present in the same folder.

Please tell me how to do this.

LakshanSS commented 5 years ago

Hi @Sanket-Tantia In the following tutorial guide, you can find instructions to configure a data source. https://docs.wso2.com/display/SP440/Integrating+Datastores

Sanket-Tantia commented 5 years ago

Hi, I have followed the tutorial guide, and have been able to integrate MySQL, but while integrating other RDBMS databases like Oracle and Teradata, they have other dependencies one being the JDBC driver is depended on an external config file, which I am unable to configure.

mohanvive commented 5 years ago

@Sanket-Tantia Can you please provide more context here; then we could help after some home work. What is the DB that you are using? What are the jars? What is the external config file?

Sanket-Tantia commented 5 years ago

Sure, I will provide you more details on what I am trying to achieve.

The db I am trying to connect to is TeraData. TeraData requires two jar files which can be downloaded from the below mentioned link [1] https://downloads.teradata.com/download/connectivity/jdbc-driver

The jar files are terjdbc4 & tdgssconfig. I have tried with versions 15.10.00.37 & 16.10.00.07 & 16.20.00.10. I converted them into OSGI bundles using WSO2 and placed them in lib folder.

I make a connection using this code snippet:

@store(type='rdbms',jdbc.url="jdbc:teradata://hostname/database=test", username="test", password="test", jdbc.driver.name="com.teradata.jdbc.TeraDriver") define table EMPLOYEE(id int, name string);

I get the following error:

Error starting Siddhi App 'app_name', triggering shutdown process. Exception during pool initialization: Could not initialize class com.teradata.tdgss.jtdgss.TdgssManager

The error is caused because it is unable to find the tdgssconfig file. The solution is to define the path of the file in the classpath. I have also tried defining the classpath for my machine but to no avail.

mohanvive commented 5 years ago

Can you please copy the tdgssconfig.jar to /bin directory where startup script is available and try out? (JDBC jar should go to lib folder after converting it as a bundle)

Sanket-Tantia commented 5 years ago

I have tried that, its still not working. And I get the same error response.

Besides that, I am facing one more issue, I was trying to implement cdc on Oracle database using polling mode and its throwing an error. If you can help me with that:

osgi> [2019-04-19 08:48:19,461] ERROR {org.wso2.extension.siddhi.map.keyvalue.sourcemapper.KeyValueSourceMapper} - Message **{name=ABV, id=5}** contains incompatible attribute types and values. Value 5 is not compatible with type INTEGER,Hence dropping the message The cdc source definition is: @source(type = 'cdc', mode='polling', polling.column = 'id', datasource.name = 'WSO2_TEST_DA', table.name = 'WSO', @map(type='keyvalue'), @attributes(id = 'id', name='name')) define stream inputStream (id int, name string);

I guess the response format is not supported, because when I tried cdc with MySQL in listening mode the data format I got was: [2019-04-19 08:38:53,126] INFO {org.wso2.siddhi.core.stream.output.sink.LogSink} - mysql_to_mysql_insert_or_update : logStream : Event{timestamp=1555643333098, **data=[44, me, 12],** isExpired=false}

Sanket-Tantia commented 5 years ago

Hi, were you able to find a solution for either of the two?

Moreover debezium supports Oracle. So are you going to add a listening mode support using logs for Oracle also as you have done for MySQL?

mohanvive commented 5 years ago

@Sanket-Tantia 1) No, we haven't got a chance to check the Tera data issue... 2) Regarding the data type issue in Oracle, this was already fixed. Please remove the /lib/siddhi-map-keyvalue-1.x.x.jar and update with the latest dependency. https://mvnrepository.com/artifact/org.wso2.extension.siddhi.map.keyvalue/siddhi-map-keyvalue/1.1.2 3) Listening mode support for Oracle - It is in our future roadmap. If you wish, you could contribute this improvement to Siddhi as well. :smile:

LakshanSS commented 5 years ago

Hi, Currently, I am working on CDC Listening mode to add support for some other DBs including Oracle. Soon it will be available.

Sanket-Tantia commented 5 years ago

@mohanvive Thanks for the reply. I was able to resolve the data type issue using the above mentioned steps. The Teradata issue was something I was exploring and you can check it out whenever you are free.

@LakshanSS Thanks for the update. Can you share a tentative timeline, when you are planning to release the listening mode for oracle, and are you planning to use the log miner or oracle golden gate approach?

In addition to that, I have few more doubts if you can help.

  1. Can I write a custom query besides using the below mentioned syntax because in this approach I have to first define a datasource from TriggerStream#rdbms:query('WSO2_TEST_DA', "select table_name from dba_tables where table_name = 'WSO'",'table_name string')

  2. Do i have to define each columns in my table in the stream, can we some way fetch all columns directly as some of the tables have hundreds of column.

mohanvive commented 5 years ago

@Sanket-Tantia 1) Syntax of the query function is defined. In that, first argument would be data source because we anyway need a data source to access a DB. What is concern on defining the data source?

2) Tables in Siddhi level is a way of mirroring the database table. It should be possible to have less number of attributes compared to the actual DB table. Please try and let us know if you have any probs.

LakshanSS commented 5 years ago

@Sanket-Tantia Mostly it will be released within 1-2 weeks. Internally we are using debezium for the listening mode. Debezium is currently using the golden gate approach for the oracle connector.

LakshanSS commented 5 years ago

Hi @Sanket-Tantia

  1. I have done implementation for SQLServer and PostgreSQL in Listening Mode.

  2. As I mentioned earlier, Debezium uses the golden gate approach in the current version (0.9.5.Final) which is not license friendly. But I could notice that they are trying to implement Log Miner approach also. It may be available soon. Anyway Debezium Oracle Connector is still in the incubator phase. So they don't recommend it for production usage. Once it is considered mature enough, they will add it into the Debezium main repository. After that we can implement it in siddhi-io-cdc.

mohanvive commented 5 years ago

@Sanket-Tantia Do you need any further help on this issue?

mohanvive commented 5 years ago

Closing the issue since queries are answered. Please reopen if you need any further clarification on this...