pingcap / tiflow

This repo maintains DM (a data migration platform) and TiCDC (change data capture for TiDB)
Apache License 2.0
Support of Pravega's sink requires Go+Rust integration #1740

Open hyffrank opened 3 years ago

hyffrank commented 3 years ago

Feature Request

Is your feature request related to a problem? Please describe: The CDC use case calls for a TiCDC -> Pravega integration similar to the existing integrations with Kafka and Pulsar. The problem is that Pravega currently provides only Java and Rust clients. Linking Go with Rust through FFI has many issues as well as performance penalties. A native Go client is not in Pravega's short-term plan, so we have to find other solutions for now.

Describe the feature you'd like: Is it possible to integrate TiCDC with Pravega through its Java or Rust client?

Describe alternatives you've considered: Could TiCDC provide a workaround so that we can integrate with Pravega's Rust or Java client without significant performance penalties?

Teachability, Documentation, Adoption, Migration Strategy: N/A

amyangfei commented 3 years ago

How to integrate TiCDC with clients written in other programming languages, or with other MQ clients, is a good question. We have discussed this problem several times, and a sink plugin issue can be found in #608.

As a long-term goal, we will explore a high-performance way to implement sink plugins, including the Go plugin strategies in #608, wasm, or something else, to meet the various sink support requirements. In the short term, however, we have not decided on a roadmap for this feature, and unfortunately we currently have no workaround for integrating with the Pravega client.

tisonkun commented 3 years ago

The Go plugin strategy sounds problematic because it requires CGO_ENABLED=1 and is still Golang only. Also, a contrib sink can be contributed back to our codebase, and there is no reason we would not accept a well-implemented sink.
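
To make the constraint concrete, here is a minimal sketch of what loading a sink through Go's standard `plugin` package could look like. The `SinkWriter` interface, the exported symbol name `Sink`, and the file name `pravega_sink.so` are all hypothetical and are not taken from TiCDC or #608. The standard `plugin` package is only built on platforms with cgo enabled, and the loaded object must itself be Go code built with `-buildmode=plugin`.

```go
package main

import (
	"fmt"
	"plugin"
)

// SinkWriter is a hypothetical interface that a sink plugin would satisfy.
// It is an illustration only, not TiCDC's actual sink API.
type SinkWriter interface {
	Write(payload []byte) error
}

// LoadSink opens a shared object built with `go build -buildmode=plugin`
// and looks up an exported symbol named "Sink" (a name assumed here).
func LoadSink(path string) (SinkWriter, error) {
	p, err := plugin.Open(path)
	if err != nil {
		return nil, fmt.Errorf("open plugin %q: %w", path, err)
	}
	sym, err := p.Lookup("Sink")
	if err != nil {
		return nil, fmt.Errorf("lookup Sink: %w", err)
	}
	w, ok := sym.(SinkWriter)
	if !ok {
		return nil, fmt.Errorf("symbol Sink has type %T, want SinkWriter", sym)
	}
	return w, nil
}

func main() {
	// The path is hypothetical; without a real plugin file this reports an error.
	if _, err := LoadSink("./pravega_sink.so"); err != nil {
		fmt.Println("plugin not loaded:", err)
	}
}
```

Even if this loads successfully, the plugin is still a Go binary, so it does not by itself remove the need for a Go client for the target system.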

For a possible workaround, I'd like to know whether the open protocol supports a pull model, or whether we can find a way to have a thin middleware for adaptation.

amyangfei commented 3 years ago

Yep, a pull model can solve a lot of scenarios, but supporting a pull model in TiCDC is difficult in the current architecture for two reasons

hyffrank commented 3 years ago

Hi @amyangfei, thanks for the quick response. We'll seek other options and I'll update the status in this thread.

hyffrank commented 3 years ago

I agree with you that FFI through cgo is not a good option here. It requires running a C stack inside a Go stack, which also brings a lot of other issues. Since the pull model is not supported here, I have to look for other possible ways and will let you know.

sunxiaoguang commented 3 years ago

Maybe building a native Pravega Go client is a better choice? It would not only help with TiCDC integration; other applications developed in Golang could benefit from it as well.

hyffrank commented 3 years ago

I agree with you. This is in Pravega's long-term plan as well, but for now we have to find other options until the native Golang client comes out.

hyffrank commented 3 years ago

I found another option that might be a solution:

We can route the source through canal to Pravega. This is feasible because canal is Java-based, so we can build a Pravega sink connector in canal. Could anyone confirm that this is a workable solution: TiCDC -> canal -> Pravega -> Flink -> TiDB?

sunxiaoguang commented 3 years ago

In TiCDC, canal is a codec, while the sink is the actual MQ implementation that takes the messages. Therefore you can't sink to Pravega just by switching to the canal codec.

hyffrank commented 3 years ago

Does TiCDC provide a "pull" interface that canal could use to fetch the binlogs from it, following the canal codec?

sunxiaoguang commented 3 years ago

Unfortunately, it doesn't work this way currently. Canal is just a codec format.

tisonkun commented 3 years ago

@hyffrank I can see there is a contributor working on a TiCDC Java client at https://internals.tidb.io/t/topic/124. You might be interested in checking it out.

hyffrank commented 3 years ago

@tisonkun Based on the follow-up comments raised by shanzi in https://internals.tidb.io/t/topic/124/5, this Java client is not an alternative to TiCDC but a consumer library for TiKV. In this particular issue, we would like to connect TiCDC directly to Pravega, with TiCDC sinking data to Pravega. As we are starting to build a Pravega connect framework, I see the TiCDC Java client as a good candidate for integration with Pravega connect, where it could be used as a source client. We could also use TiFlink directly as the upstream that sinks data to Pravega.

Thanks for reminding me.