linkedin / databus

Source-agnostic distributed change data capture system
Apache License 2.0

open replicator not able to parse timestamp in mysql 5.6 #25

Closed chandanbansal closed 9 years ago

chandanbansal commented 10 years ago

OpenReplicator works fine with MySQL 5.5 but is not able to parse TIMESTAMP values in MySQL 5.6, because the binlog format for timestamps changed in 5.6. Is there any plan to update OpenReplicator?

phanindraganti commented 10 years ago

Hi Chandan,

This looks more like an issue in the OR (OpenReplicator) library.

Is the MySQL version 5.6.4 or higher? Can you please check whether the timestamp has a fractional part (http://dev.mysql.com/doc/refman/5.6/en/datetime.html)?

You may consider submitting a patch to OpenReplicator:

https://code.google.com/p/open-replicator/

Thanks,
Phani
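
For readers following the thread, the format difference being asked about can be sketched in a few lines of Python. To my understanding, before MySQL 5.6.4 a TIMESTAMP in a row-based binlog event is 4 little-endian bytes of epoch seconds, while 5.6.4+ writes a TIMESTAMP2 value: 4 big-endian bytes of seconds followed by 0 to 3 bytes of fractional seconds, depending on the column's declared precision (fsp). The function names are hypothetical and this is not OpenReplicator code; treat it as an illustration under those assumptions, not the actual fix.

```python
import struct
from datetime import datetime, timezone

# Multiplier that turns the stored fraction into microseconds,
# keyed by how many fraction bytes follow the 4 seconds bytes.
_FRACTION_SCALE = {0: 1, 1: 10000, 2: 100, 3: 1}


def decode_timestamp_v1(buf: bytes) -> datetime:
    """Old-style TIMESTAMP: 4 little-endian bytes of epoch seconds."""
    (seconds,) = struct.unpack("<I", buf[:4])
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


def decode_timestamp2(buf: bytes, fsp: int) -> datetime:
    """TIMESTAMP2: 4 big-endian bytes of seconds plus (fsp + 1) // 2 fraction bytes."""
    if not 0 <= fsp <= 6:
        raise ValueError("fsp must be between 0 and 6")
    (seconds,) = struct.unpack(">I", buf[:4])
    frac_len = (fsp + 1) // 2
    # int.from_bytes returns 0 for an empty slice, which covers fsp == 0.
    fraction = int.from_bytes(buf[4:4 + frac_len], "big")
    micros = fraction * _FRACTION_SCALE[frac_len]
    return datetime.fromtimestamp(seconds, tz=timezone.utc).replace(microsecond=micros)
```

A parser that only knows the old layout reads the big-endian seconds with the wrong byte order and ignores the fraction bytes, which is consistent with the parse failure reported above.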


chandanbansal commented 10 years ago

Thanks for the reply, Phanindra. Yes, I am using 5.6.9 on my local machine. I will update OpenReplicator for the MySQL changes and submit a patch.

agapple commented 10 years ago

canal works fine with MySQL 5.5, MySQL 5.6, and MariaDB 5/10.

canal is an open-source project from Alibaba, a simplified version of Databus.
canal: https://github.com/alibaba/canal

chandanbansal commented 10 years ago

We have already fixed OpenReplicator for this and will release the patch ASAP. We will also look into canal and otter.

phanindraganti commented 10 years ago

Thanks for the note. I went through the wiki for Canal (thanks for adding notes in English as well).

Agapple / Chandan,

If possible, please share some thoughts on Canal vs OpenReplicator. Is there a wiki page for otter?

Thanks,
Phani


phanindraganti commented 10 years ago

Are you able to share the O/R patch here ?

Thanks Phani


agapple commented 10 years ago

otter wiki : https://github.com/alibaba/otter/wiki

what is canal & otter ?


canal vs O/R:

  1. O/R is just a binlog parsing library; it does not even deal with encoding.
  2. canal is more like a product: it runs in server/client mode and supports server HA, client HA, MySQL master/slave switchover, etc.


phanindraganti commented 10 years ago

Hi, Agapple!

Thanks for sharing the links. I took a cursory look at the code base and was able to get some context on parts of binlog processing (dbsync) and client-side event processing (example, sink directories). I still have a couple of gaps in my understanding, which I am listing below.

  1. Are there guarantees about timeline consistency (or any form of eventual consistency)? In other words, can we guarantee that a client never misses events when failover happens?

    The timeline seems to be (serverId, timestampAtServer, start (logOffset?)), based on LogEvent.java. If that is the case, when failover happens to another node, we are not guaranteed a monotonically increasing timeline for subsequent events on that particular database.

  2. I am very interested to learn more about (1) how quick random lookups within the event buffer are, and (2) how efficient streaming (i.e., a contiguous seek within a buffer) is.

    If I understand correctly, (2) is quick, but (1) may be slow, based on the implementation in the "store" directory.

  3. Also, otter seems to be a library that handles client-side functionality, including use cases like ETL. If you are able to share details about how incremental ingestion from these sources into offline processing (Hadoop?) happens, that would be very interesting.
  4. Any chance the presentations at https://github.com/alibaba/otter/wiki/%E7%9B%B8%E5%85%B3ppt%26pdf are accessible in English? I was able to translate most of the wiki pages into English, and there was a good deal of useful information there. Thanks!

Cheers,
Phani
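
The monotonicity concern in question 1 can be illustrated with a small Python sketch. It assumes, as the comment does, that an event position is the tuple (serverId, timestampAtServer, logOffset); the `Position` type, the helper, and all sample values are made up for illustration and are not taken from canal's LogEvent.java.

```python
from typing import Callable, List, NamedTuple


class Position(NamedTuple):
    server_id: int
    timestamp: int  # seconds, as reported by that server's clock
    offset: int     # byte offset within that server's own binlog


def is_monotonic(positions: List[Position], key: Callable[[Position], int]) -> bool:
    """True if key(position) strictly increases from one event to the next."""
    keys = [key(p) for p in positions]
    return all(a < b for a, b in zip(keys, keys[1:]))


events = [
    Position(server_id=1, timestamp=1000, offset=400),
    Position(server_id=1, timestamp=1001, offset=520),
    # Failover to server 2: its binlog has its own offsets and its own clock.
    Position(server_id=2, timestamp=1001, offset=120),
    Position(server_id=2, timestamp=1002, offset=260),
]

print(is_monotonic(events[:2], lambda p: p.offset))  # within one server
print(is_monotonic(events, lambda p: p.offset))      # across the failover
```

Within a single server the offsets increase, but across the failover neither the offset nor the timestamp alone gives a strictly increasing order, which is why a client cannot simply resume "after the last position seen" on the new node.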


agapple commented 10 years ago

Hi phanindraganti (https://github.com/phanindraganti): my native language is not English, so an English version of the ppt/wiki is not available at the moment.

  1. If the MySQL master crashes before the slave has fully caught up with the master's binlog, data may be lost (semi-sync replication can be considered to improve this). In addition, our company's DBAs have a MySQL master/slave switching tool. It first sets the master to read-only, waits for the slave to finish applying the master's binlog so that master and slave data are consistent, then makes the slave writable, and only then notifies the business applications and canal through the main management system. This kind of switch loses no data.
  2. In the eventBuffer design, a random search happens only on the first request; subsequent get requests are served based on the position found by the most recent search. At present the open-source version does not provide a persistent store implementation. We have been working on a WAL similar to Kafka (a message queue system), which can support multiple subscriptions.
  3. Otter mainly solves data synchronization between China and the US. It has its own specific scenario: its synchronization mechanism differs from MySQL's and, because of network factors, it uses batching. One of its biggest features is a simple and effective way to keep data consistent in a MySQL dual-active structure. As for Hadoop, that is something to try in follow-up work when there is an opportunity.
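
The planned master/slave switch described in point 1 can be sketched as a short Python simulation. Everything here is hypothetical: the `MySQLNode` model, the integer binlog positions, and the `notify` callback merely stand in for the DBA tool and the management system mentioned above, whose real interfaces are not shown in this thread.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class MySQLNode:
    name: str
    read_only: bool
    binlog_end: int = 0  # last binlog position written (meaningful on the master)
    applied: int = 0     # last master position applied (meaningful on the slave)


def planned_switchover(master: MySQLNode, slave: MySQLNode,
                       notify: Callable[[str], None]) -> int:
    """Promote the slave without losing events; returns the frozen master position."""
    # 1. Stop new writes so the master's binlog end position is frozen.
    master.read_only = True
    frozen_end = master.binlog_end
    # 2. Wait until the slave has applied everything up to that position.
    while slave.applied < frozen_end:
        slave.applied += 1  # stand-in for replication making progress
    # 3. Only now allow writes on the slave.
    slave.read_only = False
    # 4. Tell the applications and the CDC consumer which node is the new master.
    notify(slave.name)
    return frozen_end
```

The ordering is the point of the sketch: because the old master is frozen before the slave is promoted, no event can exist on the old master that the new master has not applied. An unplanned crash skips steps 1 and 2, which is the case where events may be lost.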


agapple commented 10 years ago

We have been paying close attention to LinkedIn Databus; the business problems we have to solve are the same. Your automatic switching between the recent and historical queues is worth using as a reference.


phanindraganti commented 10 years ago

Thanks, Agapple, for sharing your thoughts!

Cheers,
Phani


jagadeesh-huliyar commented 10 years ago

We have fixed open-replicator to support MySQL version 5.6. The GitHub link is https://github.com/Flipkart/open-replicator.

Support has been added for

  1. Checksum.
  2. datetime2 and time2 datatypes
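
The checksum item presumably refers to the binlog event checksums that arrived with MySQL 5.6 (the `binlog_checksum` setting). As a rough illustration in Python, and assuming the CRC32 algorithm with the 4-byte checksum stored little-endian at the end of each event, a reader could verify and strip it as below. The helper name is hypothetical and this is not the code from the Flipkart fork.

```python
import struct
import zlib

CHECKSUM_LEN = 4  # CRC32 trailer appended to each event when checksums are on


def verify_and_strip_checksum(event: bytes) -> bytes:
    """Return the event without its trailer, or raise if the CRC32 does not match."""
    if len(event) < CHECKSUM_LEN:
        raise ValueError("event is shorter than the checksum trailer")
    body, trailer = event[:-CHECKSUM_LEN], event[-CHECKSUM_LEN:]
    (stored,) = struct.unpack("<I", trailer)
    computed = zlib.crc32(body) & 0xFFFFFFFF
    if computed != stored:
        raise ValueError(f"checksum mismatch: stored={stored:#010x} computed={computed:#010x}")
    return body
```

A parser written for 5.5 does not expect these trailing bytes, so with checksums enabled it would treat them as part of the event payload unless it strips them first.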

chandanbansal commented 10 years ago

Really sorry, I was busy with something and did not get time to look into canal and otter. The OR patch link has already been shared by Jagadeesh.

phanindraganti commented 10 years ago

Jagadeesh,

Thanks for the link. There seem to be quite a few files that have been changed in the branch. Are they changes on top of OpenReplicator 1.0.5? Or is there just a diff that we can look at?

--Phani


chandanbansal commented 10 years ago

You can check the diff for 06c87542..61ff163824.