The MySQL Extractor isn't able to reuse relay logs that have already been downloaded from the master

GoogleCodeExporter commented 8 years ago

What steps will reproduce the problem?

1. Set up direct replication from a server that allows large binary logs
2. Set the start position for replication at a large byte offset within a 
binary log
3. Put the replication service online
4. The replicator will pause while downloading the entire file before it comes 
online
5. Put the replicator offline
6. Put the replicator back online
7. The replicator will pause while it downloads the entire file again

What is the expected output?

The `trepctl online` command should return quickly in the first case and 
report a state showing that the service is still synchronizing. The replicator 
should come online quickly in the second case because the file has already been 
downloaded and the extractor can read from it.

What do you see instead?

A replicator that appears to be stalled. The `trepctl online` command doesn't 
complete and the `trepctl status` output says it is OFFLINE:NORMAL.

What is the possible cause?

The tungsten.replicator.extractor.mysql.RelayLogClient is downloading the 
entire binary log file again.

What is the proposed solution?

Allow configuration and use of the RelayLogClient.autoClean setting. This 
should be available through a new MySQLExtractor.relayLogAutoClean option. The 
default for that option is 'false'.

When this option is set to true, the MySQLExtractor will configure the 
RelayLogClient with autoClean enabled and the proper extraction offset that is 
available in MySQLExtractor.startRelayLogs.

Inside the RelayLogClient, nothing changes if autoClean is false. If autoClean 
is set to true and the identified file already exists, it will truncate that 
file at the extraction offset and then resume extraction from the master at 
that point.
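
A minimal sketch of the truncate-and-resume step described above. It is not the actual RelayLogClient code: the class name, method name and the `-1` return convention are hypothetical, and the rule that a local file shorter than the offset cannot be reused is an assumption added for the sketch.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper for illustration; not the actual RelayLogClient code. */
final class RelayLogReuseSketch {

    /**
     * Decides where the download from the master should start.
     * Returns the offset to request from the master, or -1 when the
     * existing behavior (download the whole file) still applies.
     */
    static long prepareForReuse(Path relayLog, long offset, boolean autoClean)
            throws IOException {
        // autoClean disabled or no local copy: nothing changes.
        if (!autoClean || !Files.exists(relayLog)) {
            return -1L;
        }
        // Assumption: a local file shorter than the offset cannot be reused.
        if (Files.size(relayLog) < offset) {
            return -1L;
        }
        // Drop everything past the extraction offset; earlier bytes are kept.
        try (FileChannel channel =
                FileChannel.open(relayLog, StandardOpenOption.WRITE)) {
            channel.truncate(offset);
        }
        return offset;
    }
}
```

With this shape the caller only needs the returned offset to decide whether to ask the master for the whole file or for the bytes after the truncation point.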

Original issue reported on code.google.com by jeff.m...@continuent.com on 4 Feb 2014 at 2:11

GoogleCodeExporter commented 8 years ago

Original comment by jeff.m...@continuent.com on 27 Mar 2014 at 9:04

Removed labels: FixedIn-2.2.1

GoogleCodeExporter commented 8 years ago

We don't want to download what has already been extracted.

Overall, this is no longer critical, but it is an inconvenience.

Original comment by linas.vi...@continuent.com on 16 May 2014 at 2:15

Added labels: Priority-High, Usability
Removed labels: Priority-Critical

GoogleCodeExporter commented 8 years ago

Another improvement that could be part of this is to make the MySQLExtractor 
more robust in handling disconnects from the MySQL server.  If we lose the 
connection while downloading, the code should: 

1.) Determine the last "safe" position in the relay log based on the last event 
generated.
2.) Clean up the tail of the current binlog to remove any extra bytes. 
3.) Restart downloading from that safe position. 

The extractor will just resynchronize without generating an error, much as 
THLExtractor does now.
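
The three steps above could be sketched roughly as follows. Everything here is hypothetical: the `EventStream` interface stands in for the connection to the MySQL server, the callback that reports the end of each complete event is an assumed way of tracking the "safe" position, and the `maxAttempts` limit is an assumption rather than part of the proposal.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.function.LongConsumer;

/** Hypothetical sketch of resynchronizing after a disconnect. */
final class ResyncSketch {

    /** Hypothetical stand-in for the download connection to the master. */
    interface EventStream {
        /**
         * Appends bytes to relayLog starting at fromOffset and reports the
         * end offset of every complete event. Throws on disconnect.
         */
        void download(Path relayLog, long fromOffset, LongConsumer onEventEnd)
                throws IOException;
    }

    /** Returns the end offset of the last complete event downloaded. */
    static long downloadWithResync(EventStream master, Path relayLog,
            long startOffset, int maxAttempts) throws IOException {
        long[] safe = {startOffset}; // step 1: last known safe position
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                master.download(relayLog, safe[0], end -> safe[0] = end);
                return safe[0];
            } catch (IOException e) {
                last = e;
                // Step 2: remove any partial bytes after the safe position.
                try (FileChannel channel =
                        FileChannel.open(relayLog, StandardOpenOption.WRITE)) {
                    channel.truncate(safe[0]);
                }
                // Step 3: the next iteration restarts from safe[0].
            }
        }
        throw new IOException("download failed after retries", last);
    }
}
```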

Original comment by robert.h...@continuent.com on 16 May 2014 at 2:41

zxs / tungsten-replicator

The MySQL Extractor isn't able to reuse relay logs that have already been downloaded from the master #815