Clarification: Note that, if the replicator has received at least one event
before going offline, it picks up replication just fine.
1. install a replicator in --direct mode
2. create a table in the master
3. check that the event was received
4. put the replicator offline
5. create another table in the master
6. put the replicator back online
7. check the events: both tables are in the slave
The online operation fails only if you skip steps 2 and 3; a sketch of the full sequence follows below.
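For reference, the steps above map onto the commands used later in this thread (a sketch only; it assumes a service named Castor and a master reachable as $MASTER, and omits the --direct installation itself):
# 1. install a replicator in --direct mode (installer invocation omitted)
# 2-3. create a table on the master and check that the event was received
$ mysql -h $MASTER -e 'create table test.t0(i int)'
$ thl -service Castor list     # the new event should appear here
# 4-6. put the replicator offline, change the master, bring it back online
$ trepctl offline
$ mysql -h $MASTER -e 'create table test.t1(i int)'
$ trepctl online
# 7. both tables should now exist on the slave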
Original comment by g.maxia
on 29 Jun 2011 at 9:36
Original comment by berkeley...@gmail.com
on 30 Jun 2011 at 7:59
Sorry for the incorrect verification.
It is not fixed yet.
$ trepctl offline
$ mysql -h $MASTER -e 'create table test.t1(i int)'
$ trepctl online
$ thl -service Castor list
2011-06-30 11:35:57,798 INFO thl.log.DiskLog Using directory
'/home/tungsten/newinst/thl/Castor/' for replicator logs
2011-06-30 11:35:57,799 INFO thl.log.DiskLog Checksums enabled for log
records: true
2011-06-30 11:35:57,799 INFO thl.log.DiskLog Using read-only log connection
2011-06-30 11:35:57,803 INFO thl.log.DiskLog Loaded event serializer class:
com.continuent.tungsten.replicator.thl.serializer.ProtobufSerializer
2011-06-30 11:35:57,804 INFO thl.log.LogIndex Building file index on log
directory: /home/tungsten/newinst/thl/Castor
2011-06-30 11:35:57,812 INFO thl.log.LogIndex Constructed index; total log
files added=1
2011-06-30 11:35:57,812 INFO thl.log.DiskLog Validating last log file:
/home/tungsten/newinst/thl/Castor/thl.data.0000000001
2011-06-30 11:35:57,812 INFO thl.log.DiskLog Setting up log flush policy:
fsyncIntervalMillis=0 fsyncOnFlush=false
2011-06-30 11:35:57,813 INFO thl.log.DiskLog Idle log connection timeout:
28800000ms
2011-06-30 11:35:57,813 INFO thl.log.DiskLog Log preparation is complete
2011-06-30 11:35:57,815 ERROR replicator.thl.THLManagerCtrl Unable to find
sequence number: -1
Original comment by g.maxia
on 30 Jun 2011 at 9:44
This issue was closed by revision r269.
Original comment by berkeley...@gmail.com
on 1 Jul 2011 at 6:33
Sorry. Not fixed yet.
Case 1.
* replicator offline with empty THL
* no changes from master in this period
* replicator goes online
* new changes from master reach the slave.
* case OK.
Case 2.
* replicator offline with empty THL
* master produces changes while replicator is offline
* replicator goes online
* the THL is still empty (changes made while the replicator was offline did not get in)
* case FAIL (see the sketch below)
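In terms of the commands used earlier in the thread, the two cases differ only in whether the master changes while the replicator is offline (a sketch, again assuming service Castor and host $MASTER):
# Case 1 (OK): nothing is written on the master while the replicator is offline
$ trepctl offline
$ trepctl online
$ mysql -h $MASTER -e 'create table test.t1(i int)'   # arrives after onlining and replicates
# Case 2 (FAIL): the master changes while the replicator is offline
$ trepctl offline
$ mysql -h $MASTER -e 'create table test.t2(i int)'
$ trepctl online
$ thl -service Castor list                            # THL is still empty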
Tested using build 152
DATE: Fri Jul 1 07:19:11 UTC 2011
RELEASE: tungsten-replicator-2.0.4-152
USER ACCOUNT: hudson
BUILD_NUMBER: 152
BUILD_ID: 152
JOB_HAME: Build Replicator Branch-2.0 Google
BUILD_TAG: hudson-Build Replicator Branch-2.0 Google-152
HUDSON_URL: http://cc.aws.continuent.com/
SVN_REVISION: 268
HOST: ip-10-251-90-63
SVN URLs:
https://tungsten-replicator.googlecode.com/svn/trunk/commons
https://tungsten-replicator.googlecode.com/svn/trunk/fsm
https://tungsten-replicator.googlecode.com/svn/trunk/replicator
https://tungsten-replicator.googlecode.com/svn/trunk/replicator-extra
https://bristlecone.svn.sourceforge.net/svnroot/bristlecone/trunk/bristlecone
SVN Revisions:
commons: Revision: 269
fsm: Revision: 269
replicator: Revision: 269
replicator-extra: Revision: 269
bristlecone: Revision: 105
Original comment by g.maxia
on 1 Jul 2011 at 7:35
Here's the root cause of this problem. When the log is empty we start at the
current position on the master. Normally when operating a master we put a
heartbeat event into the server so that something is written into the log.
This is used for failover and works fine for master/slave topologies.
However, in direct mode there are two problems:
1.) There is confusion in the code about which DBMS should get the heartbeat.
This is because we have two DBMS's and the heartbeat command unfortunately
picks the slave. That's a FAIL.
2.) Adding insult to injury, we never even call the heartbeat command in the
first place. It is only called when starting a pipeline that is in the master
role.
One possible solution is for extractors to insert the heartbeat as they know
where it goes. However that has problems--we should only do this if we have an
extractor that is really reading from a database. So it is a conundrum that
requires a little thought to avoid creating another mess.
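For contrast, on a plain master/slave setup the seeding described above can also be forced by hand with the heartbeat command, which writes a row on the master that the extractor then stores in the THL; per points 1 and 2 above, in --direct mode this is never issued automatically and would target the wrong DBMS anyway (a sketch, assuming the usual trepctl heartbeat command):
$ trepctl heartbeat            # on a normal master: inserts a heartbeat event, so the THL is no longer empty
$ thl -service Castor list     # the heartbeat event should now be visible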
Original comment by berkeley...@gmail.com
on 28 Jul 2011 at 1:32
Regarding comment #6, please notice that we can't insert heartbeat events in
--direct mode, as there is no master.
A workaround that I use to overcome this issue is to manually add an event to
the master ("DROP TABLE IF EXISTS mysql.non_existing_table").
Can we do something similar?
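Concretely, the manual workaround boils down to making sure the replicator receives at least one event before it ever goes offline, which (per the clarification at the top of this issue) is enough to make later offline/online cycles work (a sketch, reusing the names from earlier comments):
$ trepctl online
$ mysql -h $MASTER -e 'DROP TABLE IF EXISTS mysql.non_existing_table'
$ thl -service Castor list     # the replicator has now received at least one event
# later offline/online cycles then pick up changes made while offline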
Original comment by g.maxia
on 28 Jul 2011 at 3:46
A possible workaround:
Tungsten can create a DUD event that will be inserted into the THL when the
service is created.
For example:
DROP TABLE IF EXISTS mysql.dummy_workaround_for_issue_136
Original comment by g.maxia
on 8 Aug 2011 at 8:42
Original comment by berkeley...@gmail.com
on 8 Sep 2011 at 5:10
Original comment by robert.h...@continuent.com
on 23 Jan 2012 at 6:50
Original comment by robert.h...@continuent.com
on 1 Mar 2012 at 9:45
Original comment by robert.h...@continuent.com
on 1 Mar 2012 at 9:46
Original comment by robert.h...@continuent.com
on 20 Sep 2012 at 5:04
This needs a long-term fix. We are removing it from a scheduled version at
this point until we get time to do a refit of the modeling used in replicator
pipelines.
Original comment by robert.h...@continuent.com
on 15 Jan 2013 at 4:54
Hi,
I'm not exactly sure that I have this issue, but I got an alert that Tungsten
had broken. I checked the status and found an issue with processing a log
file: "Unable to prepare plugin: class
name=com.continuent.tungsten.replicator.thl.THL message=[Found invalid log file
header; log must be purged up to this file to open:
/opt/installs/cookbook/thl/db1/thl.data.0000001060]"
ls -l /opt/installs/cookbook/thl/db1/thl.data.0000001060
-rw-r----- 1 root adm 0 Jun 22 06:40
/opt/installs/cookbook/thl/db1/thl.data.0000001060
It seems there were some empty log files generated from thl.data.0000001060 to
thl.data.0000001643. I can restore things by deleting the empty ones (roughly
the sequence sketched below), but I would like to know what will be lost if I
delete them, and also what caused so many logs to be generated.
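For what it's worth, the recovery sequence described above would look roughly like the following, run with the replicator offline (a sketch only; whether removing the files is actually safe is exactly question 1 below):
$ trepctl offline
$ find /opt/installs/cookbook/thl/db1 -name 'thl.data.*' -size 0 -ls       # list the empty log files first
$ find /opt/installs/cookbook/thl/db1 -name 'thl.data.*' -size 0 -delete   # remove them only if confirmed safe
$ trepctl online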
My Setup:
3 nodes: Node1, Node2, Node3 in multi-master replication within a subnet;
appliedLatency is usually below 1.
I would like to hear your comments on the points below.
1) Can I delete the 0-byte files (empty logs) and restore Tungsten?
2) What caused so many empty log files to be generated?
Thanks,
Swaroop.
Original comment by swaroopk...@gmail.com
on 24 Jun 2014 at 7:23
Original issue reported on code.google.com by
g.maxia
on 29 Jun 2011 at 9:31