michelya / tungsten-replicator

Automatically exported from code.google.com/p/tungsten-replicator

Some records get lost in multiple master replication #15

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
In a three-master setup, when each master creates a different table and inserts
two records into it, one of the records gets lost: it is not applied, and no
error or warning is reported.

The topology is the following:
server alpha: (HOST1)
local master service alpha, remote slave services bravo and charlie

server bravo: (HOST2)
local master service bravo, remote slave services alpha and charlie

server charlie: (HOST3)
local master service charlie, remote slave services bravo and alpha

The commands executed for this test were the following:

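# $MYSQL is the mysql client command; $HOST1, $HOST2 and $HOST3 are the masters alpha, bravo and charlie
$MYSQL -h $HOST1 -e 'drop table if exists test.t1'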
$MYSQL -h $HOST2 -e 'drop table if exists test.t2'
$MYSQL -h $HOST3 -e 'drop table if exists test.t3'
$MYSQL -h $HOST1 -e 'create table test.t1(i int)'
$MYSQL -h $HOST2 -e 'create table test.t2(i int)'
$MYSQL -h $HOST3 -e 'create table test.t3(i int)'

MAXRECS=2
echo "inserting $MAXRECS records into each of the three masters. Please wait"
for CNT in $(seq 1 $MAXRECS)
do
    $MYSQL -h $HOST1 -e "insert into test.t1 values ($CNT)"
    $MYSQL -h $HOST2 -e "insert into test.t2 values ($CNT)"
    $MYSQL -h $HOST3 -e "insert into test.t3 values ($CNT)"
done

This was the result (t = table, c = row count, s = sum of column i):
Retrieving data from the masters
qa.m1.continuent.com
+----+---+------+
| t  | c | s    |
+----+---+------+
| t1 | 2 |    3 |
| t2 | 1 |    1 |
| t3 | 2 |    3 |
+----+---+------+

qa.m2.continuent.com
+----+---+------+
| t  | c | s    |
+----+---+------+
| t1 | 2 |    3 |
| t2 | 2 |    3 |
| t3 | 2 |    3 |
+----+---+------+

qa.m3.continuent.com
+----+---+------+
| t  | c | s    |
+----+---+------+
| t1 | 2 |    3 |
| t2 | 1 |    1 |
| t3 | 2 |    3 |
+----+---+------+

As you can see, one record is missing from table t2 on both HOST1 and HOST3.
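You can spot this kind of divergence without eyeballing the tables. Run the same count/sum query against every master and compare the results. The sketch below makes a few assumptions. `$MYSQL` runs a mysql-compatible client, and the tables are the ones from the test above. `check_consistency` and `QUERY` are hypothetical names introduced here and are not part of Tungsten. `-N` and `-B` are the standard mysql client options for skipping column names and producing tab-separated batch output.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of Tungsten: compare per-table row counts
# and sums across masters. Assumes $MYSQL runs a mysql-compatible client.
QUERY="select 't1', count(*), sum(i) from test.t1
union all select 't2', count(*), sum(i) from test.t2
union all select 't3', count(*), sum(i) from test.t3"

check_consistency() {
    local first_host="" reference="" host result status=0
    for host in "$@"; do
        # -N: no column names, -B: tab-separated batch output
        result=$($MYSQL -h "$host" -N -B -e "$QUERY") || return 2
        if [ -z "$first_host" ]; then
            # The first host is the reference for all later comparisons
            first_host=$host
            reference=$result
        elif [ "$result" != "$reference" ]; then
            echo "MISMATCH: $host differs from $first_host"
            printf '%s\n' "--- $first_host" "$reference" "--- $host" "$result"
            status=1
        fi
    done
    return "$status"
}
```

Example usage: `check_consistency "$HOST1" "$HOST2" "$HOST3" || echo "masters have diverged"`. The helper only reports that hosts disagree, not which one is correct. With the data above, HOST2 would be flagged as differing from HOST1, even though HOST2 is the one holding both t2 rows.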

Looking at the logs, I could determine that the statement "insert into test.t2
values (2)" was missing from the THL for services alpha and charlie.
Please see the attached logs, trepctl and THL output, configuration files, and
additional observation files for more detail.

Comment by Giuseppe Maxia [24/Mar/11 06:18 PM]
Logs and other monitoring data for TUC-302.

Comment by Giuseppe Maxia [25/Mar/11 03:47 AM]
The failure could be related to block commit.
Changing the loop this way works:

for CNT in $(seq 1 $MAXRECS)
do
    $MYSQL -h $HOST1 -e "insert into test.t1 values ($CNT)"
    sleep 0.1
    $MYSQL -h $HOST2 -e "insert into test.t2 values ($CNT)"
    sleep 0.1
    $MYSQL -h $HOST3 -e "insert into test.t3 values ($CNT)"
    sleep 0.1
done
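To switch between the failing case and the working case without editing the loop, you can make the delay a parameter. This is a minimal sketch. It assumes the same `$MYSQL` and `$HOST1`..`$HOST3` variables as the test above. `run_inserts` and its arguments are hypothetical names introduced here. This report does not establish that 0.1 seconds is always enough to avoid the problem.

```shell
# Hypothetical wrapper around the test loop. With a delay of 0 it issues the
# same statements, in the same order, as the original loop; with 0.1 it
# matches the workaround above.
run_inserts() {
    local maxrecs=$1 delay=${2:-0} cnt pair host table
    for cnt in $(seq 1 "$maxrecs"); do
        for pair in "$HOST1:t1" "$HOST2:t2" "$HOST3:t3"; do
            host=${pair%%:*}    # text before the first colon
            table=${pair#*:}    # text after the first colon
            $MYSQL -h "$host" -e "insert into test.$table values ($cnt)"
            # Skip sleep entirely when delay is 0, so timing stays as close
            # as possible to the original (failing) loop
            if [ "$delay" != 0 ]; then sleep "$delay"; fi
        done
    done
}
```

`run_inserts 2` corresponds to the original test, and `run_inserts 2 0.1` corresponds to the loop with sleeps. Fractional `sleep` values work with GNU coreutils and BSD `sleep`, as the original workaround already assumes.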

Comment by Giuseppe Maxia [25/Mar/11 05:39 AM]
It definitely seems related to block commit.
With the following lines changed in static-SERVICENAME.properties, the bug does
not show up:
replicator.stage.d-pq-to-dbms.blockCommitRowCount=1
replicator.stage.q-to-dbms.blockCommitRowCount=1
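If you need to apply this setting to several service files, a `sed` one-off can do it. This sketch assumes the keys already exist in the file as plain `key=value` lines with no spaces around `=`. Missing keys are not added. `set_block_commit_one` is a hypothetical name, and the file path you pass it is up to you. `sed -i.bak` (suffix attached to `-i`) is accepted by both GNU and BSD sed and keeps a backup copy.

```shell
# Hypothetical helper: force blockCommitRowCount=1 for the two applier
# stages listed above. Only rewrites keys that are already present;
# the original file is kept as FILE.bak.
set_block_commit_one() {
    local file=$1
    [ -f "$file" ] || { echo "no such file: $file" >&2; return 1; }
    sed -i.bak \
        -e 's/^\(replicator\.stage\.d-pq-to-dbms\.blockCommitRowCount\)=.*/\1=1/' \
        -e 's/^\(replicator\.stage\.q-to-dbms\.blockCommitRowCount\)=.*/\1=1/' \
        "$file"
}
```

Other keys in the file, including any other blockCommitRowCount settings, are left untouched. You can therefore limit the change to specific services by choosing which properties files you pass in.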

Comment by Robert Hodges [29/Mar/11 12:37 AM]
I have confirmed the block commit problem in other tests. It appears that it is 
enough to set the block commit values to 1 on the remote services only.

The problem may be related to auto-commit transactions, which seem to break
our tracking of which service originated a given transaction.

Original issue reported on code.google.com by berkeley...@gmail.com on 14 Apr 2011 at 9:18

GoogleCodeExporter commented 9 years ago
Migrated from http://forge.continuent.org/jira/browse/TUC-302. 

Original comment by berkeley...@gmail.com on 14 Apr 2011 at 9:18

GoogleCodeExporter commented 9 years ago
I have just checked in a preliminary fix. The problem is that Tungsten event
filtering loses track of the service affiliation when transactions pass through
two or more remote services. In this case we accumulate multiple updates to
different tungsten_svc databases, a case that was not handled correctly.

Original comment by berkeley...@gmail.com on 19 Apr 2011 at 6:53

GoogleCodeExporter commented 9 years ago

Original comment by berkeley...@gmail.com on 20 Apr 2011 at 5:51

GoogleCodeExporter commented 9 years ago

Original comment by berkeley...@gmail.com on 20 Apr 2011 at 11:51