is00hcw / tungsten-replicator

Automatically exported from code.google.com/p/tungsten-replicator

Replication from MySQL into Vertica 7 (delete fails) with tungsten-replicator-3.0.0-524 #1108


GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?

1. Create database 'myschema' in MySQL or MariaDB (either works) and add the following table:
CREATE TABLE country
(
  id CHAR(2),
  name VARCHAR(64) NOT NULL,
  capital_city_id INT,
  PRIMARY KEY(id)
) ENGINE = INNODB;

2. Add the following definitions of the base table and stage table into Vertica 7:
CREATE TABLE myschema.country
(
 id    CHAR(2) NOT NULL PRIMARY KEY,
 name  VARCHAR(64) NOT NULL
)
ORDER BY id;

CREATE TABLE myschema.stage_xxx_country
(
  tungsten_opcode CHAR(1),
  tungsten_seqno INT,
  tungsten_row_id INT,
  tungsten_commit_timestamp TIMESTAMP,
  id CHAR(2),
  name VARCHAR(64)
) ORDER BY tungsten_seqno, tungsten_row_id;
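For context, the batch applier replays the staged rows against the base table by opcode. A minimal sketch of that merge logic, in Python purely for illustration (the real merge is performed in SQL by the vertica6 batch load script):

```python
# Toy model of how staged rows are merged into the base table (illustration
# only; the real merge is done in SQL by the vertica6 batch load script).
stage_rows = [
    # (tungsten_opcode, tungsten_seqno, tungsten_row_id, id, name)
    ("I", 1, 1, "AD", "Andorra"),
    ("D", 2, 1, "AD", None),
]

base = {}  # base table keyed by primary key `id`
for opcode, seqno, row_id, pk, name in stage_rows:
    if opcode == "I":
        base[pk] = name          # inserts carry the full row image
    elif opcode == "D":
        base.pop(pk, None)       # deletes only need the key

print(base)  # -> {} (the insert was cancelled by the delete)
```

This is why a DELETE row in the staging CSV only has to identify the key, which matters for the bug described below.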

3. Use the dropcolumn.js filter with the following JSON definitions file:
[
  {
    "schema": "myschema",
    "table": "country",
    "columns": [
      "capital_city_id"
    ]
  }
]
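Conceptually, the filter strips the listed columns from each row event before it reaches the applier. A minimal Python sketch of that behavior, under the assumption that a row image can be modeled as a dict (the real filter is dropcolumn.js running inside the replicator's JS filter engine):

```python
import json

# Definitions file content as above (assumption: dict-shaped row images).
definitions = json.loads("""
[
  {"schema": "myschema", "table": "country", "columns": ["capital_city_id"]}
]
""")

def drop_columns(schema, table, row):
    """Remove configured columns from a dict-shaped row image."""
    for d in definitions:
        if d["schema"] == schema and d["table"] == table:
            for col in d["columns"]:
                row.pop(col, None)
    return row

row = {"id": "AD", "name": "Andorra", "capital_city_id": 6579}
print(drop_columns("myschema", "country", row))
# -> {'id': 'AD', 'name': 'Andorra'}
```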

4. Run the master and slave with the following configuration:

Master (extractor):

./tools/tpm configure myschema \
    --install-directory=/opt/continuent \
    --topology=master-slave \
    --enable-heterogenous-service=true \
    --disable-relay-logs=true \
    --master=db3mysql \
    --members=db3mysql,vertica \
    --replication-user=tungsten \
    --replication-password=XXXXXXX \
    --mysql-enable-enumtostring=true \
    --mysql-enable-settostring=true \
    --mysql-use-bytes-for-string=false \
    --svc-extractor-filters=replicate,enumtostring,settostring,pkey,colnames,dropcolumn \
    --property=replicator.filter.replicate.do=myschema.country \
    --property=replicator.filter.pkey.addColumnsToDeletes=true \
    --property=replicator.filter.pkey.addPkeyToInserts=true \
    --property=replicator.filter.dropcolumn.definitionsFile=/opt/continuent/share/dropcolumn.json \
    --java-file-encoding=UTF8 \
    --start

Slave (applier):
./tools/tpm configure myschema \
    --master-thl-host=db3mysql \
    --install-directory=/opt/continuent/myschema/applier \
    --topology=master-slave \
    --enable-heterogenous-service=true \
    --datasource-type=vertica \
    --master=db3mysql \
    --members=db3mysql,vertica \
    --hosts=vertica \
    --replication-user=vertica \
    --replication-password=XXXXXXX \
    --batch-enabled=true \
    --batch-load-language=js \
    --batch-load-template=vertica6 \
    --vertica-dbname=myschema \
    --replication-host=vertica \
    --replication-port=5433 \
    --property=replicator.applier.dbms=com.continuent.tungsten.replicator.applier.batch.SimpleBatchApplier \
    --skip-validation-check=InstallerMasterSlaveCheck \
    --svc-applier-block-commit-size=25000 \
    --svc-applier-block-commit-interval=3s

5. Fill the MySQL table:
INSERT INTO `country` VALUES('AD', 'Andorra', 6579);
INSERT INTO `country` VALUES('AE', 'United Arab Emirates', 610);
INSERT INTO `country` VALUES('AF', 'Afghanistan', 102889);
INSERT INTO `country` VALUES('AG', 'Antigua and Barbuda', 191177);
INSERT INTO `country` VALUES('AI', 'Anguilla', NULL);
INSERT INTO `country` VALUES('AL', 'Albania', 224153);
INSERT INTO `country` VALUES('AM', 'Armenia', 249432);

6. Delete a row from MySQL:
DELETE FROM `country` WHERE id = 'AD';

What is the expected output?

Expected the row with id = 'AD' to be deleted from Vertica.

What do you see instead?

LOAD DATA ROW count does not match: sql=COPY myschema.stage_xxx_country FROM 
'/tmp/staging/myschema/staging0/myschema-country-33643.csv' DIRECT NULL 'null' 
DELIMITER ',' ENCLOSED BY '"' expected_copy_rows=1

And the content of '/tmp/staging/myschema/staging0/myschema-country-33643.csv' is:

<<<CSV
"D","4","1","2015-03-11 13:53:22.000","AC",null,null
>>>CSV
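The mismatch is visible by simply counting fields: this DELETE row carries seven values, while the stage table declared above has only six columns, so Vertica's COPY rejects the row and loads 0 rows instead of the expected 1. A quick check (Python used only to illustrate the count):

```python
import csv, io

# Columns of myschema.stage_xxx_country as declared above.
stage_columns = ["tungsten_opcode", "tungsten_seqno", "tungsten_row_id",
                 "tungsten_commit_timestamp", "id", "name"]

# The rejected DELETE row from the staging CSV.
bad_row = '"D","4","1","2015-03-11 13:53:22.000","AC",null,null\n'
fields = next(csv.reader(io.StringIO(bad_row)))

print(len(fields), len(stage_columns))  # -> 7 6: one extra trailing field
```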

What version of the product are you using?

3.0.0-524

On what operating system?

CentOS 6

Please provide any additional information below.

When I update any record, the CSV contains the following correct DELETE data:
<<<CSV
"D","2","1","2015-03-11 13:44:32.000","AD",null
"I","2","2","2015-03-11 13:44:32.000","AD","AD UPDATE"
>>>CSV

Notes:

I also did some investigation and suspect that the error comes not from my configuration but from the following code (src/java/com/continuent/tungsten/replicator/applier/batch/SimpleBatchApplier.java):
// Fetch column names and values.
List<ColumnSpec> keySpecs = orc.getKeySpec();
List<ColumnSpec> colSpecs = orc.getColumnSpec();
ArrayList<ArrayList<ColumnVal>> keyValues = orc.getKeyValues();

// If the colspecs are empty, use keyspecs. This gets around an
// upstream bug in column metadata generation for tables without
// primary keys (Issue 916).
if (colSpecs.size() == 0)
    colSpecs = keySpecs;

// Get information about the table definition.
Table tableMetadata = this.getTableMetadata(schema, table, colSpecs, keySpecs);

// Insert each column into the CSV file.
writeValues(seqno, commitTimestamp, service, tableMetadata, keySpecs,
        keyValues, DELETE);

Here you can see that the code falls back to colSpecs = keySpecs, and that is why it produces a CSV with all fields. Maybe I need to configure the applier somehow to tell it that I don't need the extra (capital_city_id) column? On the other hand, I can see that SimpleBatchApplier is able to produce a correct CSV for a DELETE statement (e.g. myschema-country-2.csv).
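The fallback quoted above can be modeled in a few lines. Assumption (the reporter's reading, not confirmed in the source): with --property=replicator.filter.pkey.addColumnsToDeletes=true, the key specs for a DELETE carry every table column, including the one dropcolumn was meant to remove:

```python
# Toy model of the colSpecs/keySpecs fallback in SimpleBatchApplier
# (names simplified; the real types are List<ColumnSpec> etc.).
col_specs = []                                   # empty for this DELETE event
key_specs = ["id", "name", "capital_city_id"]    # all columns (assumed)

# The Issue-916 workaround: fall back to key specs when col specs are empty.
if not col_specs:
    col_specs = key_specs

# writeValues() then emits one CSV field per spec, so the dropped
# capital_city_id column reappears in the DELETE row.
print(col_specs)  # -> ['id', 'name', 'capital_city_id']
```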

Original issue reported on code.google.com by vait...@gmail.com on 13 Mar 2015 at 3:21

GoogleCodeExporter commented 9 years ago
Here is the Google Groups discussion:
https://groups.google.com/forum/#!topic/tungsten-replicator-discuss/3rqw7sCs9mY

Original comment by vait...@gmail.com on 13 Mar 2015 at 3:22

GoogleCodeExporter commented 9 years ago
Hi guys, 

there is a mistake above; the configuration string:
    --svc-applier-block-commit-size=25000 \
    --svc-applier-block-commit-interval=3s

to reproduce the issue you need to set it up this way:
    --svc-applier-block-commit-size=10000 \
    --svc-applier-block-commit-interval=60s

The internal cache for table metadata is 5 seconds, so this way it is possible, when going through the "getCsvFileSet" method, to get "delete" metadata from the cache instead of "insert" or "update" metadata.
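The hypothesized staleness can be sketched with a toy time-to-live cache (all names here are invented; whether the applier's 5-second table-metadata cache actually behaves this way is exactly the open question):

```python
import time

class MetadataCache:
    """Toy TTL cache modeling the hypothesized stale-metadata behavior."""
    def __init__(self, ttl=5.0):
        self.ttl = ttl
        self.store = {}

    def get(self, table, build):
        entry = self.store.get(table)
        now = time.monotonic()
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]              # cached entry wins, even if stale
        meta = build()
        self.store[table] = (now, meta)
        return meta

cache = MetadataCache(ttl=5.0)
first = cache.get("country", lambda: "delete-metadata")
second = cache.get("country", lambda: "insert-metadata")  # within 5 s
print(first, second)  # -> delete-metadata delete-metadata
```

With a long block-commit interval, many operations land inside one 5-second window, so a later operation could be served metadata built for an earlier one.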

Can you tell me if I'm wrong or not?

Original comment by vait...@gmail.com on 18 May 2015 at 11:10