is00hcw / tungsten-replicator

Automatically exported from code.google.com/p/tungsten-replicator

MySQL-Vertica: dropcolumn filter causes Invalid write to CSV file #976

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?

1. Create a test database and table on MySQL
CREATE DATABASE drop_test;
CREATE TABLE drop_test.table1 (id int PRIMARY KEY AUTO_INCREMENT, col1 int, col2 varchar(32));

2. Create the DDL and run it on Vertica

cd ~/install/software
tungsten-replicator/bin/ddlscan -user tungsten -url 'jdbc:mysql://localhost:3306' \
    -pass tungsten -template ddl-mysql-vertica.vm -db drop_test > ddl.sql
tungsten-replicator/bin/ddlscan -user tungsten -url 'jdbc:mysql://localhost:3306' \
    -pass tungsten -template ddl-mysql-vertica-staging.vm -db drop_test >> ddl.sql
scp ddl.sql vertica-host:
# run ddl.sql on vertica-host using vsql

3. Configure and Install/Start Tungsten

 ./tools/tpm configure alpha \
    --master=mysql-host.domain \
    --members=mysql-host.domain,vertica-host.domain \
    --install-directory=/home/tungsten/install \
    --svc-extractor-filters=replicate,pkey,colnames,dropcolumn,enumtostring,settostring \
    --svc-applier-filters=zerodate2null \
    --property=replicator.filter.pkey.addColumnsToDeletes=true \
    --property=replicator.filter.pkey.addPkeyToInserts=true \
    --property=replicator.filter.replicate.do=drop_test \
    --property=replicator.filter.dropcolumn.definitionsFile=/home/tungsten/install/software/dropcolumn.json \
    --java-file-encoding=UTF8 \
    --disable-relay-logs=true \
    --skip-validation-check=HostsFileCheck \
    --enable-heterogenous-service=true \
    --start

echo '[{"schema":"drop_test","table":"table1","columns":["col1"]}]' > dropcolumn.json  # drop table1.col1
scp dropcolumn.json vertica-host:install/software/

./tools/tpm configure alpha \
    --hosts=mysql-host.domain \
    --replication-user=tungsten \
    --replication-password=tungsten \
    --enable-heterogenous-master=true

./tools/tpm configure alpha \
    --hosts=vertica-host.domain \
    --replication-user=tungsten \
    --replication-password=tungsten \
    --enable-heterogenous-slave=true \
    --batch-enabled=true \
    --batch-load-template=vertica6 \
    --datasource-type=vertica \
    --vertica-dbname=VerticaDB \
    --replication-host=vertica-host.domain \
    --replication-port=5433 \
    --skip-validation-check=InstallerMasterSlaveCheck \
    --svc-applier-block-commit-size=25000 \
    --svc-applier-block-commit-interval=3s

./tools/tpm install

4. Insert a row into test table
MySQL: INSERT INTO drop_test.table1 (col1,col2) VALUES (42,'test');
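The dropcolumn.json definition in step 3 asks the dropcolumn filter to remove col1 from drop_test.table1. As a rough illustration of what that means for the row inserted in step 4, here is a minimal Java sketch (Java 16+ for records). It is not the filter's real code: `DropRule`, `Column` and `applyDrop` are hypothetical names, the 1-based column indexes are an assumption, and the id value 1 is made up. What it shows is that the surviving columns keep their original indexes (col2 stays at 3), which is the behaviour described later in this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class DropColumnSketch {
    // Hypothetical stand-in for a column spec: name, original 1-based table position, value.
    record Column(String name, int index, Object value) {}

    // Hypothetical stand-in for one entry of dropcolumn.json.
    record DropRule(String schema, String table, Set<String> columns) {
        boolean matches(String s, String t) {
            return schema.equals(s) && table.equals(t);
        }
    }

    // Removes the listed columns; the survivors keep their ORIGINAL index values.
    static List<Column> applyDrop(DropRule rule, String schema, String table, List<Column> row) {
        if (!rule.matches(schema, table)) {
            return row; // rule does not apply to this table
        }
        List<Column> kept = new ArrayList<>();
        for (Column c : row) {
            if (!rule.columns().contains(c.name())) {
                kept.add(c);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        DropRule rule = new DropRule("drop_test", "table1", Set.of("col1"));
        List<Column> row = List.of(
                new Column("id", 1, 1),       // hypothetical auto-increment value
                new Column("col1", 2, 42),
                new Column("col2", 3, "test"));
        for (Column c : applyDrop(rule, "drop_test", "table1", row)) {
            System.out.println(c.name() + " index=" + c.index());
        }
        // Prints:
        // id index=1
        // col2 index=3
    }
}
```

Note the gap: after filtering, the row has two columns, but their indexes are 1 and 3.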

What is the expected output?

The row should be replicated into Vertica

What do you see instead?

Stage task failed: stage=q-to-dbms seqno=1 fragno=0
Invalid write to CSV file: name=/data/tungsten/staging0/drop_test.table1.csv 
table=Table name=drop_test.table1 (Column name=id, Column name=col2) 
table_columns=id,col2 
csv_columns=tungsten_opcode,tungsten_seqno,tungsten_row_id,id,col2

What version of the product are you using?

Tungsten Replicator 2.2.1 build 403

On what operating system?

Linux CentOS 6.5
java version "1.7.0_55"
OpenJDK Runtime Environment (rhel-2.4.7.1.el6_5-x86_64 u55-b13)
OpenJDK 64-Bit Server VM (build 24.51-b03, mixed mode)

Notes:

/home/tungsten/install/tungsten/tungsten-replicator/bin/trepctl status
Processing status command...
NAME                     VALUE
----                     -----
appliedLastEventId     : NONE
appliedLastSeqno       : -1
appliedLatency         : -1.0
autoRecoveryEnabled    : false
autoRecoveryTotal      : 0
channels               : -1
clusterName            : alpha
currentEventId         : NONE
currentTimeMillis      : 1406014706623
dataServerHost         : vertica-host.domain
extensions             :
host                   : vertica-host.domain
latestEpochNumber      : -1
masterConnectUri       : thl://mysql-host.domain:2112/
masterListenUri        : null
maximumStoredSeqNo     : -1
minimumStoredSeqNo     : -1
offlineRequests        : NONE
pendingError           : Stage task failed: stage=q-to-dbms seqno=1 fragno=0
pendingErrorCode       : NONE
pendingErrorEventId    : mysql-bin.000004:0000000000000573;-1
pendingErrorSeqno      : 1
pendingExceptionMessage: Invalid write to CSV file: 
name=/data/tungsten/staging0/drop_test.table1.csv table=Table 
name=drop_test.table1 (Column name=id, Column name=col2) table_columns=id,col2 
csv_columns=tungsten_opcode,tungsten_seqno,tungsten_row_id,id,col2
pipelineSource         : UNKNOWN
relativeLatency        : -1.0
resourcePrecedence     : 99
rmiPort                : 10000
role                   : slave
seqnoType              : java.lang.Long
serviceName            : alpha
serviceType            : unknown
simpleServiceName      : alpha
siteName               : default
sourceId               : vertica-host.domain
state                  : OFFLINE:ERROR
timeInStateSeconds     : 53.804
transitioningTo        :
uptimeSeconds          : 113.748
useSSLConnection       : false
version                : Tungsten Replicator 2.2.1 build 403
Finished status command...

Original issue reported on code.google.com by disqkk on 22 Jul 2014 at 7:50

GoogleCodeExporter commented 9 years ago
Superb report. Accepting and putting it on the radar.

Original comment by linas.vi...@continuent.com on 22 Jul 2014 at 12:19

GoogleCodeExporter commented 9 years ago
Oh, our data directory (/data/tungsten) is not symlinked. Instead, I simply edited
/software/tungsten-replicator/samples/conf/appliers/batch.tpl and set
replicator.applier.dbms.stageDirectory=/data/tungsten.

Sorry for missing this in the initial bug report.

Original comment by disqkk on 23 Jul 2014 at 7:29

GoogleCodeExporter commented 9 years ago
Sorry, the previous commit message was posted to the wrong issue; it was meant for issue 975.

Original comment by eric.har...@continuent.com on 23 Jul 2014 at 7:57

GoogleCodeExporter commented 9 years ago
disqkk,

I just found a bug in the Vertica ddlscan template and added a comment:
https://code.google.com/p/tungsten-replicator/issues/detail?id=895#c10

In other words, try adding a tungsten_commit_timestamp column to the staging table,
right after tungsten_row_id.

Original comment by linas.vi...@continuent.com on 28 Jul 2014 at 11:57

GoogleCodeExporter commented 9 years ago
Added. I've attached the latest log.

I even tried removing the dropped column from the Vertica schema by manually
editing ddl.sql before creating the tables, but it didn't help.

INFO   | jvm 1    | 2014/08/15 10:51:07 | com.continuent.tungsten.replicator.ReplicatorException: Invalid write to CSV file: name=/data/tungsten/staging0/drop_test.table1.csv table=Table name=drop_test.table1 (Column name=id, Column name=col2) table_columns=id,col2 csv_columns=tungsten_opcode,tungsten_seqno,tungsten_row_id,id,col2
INFO   | jvm 1    | 2014/08/15 10:51:07 |   at com.continuent.tungsten.replicator.applier.batch.SimpleBatchApplier.writeValues(SimpleBatchApplier.java:981)
INFO   | jvm 1    | 2014/08/15 10:51:07 |   at com.continuent.tungsten.replicator.applier.batch.SimpleBatchApplier.apply(SimpleBatchApplier.java:337)
INFO   | jvm 1    | 2014/08/15 10:51:07 |   at com.continuent.tungsten.replicator.applier.ApplierWrapper.apply(ApplierWrapper.java:101)
INFO   | jvm 1    | 2014/08/15 10:51:07 |   at com.continuent.tungsten.replicator.pipeline.SingleThreadStageTask.apply(SingleThreadStageTask.java:768)
INFO   | jvm 1    | 2014/08/15 10:51:07 |   at com.continuent.tungsten.replicator.pipeline.SingleThreadStageTask.runTask(SingleThreadStageTask.java:501)
INFO   | jvm 1    | 2014/08/15 10:51:07 |   at com.continuent.tungsten.replicator.pipeline.SingleThreadStageTask.run(SingleThreadStageTask.java:176)
INFO   | jvm 1    | 2014/08/15 10:51:07 |   at java.lang.Thread.run(Thread.java:744)
INFO   | jvm 1    | 2014/08/15 10:51:07 | Caused by: com.continuent.tungsten.common.csv.CsvException: Attempt to write to invalid column index: index=6 value=test2 row size=5
INFO   | jvm 1    | 2014/08/15 10:51:07 |   at com.continuent.tungsten.common.csv.CsvWriter.put(CsvWriter.java:421)
INFO   | jvm 1    | 2014/08/15 10:51:07 |   at com.continuent.tungsten.replicator.applier.batch.SimpleBatchApplier.writeValues(SimpleBatchApplier.java:956)
INFO   | jvm 1    | 2014/08/15 10:51:07 |   ... 6 more
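The numbers in the "Caused by" line can be reproduced with simple arithmetic. The staging CSV row has five slots (tungsten_opcode, tungsten_seqno, tungsten_row_id, id, col2). If a value is placed at "three metadata columns + the column's original table index", col2, which was the third table column before col1 was dropped, lands at 3 + 3 = 6, one past the end of the row. The Java sketch below models only that bounds check. `BoundedRow` and its `put` method are hypothetical, the 1-based indexing is an assumption (it is what makes index=6 line up with the log), the id value is made up, and only the message text is borrowed from the log, not the implementation of `CsvWriter.put`.

```java
public class CsvIndexSketch {
    // Hypothetical fixed-width CSV row with 1-based positions.
    static class BoundedRow {
        private final String[] cells;

        BoundedRow(int size) {
            cells = new String[size];
        }

        // Rejects positions outside 1..size, mimicking the message seen in the log.
        void put(int index, String value) {
            if (index < 1 || index > cells.length) {
                throw new IllegalArgumentException("Attempt to write to invalid column index: index="
                        + index + " value=" + value + " row size=" + cells.length);
            }
            cells[index - 1] = value;
        }

        String[] cells() {
            return cells;
        }
    }

    // tungsten_opcode, tungsten_seqno, tungsten_row_id
    static final int METADATA_COLUMNS = 3;

    public static void main(String[] args) {
        BoundedRow row = new BoundedRow(METADATA_COLUMNS + 2); // id and col2 survive the filter
        row.put(METADATA_COLUMNS + 1, "1");                    // id: original index 1 -> slot 4, fine
        try {
            row.put(METADATA_COLUMNS + 3, "test2");            // col2: original index 3 -> slot 6
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Running it prints `Attempt to write to invalid column index: index=6 value=test2 row size=5`, which matches the exception above.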

Original comment by disqkk on 15 Aug 2014 at 11:00

GoogleCodeExporter commented 9 years ago
The issue is that the JavaScript filter drops the column data, but each remaining
column spec still carries its original column index, which is then used as the CSV
column position. Attached is a patch that uses a simple incrementing column index
instead of the now-incorrect index from the column spec.
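The approach described in this comment can be sketched as follows. This is not the attached patch, which is not reproduced here. It is a minimal Java illustration (Java 16+ for records) in which `Column`, `toCsvRow`, the three-column metadata prefix and the `"I"` opcode value are all assumptions. Instead of deriving each CSV position from the column spec's original index, it walks the surviving columns and assigns positions with a running counter, so a dropped column no longer leaves a gap.

```java
import java.util.List;

public class IncrementingIndexSketch {
    // Hypothetical column spec: name, original 1-based table index, value.
    record Column(String name, int originalIndex, String value) {}

    // tungsten_opcode, tungsten_seqno, tungsten_row_id
    static final int METADATA_COLUMNS = 3;

    // Builds one staging CSV row. Positions come from a counter;
    // originalIndex is deliberately ignored.
    static String[] toCsvRow(String opcode, String seqno, String rowId, List<Column> columns) {
        String[] row = new String[METADATA_COLUMNS + columns.size()];
        row[0] = opcode;
        row[1] = seqno;
        row[2] = rowId;
        int position = METADATA_COLUMNS; // next free 0-based slot after the metadata columns
        for (Column c : columns) {
            row[position++] = c.value();
        }
        return row;
    }

    public static void main(String[] args) {
        // col1 (original index 2) has already been removed by the filter.
        List<Column> surviving = List.of(
                new Column("id", 1, "1"),
                new Column("col2", 3, "test"));
        System.out.println(String.join(",", toCsvRow("I", "1", "1", surviving)));
        // Prints: I,1,1,1,test
    }
}
```

Because the row is sized from the surviving columns and filled sequentially, it stays consistent with the csv_columns list regardless of which table columns were dropped.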

Original comment by hellosi...@gmail.com on 4 Nov 2014 at 1:59

GoogleCodeExporter commented 9 years ago

Original comment by linas.vi...@continuent.com on 19 Dec 2014 at 7:03

GoogleCodeExporter commented 9 years ago

Original comment by linas.vi...@continuent.com on 19 Jan 2015 at 2:18