osalvador / ReplicaDB

ReplicaDB is open source tool for database replication, designed for efficiently transferring bulk data between relational and non-relational databases
https://osalvador.github.io/ReplicaDB/
Apache License 2.0
391 stars 95 forks source link

exception loading into sqlserver #89

Closed zhao-lih closed 2 years ago

zhao-lih commented 2 years ago

2022-08-18 00:21:16,246 INFO ReplicaDB:63 Running ReplicaDB version: 0.12.1 2022-08-18 00:21:16,251 INFO ReplicaDB:67 Setting verbose mode 2022-08-18 00:21:16,251 DEBUG ReplicaDB:68 ToolOptions{ sourceConnect='file://home/zhaolih/mcs/mcs_uat.csv', sourceUser='null', sourcePassword='null', sourceTable='null', sourceColumns='dwelling_id, dwelling_type, civic_no, civic_suffix, unit_id, road_name, road_name, road_type, road_type_key, road_dir, road_dir_key, municipality_name, municipality_name_key, postal_code, prov_abvn, prov_code, csd_uid, csd_municipality_name, block_no, cu_id, is_civic_address', sourceWhere='null', sourceQuery='null', sinkConnect='jdbc:sqlserver://itars-as-dev01:1433;database=MCS;encrypt=false', sinkUser='mcs', sinkPassword='****', sinkTable='GEOSEARCH.VW_ADDRESS_DATA', sinkStagingTable='null', sinkStagingSchema='null', sinkStagingTableAlias='null', sinkColumns='dwelling_id, dwelling_type, civic_no, civic_suffix, unit_id , road_name, road_name_key, road_type, road_type_key, road_dir, road_dir_key, municipality_name, municipality_name_key, postal_code , prov_abvn, prov_code, csd_uid, csd_municipality_name, block_no, cu_id, is_civic_address', sinkDisableEscape=false, sinkDisableIndex=false, sinkDisableTruncate=false, sinkAnalyze=false, jobs=1, BandwidthThrottling=0, quotedIdentifiers=false, fetchSize=100, help=false, version=false, verbose=true, optionsFile='csv_sqlserver.conf', mode='complete', sentryDsn='null', sourceConnectionParams={columns.types=NTEGER, SMALLINT, INTEGER, VARCHAR, VARCHAR, VARCHAR, VARCHAR, VARCHAR, VARCHAR, VARCHAR, VARCHAR, VARCHAR, VARCHAR, VARCHAR, VARCHAR, INTEGER, VARCHAR, VARCHAR, INTEGER, TINYINT, format.delimiter=:, format=DEFAULT}, sinkConnectionParams={domain=STC}, sourceFileFormat='null', sinkFileformat='null'} 2022-08-18 00:21:16,337 WARN FileManagerFactory:39 The file format is not defined, setting CSV as the default file format. 2022-08-18 00:21:16,428 INFO SqlManager:370 Truncating sink table with this command: TRUNCATE TABLE GEOSEARCH.VW_ADDRESS_DATA 2022-08-18 00:21:17,184 INFO ReplicaTask:36 Starting TaskId-0 2022-08-18 00:21:17,184 WARN FileManagerFactory:39 The file format is not defined, setting CSV as the default file format. 2022-08-18 00:21:17,230 DEBUG CsvFileManager:99 Setting the initial format to DEFAULT 2022-08-18 00:21:17,230 INFO CsvFileManager:198 The final CSVFormat is: Delimiter=<:> QuoteChar=<"> RecordSeparator=<

EmptyLines:ignored SkipHeaderRecord:false 2022-08-18 00:21:17,287 ERROR ReplicaTask:74 ERROR in TaskId-0 inserting data to sink table: The argument sourceColumn is not valid. 2022-08-18 00:21:17,288 ERROR ReplicaDB:136 Got exception running ReplicaDB: java.util.concurrent.ExecutionException: com.microsoft.sqlserver.jdbc.SQLServerException: The argument sourceColumn is not valid. at java.util.concurrent.FutureTask.report(FutureTask.java:122) ~[?:?] at java.util.concurrent.FutureTask.get(FutureTask.java:191) ~[?:?] at org.replicadb.ReplicaDB.processReplica(ReplicaDB.java:117) [ReplicaDB-0.12.1.jar:0.12.1] at org.replicadb.ReplicaDB.main(ReplicaDB.java:47) [ReplicaDB-0.12.1.jar:0.12.1] Caused by: com.microsoft.sqlserver.jdbc.SQLServerException: The argument sourceColumn is not valid. at com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDriverError(SQLServerException.java:237) ~[mssql-jdbc-11.2.0.jre11.jar:?] at com.microsoft.sqlserver.jdbc.SQLServerBulkCopy.throwInvalidArgument(SQLServerBulkCopy.java:1638) ~[mssql-jdbc-11.2.0.jre11.jar:?] at com.microsoft.sqlserver.jdbc.SQLServerBulkCopy.addColumnMapping(SQLServerBulkCopy.java:429) ~[mssql-jdbc-11.2.0.jre11.jar:?] at org.replicadb.manager.SQLServerManager.insertDataToTable(SQLServerManager.java:83) ~[ReplicaDB-0.12.1.jar:0.12.1] at org.replicadb.ReplicaTask.call(ReplicaTask.java:70) ~[ReplicaDB-0.12.1.jar:0.12.1] at org.replicadb.ReplicaTask.call(ReplicaTask.java:14) ~[ReplicaDB-0.12.1.jar:0.12.1] at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?] at java.lang.Thread.run(Thread.java:829) ~[?:?] csv_sqlserver.conf.zip

zhao-lih commented 2 years ago

BTW, the Sqlserver is Linux based, not sure if that makes any difference.

osalvador commented 2 years ago

Hi @zhao-lih,

There is a new release https://github.com/osalvador/ReplicaDB/releases/tag/v0.12.2 fixing this bug.

Thanks for report!

zhao-lih commented 2 years ago

Thanks for the quick fix @osalvador !

A closer look shows there might be something wrong in the conf file, will do more testing..

zhao-lih commented 2 years ago

2022-08-19 15:26:43,816 INFO ReplicaDB:63 Running ReplicaDB version: 0.12.2 2022-08-19 15:26:43,820 INFO ReplicaDB:67 Setting verbose mode 2022-08-19 15:26:43,821 DEBUG ReplicaDB:68 ToolOptions{ sourceConnect='file://home/xx/mcs/mcs2.csv', sourceUser='null', sourcePassword='null', sourceTable='null', sourceColumns='dwelling_id, dwelling_type, civic_no, civic_suffix, unit_id, road_name, road_name, road_type, road_type_key, road_dir, road_dir_key, municipality_name, municipality_name_key, postal_code, prov_abvn, prov_code, csd_uid, csd_municipality_name, block_no, cu_id, is_civic_address', sourceWhere='null', sourceQuery='null', sinkConnect='jdbc:sqlserver://xx:1433;database=MCS;encrypt=false', sinkUser='mcs', sinkPassword='****', sinkTable='GEOSEARCH.VW_ADDRESS_DATA', sinkStagingTable='null', sinkStagingSchema='null', sinkStagingTableAlias='null', sinkColumns='dwelling_id, dwelling_type, civic_no, civic_suffix, unit_id , road_name, road_name_key, road_type, road_type_key, road_dir, road_dir_key, municipality_name, municipality_name_key, postal_code , prov_abvn, prov_code, csd_uid, csd_municipality_name, block_no, cu_id, is_civic_address', sinkDisableEscape=false, sinkDisableIndex=false, sinkDisableTruncate=false, sinkAnalyze=false, jobs=1, BandwidthThrottling=0, quotedIdentifiers=false, fetchSize=100, help=false, version=false, verbose=true, optionsFile='csv_sqlserver.conf', mode='complete', sentryDsn='null', sourceConnectionParams={columns.types=INTEGER, SMALLINT, INTEGER, VARCHAR, VARCHAR, VARCHAR, VARCHAR, VARCHAR, VARCHAR, VARCHAR, VARCHAR, VARCHAR, VARCHAR, VARCHAR, VARCHAR, VARCHAR, INTEGER, VARCHAR, VARCHAR, INTEGER, TINYINT, format.delimiter=:, format=DEFAULT}, sinkConnectionParams={domain=STC}, sourceFileFormat='null', sinkFileformat='null'} 2022-08-19 15:26:43,906 WARN FileManagerFactory:39 The file format is not defined, setting CSV as the default file format. 2022-08-19 15:26:43,909 INFO SqlManager:370 Truncating sink table with this command: TRUNCATE TABLE GEOSEARCH.VW_ADDRESS_DATA 2022-08-19 15:26:44,382 INFO ReplicaTask:35 Starting TaskId-0 2022-08-19 15:26:44,383 WARN FileManagerFactory:39 The file format is not defined, setting CSV as the default file format. 2022-08-19 15:26:44,439 DEBUG CsvFileManager:99 Setting the initial format to DEFAULT 2022-08-19 15:26:44,439 INFO CsvFileManager:198 The final CSVFormat is: Delimiter=<:> QuoteChar=<"> RecordSeparator=<

EmptyLines:ignored SkipHeaderRecord:false 2022-08-19 15:26:44,504 INFO SQLServerManager:94 Performing BulkCopy into GEOSEARCH.VW_ADDRESS_DATA 2022-08-19 15:26:44,559 ERROR CsvCachedRowSetImpl:250 An error has occurred reading line number 1 of the CSV file java.lang.NumberFormatException: For input string: "true" at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) ~[?:?] at java.lang.Integer.parseInt(Integer.java:652) ~[?:?] at java.lang.Byte.parseByte(Byte.java:152) ~[?:?] at java.lang.Byte.parseByte(Byte.java:178) ~[?:?] at org.replicadb.rowset.CsvCachedRowSetImpl.readData(CsvCachedRowSetImpl.java:182) [ReplicaDB-0.12.2.jar:0.12.2] at org.replicadb.rowset.CsvCachedRowSetImpl.next(CsvCachedRowSetImpl.java:142) [ReplicaDB-0.12.2.jar:0.12.2] at com.microsoft.sqlserver.jdbc.SQLServerBulkCopy.goToNextRow(SQLServerBulkCopy.java:3410) [mssql-jdbc-7.2.2.jre8.jar:?] at com.microsoft.sqlserver.jdbc.SQLServerBulkCopy.writeBatchData(SQLServerBulkCopy.java:3435) [mssql-jdbc-7.2.2.jre8.jar:?] at com.microsoft.sqlserver.jdbc.SQLServerBulkCopy.doInsertBulk(SQLServerBulkCopy.java:1494) [mssql-jdbc-7.2.2.jre8.jar:?] at com.microsoft.sqlserver.jdbc.SQLServerBulkCopy.access$300(SQLServerBulkCopy.java:65) [mssql-jdbc-7.2.2.jre8.jar:?] at com.microsoft.sqlserver.jdbc.SQLServerBulkCopy$1InsertBulk.doExecute(SQLServerBulkCopy.java:649) [mssql-jdbc-7.2.2.jre8.jar:?] at com.microsoft.sqlserver.jdbc.TDSCommand.execute(IOBuffer.java:7194) [mssql-jdbc-7.2.2.jre8.jar:?] at com.microsoft.sqlserver.jdbc.SQLServerConnection.executeCommand(SQLServerConnection.java:2935) [mssql-jdbc-7.2.2.jre8.jar:?] at com.microsoft.sqlserver.jdbc.SQLServerBulkCopy.sendBulkLoadBCP(SQLServerBulkCopy.java:683) [mssql-jdbc-7.2.2.jre8.jar:?] at com.microsoft.sqlserver.jdbc.SQLServerBulkCopy.writeToServer(SQLServerBulkCopy.java:1581) [mssql-jdbc-7.2.2.jre8.jar:?] at com.microsoft.sqlserver.jdbc.SQLServerBulkCopy.writeResultSet(SQLServerBulkCopy.java:582) [mssql-jdbc-7.2.2.jre8.jar:?] at com.microsoft.sqlserver.jdbc.SQLServerBulkCopy.writeToServer(SQLServerBulkCopy.java:533) [mssql-jdbc-7.2.2.jre8.jar:?] at org.replicadb.manager.SQLServerManager.insertDataToTable(SQLServerManager.java:98) [ReplicaDB-0.12.2.jar:0.12.2] at org.replicadb.ReplicaTask.call(ReplicaTask.java:65) [ReplicaDB-0.12.2.jar:0.12.2] at org.replicadb.ReplicaTask.call(ReplicaTask.java:15) [ReplicaDB-0.12.2.jar:0.12.2] at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?] at java.lang.Thread.run(Thread.java:829) [?:?] 2022-08-19 15:26:44,567 ERROR ReplicaTask:69 ERROR in TaskId-0 inserting data to sink table: [For input string: "true"] 2022-08-19 15:26:44,567 ERROR ReplicaDB:136 Got exception running ReplicaDB: java.util.concurrent.ExecutionException: java.lang.NumberFormatException: For input string: "true" at java.util.concurrent.FutureTask.report(FutureTask.java:122) ~[?:?] at java.util.concurrent.FutureTask.get(FutureTask.java:191) ~[?:?] at org.replicadb.ReplicaDB.processReplica(ReplicaDB.java:117) [ReplicaDB-0.12.2.jar:0.12.2] at org.replicadb.ReplicaDB.main(ReplicaDB.java:47) [ReplicaDB-0.12.2.jar:0.12.2] Caused by: java.lang.NumberFormatException: For input string: "true" at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) ~[?:?] at java.lang.Integer.parseInt(Integer.java:652) ~[?:?] at java.lang.Byte.parseByte(Byte.java:152) ~[?:?] at java.lang.Byte.parseByte(Byte.java:178) ~[?:?] at org.replicadb.rowset.CsvCachedRowSetImpl.readData(CsvCachedRowSetImpl.java:182) ~[ReplicaDB-0.12.2.jar:0.12.2] at org.replicadb.rowset.CsvCachedRowSetImpl.next(CsvCachedRowSetImpl.java:142) ~[ReplicaDB-0.12.2.jar:0.12.2] at com.microsoft.sqlserver.jdbc.SQLServerBulkCopy.goToNextRow(SQLServerBulkCopy.java:3410) ~[mssql-jdbc-7.2.2.jre8.jar:?] at com.microsoft.sqlserver.jdbc.SQLServerBulkCopy.writeBatchData(SQLServerBulkCopy.java:3435) ~[mssql-jdbc-7.2.2.jre8.jar:?] at com.microsoft.sqlserver.jdbc.SQLServerBulkCopy.doInsertBulk(SQLServerBulkCopy.java:1494) ~[mssql-jdbc-7.2.2.jre8.jar:?] at com.microsoft.sqlserver.jdbc.SQLServerBulkCopy.access$300(SQLServerBulkCopy.java:65) ~[mssql-jdbc-7.2.2.jre8.jar:?] at com.microsoft.sqlserver.jdbc.SQLServerBulkCopy$1InsertBulk.doExecute(SQLServerBulkCopy.java:649) ~[mssql-jdbc-7.2.2.jre8.jar:?] at com.microsoft.sqlserver.jdbc.TDSCommand.execute(IOBuffer.java:7194) ~[mssql-jdbc-7.2.2.jre8.jar:?] at com.microsoft.sqlserver.jdbc.SQLServerConnection.executeCommand(SQLServerConnection.java:2935) ~[mssql-jdbc-7.2.2.jre8.jar:?] at com.microsoft.sqlserver.jdbc.SQLServerBulkCopy.sendBulkLoadBCP(SQLServerBulkCopy.java:683) ~[mssql-jdbc-7.2.2.jre8.jar:?] at com.microsoft.sqlserver.jdbc.SQLServerBulkCopy.writeToServer(SQLServerBulkCopy.java:1581) ~[mssql-jdbc-7.2.2.jre8.jar:?] at com.microsoft.sqlserver.jdbc.SQLServerBulkCopy.writeResultSet(SQLServerBulkCopy.java:582) ~[mssql-jdbc-7.2.2.jre8.jar:?] at com.microsoft.sqlserver.jdbc.SQLServerBulkCopy.writeToServer(SQLServerBulkCopy.java:533) ~[mssql-jdbc-7.2.2.jre8.jar:?] at org.replicadb.manager.SQLServerManager.insertDataToTable(SQLServerManager.java:98) ~[ReplicaDB-0.12.2.jar:0.12.2] at org.replicadb.ReplicaTask.call(ReplicaTask.java:65) ~[ReplicaDB-0.12.2.jar:0.12.2] at org.replicadb.ReplicaTask.call(ReplicaTask.java:15) ~[ReplicaDB-0.12.2.jar:0.12.2] at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?] at java.lang.Thread.run(Thread.java:829) ~[?:?] 2022-08-19 15:26:44,569 INFO ReplicaDB:54 Total process time: 762ms

zhao-lih commented 2 years ago

seems "TINYINT" type is translate to "true"/"false" when exported to csv, then it can't be translated back..

zhao-lih commented 2 years ago

after changing true/false to 1/0 manually, I got this exception:

2022-08-19 16:15:51,116 INFO SQLServerManager:94 Performing BulkCopy into GEOSEARCH.VW_ADDRESS_DATA 2022-08-19 16:15:57,880 ERROR ReplicaTask:69 ERROR in TaskId-0 inserting data to sink table: [Received an invalid column length from the bcp client for colid 4.] 2022-08-19 16:15:57,881 ERROR ReplicaDB:136 Got exception running ReplicaDB: java.util.concurrent.ExecutionException: com.microsoft.sqlserver.jdbc.SQLServerException: Received an invalid column length from the bcp client for colid 4. at java.util.concurrent.FutureTask.report(FutureTask.java:122) ~[?:?] at java.util.concurrent.FutureTask.get(FutureTask.java:191) ~[?:?] at org.replicadb.ReplicaDB.processReplica(ReplicaDB.java:117) [ReplicaDB-0.12.2.jar:0.12.2] at org.replicadb.ReplicaDB.main(ReplicaDB.java:47) [ReplicaDB-0.12.2.jar:0.12.2] Caused by: com.microsoft.sqlserver.jdbc.SQLServerException: Received an invalid column length from the bcp client for colid 4. at com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDatabaseError(SQLServerException.java:262) ~[mssql-jdbc-7.2.2.jre8.jar:?] at com.microsoft.sqlserver.jdbc.TDSTokenHandler.onEOF(tdsparser.java:258) ~[mssql-jdbc-7.2.2.jre8.jar:?] at com.microsoft.sqlserver.jdbc.TDSParser.parse(tdsparser.java:104) ~[mssql-jdbc-7.2.2.jre8.jar:?] at com.microsoft.sqlserver.jdbc.TDSParser.parse(tdsparser.java:26) ~[mssql-jdbc-7.2.2.jre8.jar:?] at com.microsoft.sqlserver.jdbc.SQLServerBulkCopy.doInsertBulk(SQLServerBulkCopy.java:1517) ~[mssql-jdbc-7.2.2.jre8.jar:?] at com.microsoft.sqlserver.jdbc.SQLServerBulkCopy.access$300(SQLServerBulkCopy.java:65) ~[mssql-jdbc-7.2.2.jre8.jar:?] at com.microsoft.sqlserver.jdbc.SQLServerBulkCopy$1InsertBulk.doExecute(SQLServerBulkCopy.java:649) ~[mssql-jdbc-7.2.2.jre8.jar:?] at com.microsoft.sqlserver.jdbc.TDSCommand.execute(IOBuffer.java:7194) ~[mssql-jdbc-7.2.2.jre8.jar:?] at com.microsoft.sqlserver.jdbc.SQLServerConnection.executeCommand(SQLServerConnection.java:2935) ~[mssql-jdbc-7.2.2.jre8.jar:?] at com.microsoft.sqlserver.jdbc.SQLServerBulkCopy.sendBulkLoadBCP(SQLServerBulkCopy.java:683) ~[mssql-jdbc-7.2.2.jre8.jar:?] at com.microsoft.sqlserver.jdbc.SQLServerBulkCopy.writeToServer(SQLServerBulkCopy.java:1581) ~[mssql-jdbc-7.2.2.jre8.jar:?] at com.microsoft.sqlserver.jdbc.SQLServerBulkCopy.writeResultSet(SQLServerBulkCopy.java:582) ~[mssql-jdbc-7.2.2.jre8.jar:?] at com.microsoft.sqlserver.jdbc.SQLServerBulkCopy.writeToServer(SQLServerBulkCopy.java:533) ~[mssql-jdbc-7.2.2.jre8.jar:?] at org.replicadb.manager.SQLServerManager.insertDataToTable(SQLServerManager.java:98) ~[ReplicaDB-0.12.2.jar:0.12.2] at org.replicadb.ReplicaTask.call(ReplicaTask.java:65) ~[ReplicaDB-0.12.2.jar:0.12.2] at org.replicadb.ReplicaTask.call(ReplicaTask.java:15) ~[ReplicaDB-0.12.2.jar:0.12.2] at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?] at java.lang.Thread.run(Thread.java:829) ~[?:?] 2022-08-19 16:15:57,889 INFO ReplicaDB:54 Total process time: 7574ms

zhao-lih commented 2 years ago

I used a piece of awk code to calculate the max length of each column, compare the result with the database table DDL. Seems all columns are within range. And the cvs was exported from the same source table..

A quick search leads me to https://stackoverflow.com/questions/10442686/received-an-invalid-column-length-from-the-bcp-client-for-colid-6, not sure if the debug code can be easily implemented in java though.

osalvador commented 2 years ago

Hi @zhao-lih ,

I was testing with TINYINT data type (only 0-255 values) from CSV to TINYINT column in SQL Server and it works correctly.

Could you please share some lines from your source CSV file and the DDL of your sink table for testing.

Thanks.

zhao-lih commented 2 years ago

@osalvador , there's no problem with "from CSV to TINYINT column in SQL Server", the issue is, "from sqlserver to csv", the column is translated into a Boolean value, "true/false", instead of a 0-255 value.

osalvador commented 2 years ago

Hi @zhao-lih,

Are you using ReplicaDB to generate a CSV file from SQL Server? I just generated a CSV file from a table with the TINYINT column in SQL Server and the generated value is correct.

zhao-lih commented 2 years ago

@osalvador , I did. Actually with both versions 11.8 and 12.2, not sure if it's something related to my environment.. For me it's not a blocking error, I can just fix it manually.

Not sure what we can do on the 2nd exception I posted above, as it doesn't tell which exact column is the offending one. I had similar issue with the bcp command on command line, until I used a format file, for both export and import.