Closed zhao-lih closed 2 years ago
BTW, the Sqlserver is Linux based, not sure if that makes any difference.
Hi @zhao-lih,
There is a new release https://github.com/osalvador/ReplicaDB/releases/tag/v0.12.2 fixing this bug.
Thanks for report!
Thanks for the quick fix @osalvador !
A closer look shows there might be something wrong in the conf file, will do more testing..
2022-08-19 15:26:43,816 INFO ReplicaDB:63 Running ReplicaDB version: 0.12.2 2022-08-19 15:26:43,820 INFO ReplicaDB:67 Setting verbose mode 2022-08-19 15:26:43,821 DEBUG ReplicaDB:68 ToolOptions{ sourceConnect='file://home/xx/mcs/mcs2.csv', sourceUser='null', sourcePassword='null', sourceTable='null', sourceColumns='dwelling_id, dwelling_type, civic_no, civic_suffix, unit_id, road_name, road_name, road_type, road_type_key, road_dir, road_dir_key, municipality_name, municipality_name_key, postal_code, prov_abvn, prov_code, csd_uid, csd_municipality_name, block_no, cu_id, is_civic_address', sourceWhere='null', sourceQuery='null', sinkConnect='jdbc:sqlserver://xx:1433;database=MCS;encrypt=false', sinkUser='mcs', sinkPassword='****', sinkTable='GEOSEARCH.VW_ADDRESS_DATA', sinkStagingTable='null', sinkStagingSchema='null', sinkStagingTableAlias='null', sinkColumns='dwelling_id, dwelling_type, civic_no, civic_suffix, unit_id , road_name, road_name_key, road_type, road_type_key, road_dir, road_dir_key, municipality_name, municipality_name_key, postal_code , prov_abvn, prov_code, csd_uid, csd_municipality_name, block_no, cu_id, is_civic_address', sinkDisableEscape=false, sinkDisableIndex=false, sinkDisableTruncate=false, sinkAnalyze=false, jobs=1, BandwidthThrottling=0, quotedIdentifiers=false, fetchSize=100, help=false, version=false, verbose=true, optionsFile='csv_sqlserver.conf', mode='complete', sentryDsn='null', sourceConnectionParams={columns.types=INTEGER, SMALLINT, INTEGER, VARCHAR, VARCHAR, VARCHAR, VARCHAR, VARCHAR, VARCHAR, VARCHAR, VARCHAR, VARCHAR, VARCHAR, VARCHAR, VARCHAR, VARCHAR, INTEGER, VARCHAR, VARCHAR, INTEGER, TINYINT, format.delimiter=:, format=DEFAULT}, sinkConnectionParams={domain=STC}, sourceFileFormat='null', sinkFileformat='null'} 2022-08-19 15:26:43,906 WARN FileManagerFactory:39 The file format is not defined, setting CSV as the default file format. 2022-08-19 15:26:43,909 INFO SqlManager:370 Truncating sink table with this command: TRUNCATE TABLE GEOSEARCH.VW_ADDRESS_DATA 2022-08-19 15:26:44,382 INFO ReplicaTask:35 Starting TaskId-0 2022-08-19 15:26:44,383 WARN FileManagerFactory:39 The file format is not defined, setting CSV as the default file format. 2022-08-19 15:26:44,439 DEBUG CsvFileManager:99 Setting the initial format to DEFAULT 2022-08-19 15:26:44,439 INFO CsvFileManager:198 The final CSVFormat is: Delimiter=<:> QuoteChar=<"> RecordSeparator=<
EmptyLines:ignored SkipHeaderRecord:false 2022-08-19 15:26:44,504 INFO SQLServerManager:94 Performing BulkCopy into GEOSEARCH.VW_ADDRESS_DATA 2022-08-19 15:26:44,559 ERROR CsvCachedRowSetImpl:250 An error has occurred reading line number 1 of the CSV file java.lang.NumberFormatException: For input string: "true" at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) ~[?:?] at java.lang.Integer.parseInt(Integer.java:652) ~[?:?] at java.lang.Byte.parseByte(Byte.java:152) ~[?:?] at java.lang.Byte.parseByte(Byte.java:178) ~[?:?] at org.replicadb.rowset.CsvCachedRowSetImpl.readData(CsvCachedRowSetImpl.java:182) [ReplicaDB-0.12.2.jar:0.12.2] at org.replicadb.rowset.CsvCachedRowSetImpl.next(CsvCachedRowSetImpl.java:142) [ReplicaDB-0.12.2.jar:0.12.2] at com.microsoft.sqlserver.jdbc.SQLServerBulkCopy.goToNextRow(SQLServerBulkCopy.java:3410) [mssql-jdbc-7.2.2.jre8.jar:?] at com.microsoft.sqlserver.jdbc.SQLServerBulkCopy.writeBatchData(SQLServerBulkCopy.java:3435) [mssql-jdbc-7.2.2.jre8.jar:?] at com.microsoft.sqlserver.jdbc.SQLServerBulkCopy.doInsertBulk(SQLServerBulkCopy.java:1494) [mssql-jdbc-7.2.2.jre8.jar:?] at com.microsoft.sqlserver.jdbc.SQLServerBulkCopy.access$300(SQLServerBulkCopy.java:65) [mssql-jdbc-7.2.2.jre8.jar:?] at com.microsoft.sqlserver.jdbc.SQLServerBulkCopy$1InsertBulk.doExecute(SQLServerBulkCopy.java:649) [mssql-jdbc-7.2.2.jre8.jar:?] at com.microsoft.sqlserver.jdbc.TDSCommand.execute(IOBuffer.java:7194) [mssql-jdbc-7.2.2.jre8.jar:?] at com.microsoft.sqlserver.jdbc.SQLServerConnection.executeCommand(SQLServerConnection.java:2935) [mssql-jdbc-7.2.2.jre8.jar:?] at com.microsoft.sqlserver.jdbc.SQLServerBulkCopy.sendBulkLoadBCP(SQLServerBulkCopy.java:683) [mssql-jdbc-7.2.2.jre8.jar:?] at com.microsoft.sqlserver.jdbc.SQLServerBulkCopy.writeToServer(SQLServerBulkCopy.java:1581) [mssql-jdbc-7.2.2.jre8.jar:?] at com.microsoft.sqlserver.jdbc.SQLServerBulkCopy.writeResultSet(SQLServerBulkCopy.java:582) [mssql-jdbc-7.2.2.jre8.jar:?] at com.microsoft.sqlserver.jdbc.SQLServerBulkCopy.writeToServer(SQLServerBulkCopy.java:533) [mssql-jdbc-7.2.2.jre8.jar:?] at org.replicadb.manager.SQLServerManager.insertDataToTable(SQLServerManager.java:98) [ReplicaDB-0.12.2.jar:0.12.2] at org.replicadb.ReplicaTask.call(ReplicaTask.java:65) [ReplicaDB-0.12.2.jar:0.12.2] at org.replicadb.ReplicaTask.call(ReplicaTask.java:15) [ReplicaDB-0.12.2.jar:0.12.2] at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?] at java.lang.Thread.run(Thread.java:829) [?:?] 2022-08-19 15:26:44,567 ERROR ReplicaTask:69 ERROR in TaskId-0 inserting data to sink table: [For input string: "true"] 2022-08-19 15:26:44,567 ERROR ReplicaDB:136 Got exception running ReplicaDB: java.util.concurrent.ExecutionException: java.lang.NumberFormatException: For input string: "true" at java.util.concurrent.FutureTask.report(FutureTask.java:122) ~[?:?] at java.util.concurrent.FutureTask.get(FutureTask.java:191) ~[?:?] at org.replicadb.ReplicaDB.processReplica(ReplicaDB.java:117) [ReplicaDB-0.12.2.jar:0.12.2] at org.replicadb.ReplicaDB.main(ReplicaDB.java:47) [ReplicaDB-0.12.2.jar:0.12.2] Caused by: java.lang.NumberFormatException: For input string: "true" at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) ~[?:?] at java.lang.Integer.parseInt(Integer.java:652) ~[?:?] at java.lang.Byte.parseByte(Byte.java:152) ~[?:?] at java.lang.Byte.parseByte(Byte.java:178) ~[?:?] at org.replicadb.rowset.CsvCachedRowSetImpl.readData(CsvCachedRowSetImpl.java:182) ~[ReplicaDB-0.12.2.jar:0.12.2] at org.replicadb.rowset.CsvCachedRowSetImpl.next(CsvCachedRowSetImpl.java:142) ~[ReplicaDB-0.12.2.jar:0.12.2] at com.microsoft.sqlserver.jdbc.SQLServerBulkCopy.goToNextRow(SQLServerBulkCopy.java:3410) ~[mssql-jdbc-7.2.2.jre8.jar:?] at com.microsoft.sqlserver.jdbc.SQLServerBulkCopy.writeBatchData(SQLServerBulkCopy.java:3435) ~[mssql-jdbc-7.2.2.jre8.jar:?] at com.microsoft.sqlserver.jdbc.SQLServerBulkCopy.doInsertBulk(SQLServerBulkCopy.java:1494) ~[mssql-jdbc-7.2.2.jre8.jar:?] at com.microsoft.sqlserver.jdbc.SQLServerBulkCopy.access$300(SQLServerBulkCopy.java:65) ~[mssql-jdbc-7.2.2.jre8.jar:?] at com.microsoft.sqlserver.jdbc.SQLServerBulkCopy$1InsertBulk.doExecute(SQLServerBulkCopy.java:649) ~[mssql-jdbc-7.2.2.jre8.jar:?] at com.microsoft.sqlserver.jdbc.TDSCommand.execute(IOBuffer.java:7194) ~[mssql-jdbc-7.2.2.jre8.jar:?] at com.microsoft.sqlserver.jdbc.SQLServerConnection.executeCommand(SQLServerConnection.java:2935) ~[mssql-jdbc-7.2.2.jre8.jar:?] at com.microsoft.sqlserver.jdbc.SQLServerBulkCopy.sendBulkLoadBCP(SQLServerBulkCopy.java:683) ~[mssql-jdbc-7.2.2.jre8.jar:?] at com.microsoft.sqlserver.jdbc.SQLServerBulkCopy.writeToServer(SQLServerBulkCopy.java:1581) ~[mssql-jdbc-7.2.2.jre8.jar:?] at com.microsoft.sqlserver.jdbc.SQLServerBulkCopy.writeResultSet(SQLServerBulkCopy.java:582) ~[mssql-jdbc-7.2.2.jre8.jar:?] at com.microsoft.sqlserver.jdbc.SQLServerBulkCopy.writeToServer(SQLServerBulkCopy.java:533) ~[mssql-jdbc-7.2.2.jre8.jar:?] at org.replicadb.manager.SQLServerManager.insertDataToTable(SQLServerManager.java:98) ~[ReplicaDB-0.12.2.jar:0.12.2] at org.replicadb.ReplicaTask.call(ReplicaTask.java:65) ~[ReplicaDB-0.12.2.jar:0.12.2] at org.replicadb.ReplicaTask.call(ReplicaTask.java:15) ~[ReplicaDB-0.12.2.jar:0.12.2] at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?] at java.lang.Thread.run(Thread.java:829) ~[?:?] 2022-08-19 15:26:44,569 INFO ReplicaDB:54 Total process time: 762ms
seems "TINYINT" type is translate to "true"/"false" when exported to csv, then it can't be translated back..
after changing true/false to 1/0 manually, I got this exception:
2022-08-19 16:15:51,116 INFO SQLServerManager:94 Performing BulkCopy into GEOSEARCH.VW_ADDRESS_DATA 2022-08-19 16:15:57,880 ERROR ReplicaTask:69 ERROR in TaskId-0 inserting data to sink table: [Received an invalid column length from the bcp client for colid 4.] 2022-08-19 16:15:57,881 ERROR ReplicaDB:136 Got exception running ReplicaDB: java.util.concurrent.ExecutionException: com.microsoft.sqlserver.jdbc.SQLServerException: Received an invalid column length from the bcp client for colid 4. at java.util.concurrent.FutureTask.report(FutureTask.java:122) ~[?:?] at java.util.concurrent.FutureTask.get(FutureTask.java:191) ~[?:?] at org.replicadb.ReplicaDB.processReplica(ReplicaDB.java:117) [ReplicaDB-0.12.2.jar:0.12.2] at org.replicadb.ReplicaDB.main(ReplicaDB.java:47) [ReplicaDB-0.12.2.jar:0.12.2] Caused by: com.microsoft.sqlserver.jdbc.SQLServerException: Received an invalid column length from the bcp client for colid 4. at com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDatabaseError(SQLServerException.java:262) ~[mssql-jdbc-7.2.2.jre8.jar:?] at com.microsoft.sqlserver.jdbc.TDSTokenHandler.onEOF(tdsparser.java:258) ~[mssql-jdbc-7.2.2.jre8.jar:?] at com.microsoft.sqlserver.jdbc.TDSParser.parse(tdsparser.java:104) ~[mssql-jdbc-7.2.2.jre8.jar:?] at com.microsoft.sqlserver.jdbc.TDSParser.parse(tdsparser.java:26) ~[mssql-jdbc-7.2.2.jre8.jar:?] at com.microsoft.sqlserver.jdbc.SQLServerBulkCopy.doInsertBulk(SQLServerBulkCopy.java:1517) ~[mssql-jdbc-7.2.2.jre8.jar:?] at com.microsoft.sqlserver.jdbc.SQLServerBulkCopy.access$300(SQLServerBulkCopy.java:65) ~[mssql-jdbc-7.2.2.jre8.jar:?] at com.microsoft.sqlserver.jdbc.SQLServerBulkCopy$1InsertBulk.doExecute(SQLServerBulkCopy.java:649) ~[mssql-jdbc-7.2.2.jre8.jar:?] at com.microsoft.sqlserver.jdbc.TDSCommand.execute(IOBuffer.java:7194) ~[mssql-jdbc-7.2.2.jre8.jar:?] at com.microsoft.sqlserver.jdbc.SQLServerConnection.executeCommand(SQLServerConnection.java:2935) ~[mssql-jdbc-7.2.2.jre8.jar:?] at com.microsoft.sqlserver.jdbc.SQLServerBulkCopy.sendBulkLoadBCP(SQLServerBulkCopy.java:683) ~[mssql-jdbc-7.2.2.jre8.jar:?] at com.microsoft.sqlserver.jdbc.SQLServerBulkCopy.writeToServer(SQLServerBulkCopy.java:1581) ~[mssql-jdbc-7.2.2.jre8.jar:?] at com.microsoft.sqlserver.jdbc.SQLServerBulkCopy.writeResultSet(SQLServerBulkCopy.java:582) ~[mssql-jdbc-7.2.2.jre8.jar:?] at com.microsoft.sqlserver.jdbc.SQLServerBulkCopy.writeToServer(SQLServerBulkCopy.java:533) ~[mssql-jdbc-7.2.2.jre8.jar:?] at org.replicadb.manager.SQLServerManager.insertDataToTable(SQLServerManager.java:98) ~[ReplicaDB-0.12.2.jar:0.12.2] at org.replicadb.ReplicaTask.call(ReplicaTask.java:65) ~[ReplicaDB-0.12.2.jar:0.12.2] at org.replicadb.ReplicaTask.call(ReplicaTask.java:15) ~[ReplicaDB-0.12.2.jar:0.12.2] at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?] at java.lang.Thread.run(Thread.java:829) ~[?:?] 2022-08-19 16:15:57,889 INFO ReplicaDB:54 Total process time: 7574ms
I used a piece of awk code to calculate the max length of each column, compare the result with the database table DDL. Seems all columns are within range. And the cvs was exported from the same source table..
A quick search leads me to https://stackoverflow.com/questions/10442686/received-an-invalid-column-length-from-the-bcp-client-for-colid-6, not sure if the debug code can be easily implemented in java though.
Hi @zhao-lih ,
I was testing with TINYINT data type (only 0-255 values) from CSV to TINYINT column in SQL Server and it works correctly.
Could you please share some lines from your source CSV file and the DDL of your sink table for testing.
Thanks.
@osalvador , there's no problem with "from CSV to TINYINT column in SQL Server", the issue is, "from sqlserver to csv", the column is translated into a Boolean value, "true/false", instead of a 0-255 value.
Hi @zhao-lih,
Are you using ReplicaDB to generate a CSV file from SQL Server? I just generated a CSV file from a table with the TINYINT column in SQL Server and the generated value is correct.
@osalvador , I did. Actually with both versions 11.8 and 12.2, not sure if it's something related to my environment.. For me it's not a blocking error, I can just fix it manually.
Not sure what we can do on the 2nd exception I posted above, as it doesn't tell which exact column is the offending one. I had similar issue with the bcp command on command line, until I used a format file, for both export and import.
2022-08-18 00:21:16,246 INFO ReplicaDB:63 Running ReplicaDB version: 0.12.1 2022-08-18 00:21:16,251 INFO ReplicaDB:67 Setting verbose mode 2022-08-18 00:21:16,251 DEBUG ReplicaDB:68 ToolOptions{ sourceConnect='file://home/zhaolih/mcs/mcs_uat.csv', sourceUser='null', sourcePassword='null', sourceTable='null', sourceColumns='dwelling_id, dwelling_type, civic_no, civic_suffix, unit_id, road_name, road_name, road_type, road_type_key, road_dir, road_dir_key, municipality_name, municipality_name_key, postal_code, prov_abvn, prov_code, csd_uid, csd_municipality_name, block_no, cu_id, is_civic_address', sourceWhere='null', sourceQuery='null', sinkConnect='jdbc:sqlserver://itars-as-dev01:1433;database=MCS;encrypt=false', sinkUser='mcs', sinkPassword='****', sinkTable='GEOSEARCH.VW_ADDRESS_DATA', sinkStagingTable='null', sinkStagingSchema='null', sinkStagingTableAlias='null', sinkColumns='dwelling_id, dwelling_type, civic_no, civic_suffix, unit_id , road_name, road_name_key, road_type, road_type_key, road_dir, road_dir_key, municipality_name, municipality_name_key, postal_code , prov_abvn, prov_code, csd_uid, csd_municipality_name, block_no, cu_id, is_civic_address', sinkDisableEscape=false, sinkDisableIndex=false, sinkDisableTruncate=false, sinkAnalyze=false, jobs=1, BandwidthThrottling=0, quotedIdentifiers=false, fetchSize=100, help=false, version=false, verbose=true, optionsFile='csv_sqlserver.conf', mode='complete', sentryDsn='null', sourceConnectionParams={columns.types=NTEGER, SMALLINT, INTEGER, VARCHAR, VARCHAR, VARCHAR, VARCHAR, VARCHAR, VARCHAR, VARCHAR, VARCHAR, VARCHAR, VARCHAR, VARCHAR, VARCHAR, INTEGER, VARCHAR, VARCHAR, INTEGER, TINYINT, format.delimiter=:, format=DEFAULT}, sinkConnectionParams={domain=STC}, sourceFileFormat='null', sinkFileformat='null'} 2022-08-18 00:21:16,337 WARN FileManagerFactory:39 The file format is not defined, setting CSV as the default file format. 2022-08-18 00:21:16,428 INFO SqlManager:370 Truncating sink table with this command: TRUNCATE TABLE GEOSEARCH.VW_ADDRESS_DATA 2022-08-18 00:21:17,184 INFO ReplicaTask:36 Starting TaskId-0 2022-08-18 00:21:17,184 WARN FileManagerFactory:39 The file format is not defined, setting CSV as the default file format. 2022-08-18 00:21:17,230 DEBUG CsvFileManager:99 Setting the initial format to DEFAULT 2022-08-18 00:21:17,230 INFO CsvFileManager:198 The final CSVFormat is: Delimiter=<:> QuoteChar=<"> RecordSeparator=<