taosdata / DataX

DataX is the open-source edition of Alibaba Cloud DataWorks Data Integration. Building on DataX, TAOS Data has developed TDengine Reader and Writer plugins, giving users a tool for ETL and data migration.

Migration between two local TDengine 3.0 instances using DataX 3.0 fails because of dirty data #41

Open yanjunwan12138 opened 1 year ago

yanjunwan12138 commented 1 year ago

The corresponding error:

2023-06-08 17:45:35.245 [job-0] INFO  StandAloneJobContainerCommunicator - Total 0 records, 0 bytes | Speed 0B/s, 0 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 0.000s | All Task WaitReaderTime 0.000s | Percentage 0.00%
2023-06-08 17:45:45.248 [job-0] INFO  StandAloneJobContainerCommunicator - Total 10000 records, 256450 bytes | Speed 25.04KB/s, 1000 records/s | Error 2000 records, 51270 bytes | All Task WaitWriterTime 0.006s | All Task WaitReaderTime 0.338s | Percentage 0.00%
2023-06-08 17:45:45.249 [job-0] ERROR JobContainer - 运行scheduler 模式[standalone]出错.
2023-06-08 17:45:45.249 [job-0] ERROR JobContainer - Exception when job run
com.alibaba.datax.common.exception.DataXException: Code:[Framework-14], Description:[DataX传输脏数据超过用户预期,该错误通常是由于源端数据存在较多业务脏数据导致,请仔细检查DataX汇报的脏数据日志信息,或者您可以适当调大脏数据阈值.]. - 脏数据条数检查不通过,限制是[0]条,但实际上捕获了[2000]条.
	at com.alibaba.datax.common.exception.DataXException.asDataXException(DataXException.java:30) ~[datax-common-0.0.1-SNAPSHOT.jar:na]
	at com.alibaba.datax.core.util.ErrorRecordChecker.checkRecordLimit(ErrorRecordChecker.java:58) ~[datax-core-0.0.1-SNAPSHOT.jar:na]
	at com.alibaba.datax.core.job.scheduler.AbstractScheduler.schedule(AbstractScheduler.java:89) ~[datax-core-0.0.1-SNAPSHOT.jar:na]
	at com.alibaba.datax.core.job.JobContainer.schedule(JobContainer.java:535) ~[datax-core-0.0.1-SNAPSHOT.jar:na]
	at com.alibaba.datax.core.job.JobContainer.start(JobContainer.java:119) ~[datax-core-0.0.1-SNAPSHOT.jar:na]
	at com.alibaba.datax.core.Engine.start(Engine.java:93) [datax-core-0.0.1-SNAPSHOT.jar:na]
	at com.alibaba.datax.core.Engine.entry(Engine.java:175) [datax-core-0.0.1-SNAPSHOT.jar:na]
	at com.alibaba.datax.core.Engine.main(Engine.java:208) [datax-core-0.0.1-SNAPSHOT.jar:na]
2023-06-08 17:45:45.250 [job-0] INFO  StandAloneJobContainerCommunicator - Total 10000 records, 256450 bytes | Speed 250.44KB/s, 10000 records/s | Error 2000 records, 51270 bytes | All Task WaitWriterTime 0.006s | All Task WaitReaderTime 0.338s | Percentage 0.00%
2023-06-08 17:45:45.252 [job-0] ERROR Engine -

经DataX智能分析,该任务最可能的错误原因是:
com.alibaba.datax.common.exception.DataXException: Code:[Framework-14], Description:[DataX传输脏数据超过用户预期,该错误通常是由于源端数据存在较多业务脏数据导致,请仔细检查DataX汇报的脏数据日志信息,或者您可以适当调大脏数据阈值.]. - 脏数据条数检查不通过,限制是[0]条,但实际上捕获了[2000]条.
	at com.alibaba.datax.common.exception.DataXException.asDataXException(DataXException.java:30)
	at com.alibaba.datax.core.util.ErrorRecordChecker.checkRecordLimit(ErrorRecordChecker.java:58)
	at com.alibaba.datax.core.job.scheduler.AbstractScheduler.schedule(AbstractScheduler.java:89)
	at com.alibaba.datax.core.job.JobContainer.schedule(JobContainer.java:535)
	at com.alibaba.datax.core.job.JobContainer.start(JobContainer.java:119)
	at com.alibaba.datax.core.Engine.start(Engine.java:93)
	at com.alibaba.datax.core.Engine.entry(Engine.java:175)
	at com.alibaba.datax.core.Engine.main(Engine.java:208)
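The Framework-14 failure is raised by DataX's ErrorRecordChecker (visible in the stack trace above): the job sets setting.errorLimit.record to 0, so the very first dirty record aborts the run, and here 2000 of the 10000 transferred records were rejected. As the error text itself suggests, the threshold can be raised while the root cause is investigated. A minimal sketch of a more tolerant setting block for the job.json below (the values 5000 and 0.5 are purely illustrative, not recommendations):

    "setting": {
        "speed": { "channel": 5 },
        "errorLimit": {
            "record": 5000,
            "percentage": 0.5
        }
    }

Raising the limit only lets the job finish; it does not fix the 2000 bad rows. DataX normally also prints each rejected record in the task log, so searching the log output for 脏数据 entries should show which rows the tdengine30writer discarded and why (typical causes are type mismatches or values the target table cannot accept).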

The corresponding job.json:

{ "job":{ "content":[ { "reader":{ "name":"tdengine30reader", "parameter":{ "username":"root", "password":"taosdata", "connection":[ { "table":[ "weather" ], "jdbcUrl":[ "jdbc:TAOS-RS://127.0.0.1:6041/wanyanjun?timestampFormat=TIMESTAMP" ] } ], "column":[ "ts", "temperature", "humidity", "location", "groupid" ] } }, "writer":{ "name":"tdengine30writer", "parameter":{ "username":"root", "password":"taosdata", "column":[ "ts", "temperature", "humidity", "location", "groupid" ], "connection":[ { "table":[ "weather" ], "jdbcUrl":"jdbc:TAOS-RS://192.168.3.212:6041/wanyanjun" } ], "encoding":"UTF-8", "batchSize":1000, "ignoreTagsUnmatched":true } } } ], "setting":{ "speed":{ "channel":5 }, "errorLimit":{ "record":0 } } } }

yu285 commented 1 year ago

We'll take a look.