juicedata / juicesync

A tool to move your data between any clouds or regions.
Apache License 2.0

Could you support raising an error when a transfer fails? #50

Closed zhangkuantian closed 4 years ago

zhangkuantian commented 4 years ago

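When we transferred a file from OSS to S3, the transfer failed, but no error was raised, and it was still reported as successful (Succeed). We are not sure what the reasoning behind this design is. If this is the intended behavior, could you add an option that makes a failed transfer raise an error?

Here is our error log: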

2020/03/13 03:41:12.538091 <DEBUG>: Creating oss storage at endpoint https://xxxx.oss-cn-shanghai.aliyuncs.com
2020/03/13 03:41:12.538152 <DEBUG>: Creating s3 storage at endpoint https://xxxx.s3.cn-north-1.amazonaws.com.cn
2020/03/13 03:41:12.538249 <INFO>: Syncing from oss://xxxxxx/xxxx/xxx/xx/data_date=2020-03-12/ to s3://xxxxxx/xxxx/xxx/xx/data_date=2020-03-12/
2020/03/13 03:41:12.538255 <DEBUG>: maxResults: 10240, defaultPartSize: 5242880, maxBlock: 10485760
2020/03/13 03:41:12.538264 <DEBUG>: Listing objects from oss://xxxxxx/xxxx/xxx/xx/data_date=2020-03-12/ marker ""
2020/03/13 03:41:12.586283 <DEBUG>: Found 1 object from oss://xxxxxx/xxxx/xxx/xx/data_date=2020-03-12/ in 47.999306ms
2020/03/13 03:41:12.586797 <DEBUG>: Listing objects from s3://xxxxxx/xxxx/xxx/xx/data_date=2020-03-12/ marker ""
2020/03/13 03:41:12.587225 <DEBUG>: Continue listing objects from oss://xxxxxx/xxxx/xxx/xx/data_date=2020-03-12/ marker "data-00000-of-00001.tsv.gz"
2020/03/13 03:41:12.591886 <DEBUG>: Found 0 object from oss://xxxxxx/xxxx/xxx/xx/data_date=2020-03-12/ in 5.356395ms
2020/03/13 03:41:12.959493 <DEBUG>: Found 0 object from s3://xxxxxx/xxxx/xxx/xx/data_date=2020-03-12/ in 373.069188ms
2020/03/13 03:41:13.007916 <DEBUG>: Copying object data-00000-of-00001.tsv.gz as 55 parts (size: 5242880): EVf5BS8pLJIKHHDPgu8I.FWYHXZ4KW1E44vZfC3legCZ7m_k5RqsvNGlMcqEfjrQHeqk_X5DEQ7iT7FNHUk58zyKaqwmKuwdeISkWjWalxVTQ0EyLz2NBDcnF2Fl0CU7
2020/03/13 03:46:53.907529 <DEBUG>: Copied data-00000-of-00001.tsv.gz part 33
2020/03/13 03:47:54.303755 <DEBUG>: Copied data-00000-of-00001.tsv.gz part 52
2020/03/13 03:48:36.707127 <DEBUG>: Copied data-00000-of-00001.tsv.gz part 26
2020/03/13 03:49:09.955661 <DEBUG>: Copied data-00000-of-00001.tsv.gz part 28
2020/03/13 03:50:47.422721 <DEBUG>: Copied data-00000-of-00001.tsv.gz part 20
2020/03/13 03:50:57.765377 <DEBUG>: Copied data-00000-of-00001.tsv.gz part 53
2020/03/13 03:52:20.123703 <DEBUG>: Copied data-00000-of-00001.tsv.gz part 38
2020/03/13 03:52:39.790954 <DEBUG>: Copied data-00000-of-00001.tsv.gz part 46
2020/03/13 03:53:06.143590 <DEBUG>: Copied data-00000-of-00001.tsv.gz part 10
2020/03/13 03:53:17.350579 <DEBUG>: Copied data-00000-of-00001.tsv.gz part 54
2020/03/13 03:53:29.880241 <DEBUG>: Copied data-00000-of-00001.tsv.gz part 36
2020/03/13 03:53:33.591953 <DEBUG>: Copied data-00000-of-00001.tsv.gz part 1
2020/03/13 03:53:44.180034 <DEBUG>: Copied data-00000-of-00001.tsv.gz part 50
2020/03/13 03:54:04.815573 <DEBUG>: Copied data-00000-of-00001.tsv.gz part 22
2020/03/13 03:54:13.727680 <DEBUG>: Copied data-00000-of-00001.tsv.gz part 25
2020/03/13 03:54:29.634812 <DEBUG>: Copied data-00000-of-00001.tsv.gz part 45
2020/03/13 03:54:54.144061 <DEBUG>: Copied data-00000-of-00001.tsv.gz part 8
2020/03/13 03:56:39.667419 <DEBUG>: Copied data-00000-of-00001.tsv.gz part 41
2020/03/13 03:58:15.571480 <DEBUG>: Copied data-00000-of-00001.tsv.gz part 30
2020/03/13 03:58:16.822260 <DEBUG>: Copied data-00000-of-00001.tsv.gz part 16
2020/03/13 03:59:29.646285 <DEBUG>: Copied data-00000-of-00001.tsv.gz part 14
2020/03/13 04:00:23.323447 <DEBUG>: Copied data-00000-of-00001.tsv.gz part 24
2020/03/13 04:01:12.741026 <DEBUG>: Copied data-00000-of-00001.tsv.gz part 40
2020/03/13 04:01:16.116275 <DEBUG>: Copied data-00000-of-00001.tsv.gz part 19
2020/03/13 04:01:22.773482 <DEBUG>: Copied data-00000-of-00001.tsv.gz part 48
2020/03/13 04:01:23.387022 <DEBUG>: Copied data-00000-of-00001.tsv.gz part 29
2020/03/13 04:02:17.003057 <WARNING>: Failed to copy data-00000-of-00001.tsv.gz part 43: RequestTimeout: Your socket connection to the server was not read from or written to within the timeout period. Idle connections will be closed.
status code: 400, request id: D7422D8F25A99995, host id: HMmsZm3jrdbkrEAnCQUbEeGA3wyfTatUVAgkJ480Lo5y44TEFLZdjvFMTnQg5RhxT1pdLhCAQss=
2020/03/13 04:02:17.141562 <ERROR>: Failed to copy data-00000-of-00001.tsv.gz: multipart: part 43: RequestTimeout: Your socket connection to the server was not read from or written to within the timeout period. Idle connections will be closed.
status code: 400, request id: D7422D8F25A99995, host id: HMmsZm3jrdbkrEAnCQUbEeGA3wyfTatUVAgkJ480Lo5y44TEFLZdjvFMTnQg5RhxT1pdLhCAQss=
2020/03/13 04:02:17.141589 <INFO>: Found: 1, copied: 0, deleted: 0, failed: 1

yujunz commented 4 years ago

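My understanding is that the original design aims to complete as many transfers as possible; @davies can confirm.

Syncing usually runs in the background and takes a long time, so some network jitter along the way is inevitable, and the tool should tolerate some errors.

The suggestion to "add an option to control this" is very reasonable, so I opened a new issue, #51. Would you be interested in submitting a PR to implement it? @zhangkuantian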

davies commented 4 years ago

I think we can change the exit code at the end: if any failures occurred, exiting with a non-zero code would be enough.
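
To make the exit-code idea concrete, here is a minimal sketch in Go. It is not juicesync's actual code: the `syncStats` type, its fields, and the `exitCode` helper are hypothetical names chosen for illustration, and the counters are simply the ones from the summary line in the log above.

```go
package main

import (
	"fmt"
	"os"
)

// syncStats mirrors the counters printed in the final summary line.
// The type and field names are hypothetical, not taken from juicesync.
type syncStats struct {
	found, copied, deleted, failed int
}

// exitCode returns 0 when no object failed to sync and 1 otherwise,
// so a calling script or scheduler can detect a partial failure.
func exitCode(s syncStats) int {
	if s.failed > 0 {
		return 1
	}
	return 0
}

func main() {
	// Counters from the reported run: one object found, one failure.
	s := syncStats{found: 1, copied: 0, deleted: 0, failed: 1}
	fmt.Printf("Found: %d, copied: %d, deleted: %d, failed: %d\n",
		s.found, s.copied, s.deleted, s.failed)
	// Keep syncing as much as possible, but report failure at the very end.
	os.Exit(exitCode(s))
}
```

With something like this, the sync would still tolerate individual errors and copy as much as it can, while a wrapper script could check the process exit status to decide whether to retry or alert.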