timeplus-io / sling-cli

Sling is a CLI tool that extracts data from a source storage/database and loads it in a target storage/database.
https://docs.slingdata.io
GNU General Public License v3.0

Got "No data" warning when importing large data #20

Closed shirley-gu-timeplus closed 1 month ago

shirley-gu-timeplus commented 1 month ago

Issue Description

SELECT count() FROM table(t_metrics)

Query id: de234398-6895-412b-8ef1-d837bebecf00

┌────count()─┐
│ 1692807396 │
└────────────┘

1 row in set. Elapsed: 0.003 sec.

No data in the target stream:

timeplusd :) select count() from table(gj_metrics)

SELECT count() FROM table(gj_metrics)

Query id: 243e1c13-02ac-45ed-8c74-29722ff6918d

┌─count()─┐
│       0 │
└─────────┘

1 row in set. Elapsed: 0.004 sec.
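
For context, the debug log in this report shows the two queries behind an incremental run: a `max(_tp_time)` checkpoint lookup on the target, then a filtered read of the source. Because the target stream is empty here, the checkpoint appears to fall back to 1970-01-01, so the read covers the entire source stream. The Python sketch below only rebuilds those two query strings to make that flow explicit. It is not Sling's implementation, and the function names `checkpoint_query` and `incremental_query` are hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional

# Assumed fallback when the target has no checkpoint yet (matches the log).
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def quote(name: str) -> str:
    """Wrap an identifier in backticks, as the logged queries do."""
    return f"`{name}`"


def checkpoint_query(database: str, target: str, update_key: str) -> str:
    """Build the query that reads the highest update_key value in the target."""
    return (
        f"select max({quote(update_key)}) as max_val "
        f"from table({quote(database)}.{quote(target)})"
    )


def incremental_query(
    database: str,
    source: str,
    update_key: str,
    checkpoint: Optional[datetime],
) -> str:
    """Build the source read for rows newer than the checkpoint.

    `checkpoint` must be timezone-aware; None means the target is empty,
    in which case the filter starts at the epoch and selects every row.
    """
    start = EPOCH if checkpoint is None else checkpoint.astimezone(timezone.utc)
    stamp = start.strftime("%Y-%m-%d %H:%M:%S.%f") + " +00"
    key = quote(update_key)
    return (
        f"select * from table({quote(database)}.{quote(source)}) "
        f"where {key} > to_time('{stamp}') order by {key} asc"
    )


if __name__ == "__main__":
    print(checkpoint_query("default", "gj_metrics", "_tp_time"))
    print(incremental_query("default", "t_metrics", "_tp_time", None))
```

With `checkpoint=None` the second function yields the same source query that appears in the log, which is why the first incremental run against an empty target has to scan and sort the whole source stream.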

- Sling version (`sling --version`): 1.0.7
- Operating System (`linux`, `mac`, `windows`): linux
- Replication Configuration: 

```yaml
source: DAILY_TIMEPLUS
target: HISTORICAL_TIMEPLUS

defaults:
  mode: incremental

streams:
  rtdc_exchange_order:
    update_key: _tp_time
    object: gj_rtdc_exchange_order
  rtdc_execution:
    update_key: _tp_time
    object: gj_rtdc_execution
  rtdc_position:
    update_key: _tp_time
    object: gj_rtdc_position
  rtdc_fund:
    update_key: _tp_time
    object: gj_rtdc_fund
  t_metrics:
    update_key: _tp_time
    object: gj_metrics
```
[root@node0 sling]# ./sling-timeplus run -r daily_to_his.yaml --streams "t_metrics" -d
WARN[0000]log.go:244 gosnowflake.(*defaultLogger).Warn DBUS_SESSION_BUS_ADDRESS envvar looks to be not set, this can lead to runaway dbus-daemon processes. To avoid this, set envvar DBUS_SESSION_BUS_ADDRESS=$XDG_RUNTIME_DIR/bus (if it exists) or DBUS_SESSION_BUS_ADDRESS=/dev/null.
2024-10-15 19:25:09 INF Sling Replication [1 streams] | DAILY_TIMEPLUS -> HISTORICAL_TIMEPLUS

2024-10-15 19:25:09 INF [1 / 1] running stream t_metrics
2024-10-15 19:25:09 DBG Sling version: 1.0.7 (linux amd64)
2024-10-15 19:25:09 DBG type is db-db
2024-10-15 19:25:09 DBG using: {"columns":null,"mode":"incremental","transforms":null}
2024-10-15 19:25:09 DBG using source options: {"empty_as_null":false,"null_if":"NULL","datetime_format":"AUTO","max_decimals":11}
2024-10-15 19:25:09 DBG using target options: {"batch_limit":5000,"datetime_format":"auto","file_max_rows":0,"max_decimals":11,"use_bulk":true,"add_new_columns":true,"adjust_column_type":false,"column_casing":"source"}
2024-10-15 19:25:09 DBG opened "proton" connection (conn-proton-46Q)
2024-10-15 19:25:09 DBG opened "proton" connection (conn-proton-1U5)
2024-10-15 19:25:09 INF connecting to source database (proton)
2024-10-15 19:25:09 INF connecting to target database (proton)
2024-10-15 19:25:09 INF getting checkpoint value
2024-10-15 19:25:09 DBG select max(`_tp_time`) as max_val from table(`default`.`gj_metrics`)
2024-10-15 19:25:09 INF reading from source database
2024-10-15 19:25:09 DBG select * from table(`default`.`t_metrics`) where `_tp_time` > to_time('1970-01-01 00:00:00.000000 +00') order by `_tp_time` asc
2024-10-15 19:25:17 INF writing to target database [mode: incremental]
2024-10-15 19:25:17 DBG drop stream if exists `default`.`gj_metrics_tmp`
2024-10-15 19:25:17 DBG table `default`.`gj_metrics_tmp` dropped
2024-10-15 19:25:17 DBG create stream `default`.`gj_metrics_tmp` (`event_ts` nullable(decimal(28,0)),
`metric` nullable(string),
`value` nullable(string),
`metric_date` nullable(int64),
`metric_time` nullable(int64),
`metric_datetime` nullable(int64),
`tagK1` nullable(string),
`tagV1` nullable(string),
`tagK2` nullable(string),
`tagV2` nullable(string),
`tagK3` nullable(string),
`tagV3` nullable(string),
`tagK4` nullable(string),
`tagV4` nullable(string),
`tagK5` nullable(string),
`tagV5` nullable(string),
`_tp_time` datetime64(3, 'UTC') DEFAULT now64(3, 'UTC') CODEC(DoubleDelta, LZ4))
2024-10-15 19:25:19 INF streaming data
2024-10-15 19:25:19 DBG use `default`
2024-10-15 19:25:19 INF Bulk import completed: 1 batches, 0 rows
2024-10-15 19:25:19 DBG 0 ROWS COPIED
2024-10-15 19:25:24 DBG select count(*) as cnt from table(`default`.`gj_metrics_tmp`)
2024-10-15 19:25:24 WRN No data or records found in stream. Nothing to do. To allow Sling to create empty tables, set SLING_ALLOW_EMPTY=TRUE
2024-10-15 19:25:24 INF inserted 0 rows into default.`gj_metrics` in 14 secs [0 r/s]
2024-10-15 19:25:24 DBG drop stream if exists `default`.`gj_metrics_tmp`
2024-10-15 19:25:24 DBG table `default`.`gj_metrics_tmp` dropped
2024-10-15 19:25:24 DBG closed "proton" connection (conn-proton-1U5)
2024-10-15 19:25:24 INF execution succeeded

2024-10-15 19:25:24 INF Sling Replication Completed in 14s | DAILY_TIMEPLUS -> HISTORICAL_TIMEPLUS | 1 Successes | 0 Failures

yokofly commented 1 month ago

timeplusd :) select * from table(`default`.`t_metrics`) where `_tp_time` > to_time('1970-01-01 00:00:00.000000 +00') order by `_tp_time` asc limit 2;

SELECT
  *
FROM
  table(default.t_metrics)
WHERE
  _tp_time > to_time('1970-01-01 00:00:00.000000 +00')
ORDER BY
  _tp_time ASC
LIMIT 2 

Query id: 3c0e4521-3f34-4405-932b-87d156a3104a

Progress: 842.91 million rows, 202.24 GB (29.87 million rows/s., 7.17 GB/s.) (11.7 CPU, 1.29 GB RAM)

2 rows in set. Elapsed: 61.632 sec. Processed 1.82 billion rows, 437.65 GB (29.60 million rows/s., 7.10 GB/s.)

timeplusd :)

yokofly commented 1 month ago

I can reproduce this locally; it is probably related to the high data volume.

ubuntu@ip-172-31-12-145:~$ ./sling run  --update-key _tp_time  --src-conn PROTON --src-stream 't_metrics'  --tgt-conn PROTON --tgt-object 'new_metrics'  --mode incremental -d
2024-10-15 15:16:15 DBG Force SLING_PROCESS_BW to false for timeplus database
2024-10-15 15:16:15 DBG Sling version: 2.0.3-rc1-timeplus (linux arm64)
2024-10-15 15:16:15 DBG type is db-db
2024-10-15 15:16:15 DBG using: {"columns":null,"mode":"incremental","transforms":null}
2024-10-15 15:16:15 DBG using source options: {"empty_as_null":false,"null_if":"NULL","datetime_format":"AUTO","max_decimals":11}
2024-10-15 15:16:15 DBG using target options: {"batch_limit":5000,"datetime_format":"auto","file_max_rows":0,"max_decimals":11,"use_bulk":true,"add_new_columns":true,"adjust_column_type":false,"column_casing":"source"}
2024-10-15 15:16:15 DBG opened "proton" connection (conn-proton-GGn)
2024-10-15 15:16:15 DBG opened "proton" connection (conn-proton-5rg)
2024-10-15 15:16:15 INF connecting to source database (proton)
2024-10-15 15:16:15 INF connecting to target database (proton)
2024-10-15 15:16:15 INF getting checkpoint value
2024-10-15 15:16:15 DBG select max(`_tp_time`) as max_val from table(`default`.`new_metrics`)
2024-10-15 15:16:15 INF reading from source database
2024-10-15 15:16:15 DBG select * from table(`default`.`t_metrics`) where `_tp_time` > to_time('1970-01-01 00:00:00.000000 +00') order by `_tp_time` asc
2024-10-15 15:16:22 INF writing to target database [mode: incremental]
2024-10-15 15:16:22 DBG drop stream if exists `default`.`new_metrics_tmp`
2024-10-15 15:16:22 DBG table `default`.`new_metrics_tmp` dropped
2024-10-15 15:16:22 DBG create stream `default`.`new_metrics_tmp` (`event_ts` nullable(decimal(28,0)),
`metric` nullable(string),
`value` nullable(string),
`metric_date` nullable(int64),
`metric_time` nullable(int64),
`metric_datetime` nullable(int64),
`tagK1` nullable(string),
`tagV1` nullable(string),
`tagK2` nullable(string),
`tagV2` nullable(string),
`tagK3` nullable(string),
`tagV3` nullable(string),
`tagK4` nullable(string),
`tagV4` nullable(string),
`tagK5` nullable(string),
`tagV5` nullable(string),
`_tp_time` datetime64(3, 'UTC') DEFAULT now64(3, 'UTC') CODEC(DoubleDelta, LZ4))
2024-10-15 15:16:24 INF streaming data
2024-10-15 15:16:24 DBG use `default`
2024-10-15 15:16:24 DBG 0 ROWS COPIED
2024-10-15 15:16:29 DBG select count(*) as cnt from table(`default`.`new_metrics_tmp`)
2024-10-15 15:16:29 WRN No data or records found in stream. Nothing to do. To allow Sling to create empty tables, set SLING_ALLOW_EMPTY=TRUE
2024-10-15 15:16:29 INF inserted 0 rows into default.`new_metrics` in 14 secs [0 r/s] 
2024-10-15 15:16:29 DBG drop stream if exists `default`.`new_metrics_tmp`
2024-10-15 15:16:29 DBG table `default`.`new_metrics_tmp` dropped
2024-10-15 15:16:29 DBG closed "proton" connection (conn-proton-5rg)
2024-10-15 15:16:29 DBG closed "proton" connection (conn-proton-GGn)
2024-10-15 15:16:29 INF execution succeeded
ubuntu@ip-172-31-12-145:~$ ./timeplus/bin/timeplusd client -q "select count() from table(t_metrics)"
16570318000
ubuntu@ip-172-31-12-145:~$

yokofly commented 1 month ago

Very likely fixed in #25.