slingdata-io / sling-cli

Sling is a CLI tool that extracts data from a source storage/database and loads it in a target storage/database.
https://docs.slingdata.io
GNU General Public License v3.0

Logging Table in Target #94

Open GJMcClintock opened 9 months ago

GJMcClintock commented 9 months ago

A helpful addition would be a logging table containing information about each run for database targets:
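For illustration only, here is a minimal sketch of what such a run-logging table could record. Everything here is hypothetical: the table name `_sling_run_log` and every column name are assumptions for the example, not part of Sling; SQLite stands in for whatever the target database happens to be.

```python
import sqlite3  # stand-in for the actual target database

# Hypothetical schema: every name here is illustrative, not Sling's.
DDL = """
CREATE TABLE IF NOT EXISTS _sling_run_log (
    run_id        TEXT PRIMARY KEY,
    stream        TEXT,          -- source table/file being replicated
    status        TEXT,          -- e.g. 'success' or 'error'
    rows_written  INTEGER,
    bytes_written INTEGER,
    started_at    TEXT,
    finished_at   TEXT
)
"""

conn = sqlite3.connect(":memory:")
conn.execute(DDL)
# Values mirror the example run shown later in this thread.
conn.execute(
    "INSERT INTO _sling_run_log VALUES (?, ?, ?, ?, ?, ?, ?)",
    ("run-001", "acadia.jhi_user", "success", 653434, 54_000_000,
     "2024-01-02 15:55:48", "2024-01-02 15:56:04"),
)
row = conn.execute(
    "SELECT stream, rows_written FROM _sling_run_log WHERE run_id = ?",
    ("run-001",),
).fetchone()
print(row)  # ('acadia.jhi_user', 653434)
```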

ShahBinoy commented 9 months ago

I'm not sure whether I should create a new issue or add to this suggestion, but the way the progress of fetched records is currently shown in the terminal is not friendly to log aggregators or log monitoring systems.

It appears like below:

2024-01-02 15:55:48 DBG using source options: {"empty_as_null":true,"null_if":"NULL","datetime_format":"AUTO","max_decimals":-1}
2024-01-02 15:55:48 DBG using target options: {"header":true,"compression":"snappy","concurrency":12,"datetime_format":"auto","delimiter":",","file_max_rows":100000,"format":"parquet","max_decimals":-1,"use_bulk":true,"add_new_columns":true,"adjust_column_type":false,"column_casing":"source"}
2024-01-02 15:55:48 INF connecting to source database (mysql)
2024-01-02 15:55:48 INF reading from source database
2024-01-02 15:55:48 DBG select * from `acadia`.`jhi_user`
2024-01-02 15:55:48 DBG column type change for activated (smallint to string)
2024-01-02 15:55:48 DBG column type change for locked (smallint to string)
2024-01-02 15:55:48 DBG column type change for test_user (smallint to string)
2024-01-02 15:55:48 DBG column type change for registration_complete (smallint to string)
2024-01-02 15:55:48 INF writing to target file system (s3)
2024-01-02 15:55:48 DBG writing to s3://bucket/path/user/2024_01_02/part.01 [fileRowLimit=100000 fileBytesLimit=0 compression=SNAPPY concurrency=7 useBufferedStream=false fileFormat=parquet]
1s 20,362 ? p/s  | 0 B/s | 73% MEM | 8% CPU 2s 68,654 48293 r/s  | 0 B/s | 73% MEM | 37% CPU 3s 106,962 47648 r/s  | 0 B/s | 73% MEM | 42% CPU 4s 155,463 47702 r/s  | 0 B/s | 73% MEM | 47% CPU 5s 200,000 47499 r/s  | 0 B/s | 73% MEM | 43% CPU 6s 239,400 46977 r/s  | 0 B/s | 73% MEM | 39% CPU 7s 287,857 47072 r/s  | 0 B/s | 73% MEM | 38% CPU 8s 324,356 46390 r/s  | 0 B/s | 73% MEM | 38% CPU 9s 373,792 46586 r/s  | 0 B/s | 73% MEM | 38% CPU 10s 411,850 46036 r/s  | 0 B/s | 73% MEM | 42% CPU 11s 459,775 46157 r/s  | 0 B/s | 73% MEM | 44% CPU 12s 500,000 45775 r/s  | 0 B/s | 73% MEM | 42% CPU 13s 547,106 45861 r/s  | 0 B/s | 73% MEM | 36% CPU 14s 596,311 46077 r/s  | 0 B/s | 74% MEM | 40% CPU 15s 631,800 45393 r/s  | 0 B/s | 74% MEM | 28% CPU 2024-01-02 15:56:04 DBG wrote 54 MB: 653434 rows [39,692 r/s]
15s 653,434 45182 r/s  | 0 B/s | 73% MEM | 21% CPU 2024-01-02 15:56:04 INF wrote 653434 rows [39,692 r/s]
2024-01-02 15:56:04 INF execution succeeded

Instead, if the details requested by @GJMcClintock were shown at the start, and each progress update were emitted on its own line with a progress-type keyword, the output would become log-aggregator friendly.

This is not a big deal though; some additional interception rules with filtering on the aggregator side might be able to take care of it.

So it's just a nice-to-have 👍🏼
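A common way to make progress output like the above aggregator-friendly is to emit each update as a single self-contained JSON line (NDJSON), rather than rewriting one terminal line in place. The sketch below is not Sling's implementation; the field names and the `PROGRESS` level keyword are assumptions chosen for the example.

```python
import json
import sys
import time

def log_progress(rows: int, rate: float, mem_pct: int, cpu_pct: int) -> str:
    """Emit one progress update as a self-contained JSON line."""
    record = {
        "ts": time.strftime("%Y-%m-%d %H:%M:%S"),
        "level": "PROGRESS",      # keyword an aggregator can filter on
        "rows": rows,
        "rows_per_sec": rate,
        "mem_pct": mem_pct,
        "cpu_pct": cpu_pct,
    }
    line = json.dumps(record)
    print(line, file=sys.stderr)  # one record per line, no in-place rewriting
    return line

# Values taken from the second progress sample in the log above.
line = log_progress(68_654, 48_293.0, 73, 37)
parsed = json.loads(line)         # any log pipeline can parse it back
```

Because each record is a complete line, tools like Loki, Datadog, or plain `grep '"level": "PROGRESS"'` can filter or drop progress noise without touching the INF/DBG lines.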

flarco commented 9 months ago

Hi, so this is mostly already being done in the $SLING_HOME_DIR/.sling.db file (default for $SLING_HOME_DIR is ~/.sling). It is however a SQLite file.

You can see the executions, tasks & replications table structures here: https://github.com/slingdata-io/sling-cli/blob/main/core/store/store.go

Need to think of a way to best translate that function to insert into tables in the destination DB... thinking of capturing a schema name, probably as an environment variable, so that sling can write the 3 tables in there. Thoughts?
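In the meantime, the local store can be inspected with any SQLite client. A minimal sketch, assuming only what the comment above states (the default path `~/.sling/.sling.db`); it lists table names via `sqlite_master` rather than assuming any column layout, and falls back to an in-memory database if the file is absent:

```python
import sqlite3
from pathlib import Path

# Default location per the comment above; adjust if $SLING_HOME_DIR is set.
db_path = Path.home() / ".sling" / ".sling.db"

# Use an empty in-memory DB as a stand-in if the file doesn't exist locally.
conn = sqlite3.connect(db_path if db_path.exists() else ":memory:")

# List whatever tables the store contains (e.g. executions, tasks, replications).
tables = [
    name
    for (name,) in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
print(tables)
```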

ShahBinoy commented 9 months ago

> Hi, so this is mostly already being done in the $SLING_HOME_DIR/.sling.db file (default for $SLING_HOME_DIR is ~/.sling). It is however a SQLite file.
>
> You can see the executions, tasks & replications table structures here: https://github.com/slingdata-io/sling-cli/blob/main/core/store/store.go
>
> Need to think of a way to best translate that function to insert into tables in the destination DB... thinking of capturing a schema name, probably as an environment variable, so that sling can write the 3 tables in there. Thoughts?

I concur. I am currently using Meltano and building some foundation to switch to Sling. I think persisting the output to a proper DB will at least open the door to enterprise-level observability. It can be optional for anybody who needs it. Meltano does that too: default storage is SQLite, but it can insert into Postgres if configured.