slingdata-io / sling-cli

Sling is a CLI tool that extracts data from a source storage/database and loads it in a target storage/database.
https://docs.slingdata.io
GNU General Public License v3.0
438 stars 34 forks source link

Add more usage examples to StarRocks #140

Closed alberttwong closed 8 months ago

alberttwong commented 9 months ago

Setup

atwong@Albert-CelerData Downloads % sling conns set STARROCKSLOCAL url=starrocks://root:@localhost:9030/
11:55AM INF connection `STARROCKSLOCAL` has been set in /Users/atwong/.sling/env.yaml. Please test with `sling conns test STARROCKSLOCAL`
atwong@Albert-CelerData Downloads % sling conns test STARROCKSLOCAL
11:55AM INF success!

Importing csv example1.csv

atwong@Albert-CelerData auxjars % sling run --src-stream 'file:///Users/atwong/Downloads/example1.csv' --tgt-conn STARROCKSLOCAL --tgt-object 'albert.call_center2' --mode full-refresh --src-options '{"header":false}' --primary-key col_001
11:24AM INF connecting to target database (starrocks)
11:24AM INF reading from source file system (file)
11:24AM INF writing to target database [mode: full-refresh]
11:24AM INF streaming data
11:24AM WRN Using INSERT mode which is meant for small datasets. Please set the `fe_url` for loading large datasets via Stream Load mode. See https://docs.slingdata.io/connections/database-connections/starrocks
11:24AM INF dropped table `albert`.`call_center2`
11:24AM INF created table `albert`.`call_center2`
11:24AM INF inserted 4 rows into `albert`.`call_center2` in 0 secs [7 r/s]
11:24AM INF execution succeeded

Importing json. example2.json.

[!Note]
There is an outstanding issue with certain JSON files. https://github.com/StarRocks/starrocks/issues/40925. Sling wrote code as a workaround.

atwong@Albert-CelerData auxjars % sling run --src-stream 'file:///Users/atwong/Downloads/example2.json' --tgt-conn STARROCKSLOCAL --tgt-object 'albert.call_center4' --mode full-refresh --src-options '{"flatten":true}' --primary-key code
11:30AM INF connecting to target database (starrocks)
11:30AM INF reading from source file system (file)
11:30AM INF writing to target database [mode: full-refresh]
11:30AM INF streaming data
11:30AM WRN Using INSERT mode which is meant for small datasets. Please set the `fe_url` for loading large datasets via Stream Load mode. See https://docs.slingdata.io/connections/database-connections/starrocks
11:30AM INF dropped table `albert`.`call_center4`
11:30AM INF created table `albert`.`call_center4`
11:30AM INF inserted 1 rows into `albert`.`call_center4` in 0 secs [2 r/s]
11:30AM INF execution succeeded

Importing parquet.

atwong@Albert-CelerData auxjars % sling run --src-stream 'file:///Users/atwong/sandbox/tpcds-parquet/call_center.parquet' --tgt-conn STARROCKSLOCAL --tgt-object 'albert.call_center' --mode full-refresh --primary-key cc_call_center_sk
1:28PM INF connecting to target database (starrocks)
1:28PM INF reading from source file system (file)
1:28PM INF writing to target database [mode: full-refresh]
1:28PM INF streaming data
1:28PM WRN Using INSERT mode which is meant for small datasets. Please set the `fe_url` for loading large datasets via Stream Load mode. See https://docs.slingdata.io/connections/database-connections/starrocks
1:28PM INF dropped table `albert`.`call_center`
1:28PM INF created table `albert`.`call_center`
1:28PM INF inserted 6 rows into `albert`.`call_center` in 1 secs [6 r/s]
1:28PM INF execution succeeded
alberttwong commented 9 months ago

mysql example

Download sample data

wget https://github.com/datacharmer/test_db/releases/download/v1.0.7/test_db-1.0.7.tar.gz
atwong@Albert-CelerData sandbox % gunzip test_db-1.0.7.tar.gz
atwong@Albert-CelerData sandbox % tar xvf test_db-1.0.7.tar
x test_db/Changelog
x test_db/README.md
x test_db/employees.sql
x test_db/employees_partitioned.sql
x test_db/employees_partitioned_5.1.sql
x test_db/images/employees.gif
x test_db/images/employees.jpg
x test_db/images/employees.png
x test_db/load_departments.dump
x test_db/load_dept_emp.dump
x test_db/load_dept_manager.dump
x test_db/load_employees.dump
x test_db/load_salaries1.dump
x test_db/load_salaries2.dump
x test_db/load_salaries3.dump
x test_db/load_titles.dump
x test_db/objects.sql
x test_db/sakila/README.md
x test_db/sakila/sakila-mv-data.sql
x test_db/sakila/sakila-mv-schema.sql
x test_db/show_elapsed.sql
x test_db/sql_test.sh
x test_db/test_employees_md5.sql
x test_db/test_employees_sha.sql
x test_db/test_versions.sh

Source database data load

docker run --rm -p 9030:9030 -p 8030:8030 -p 8040:8040 -it starrocks/allin1-ubuntu
docker container run -d --name=LocalMySQLDB -p 3306:3306 -e MYSQL_ROOT_PASSWORD=password mysql
mysql -P 3306 -h 127.0.0.1 -u root -p --prompt="mysql > " < ./employees.sql

Target database setup

atwong@Albert-CelerData ~ % mysql -P 9030 -h 127.0.0.1 -u root --prompt="StarRocks > "
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 14
Server version: 5.1.0 3.2.2-269e832

Copyright (c) 2000, 2024, Oracle and/or its affiliates.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

StarRocks > create database albert;
Query OK, 0 rows affected (0.02 sec)

Sling setup

atwong@Albert-CelerData test_db % sling conns set MYSQLLOCAL url=mysql://root:password@localhost:3306/employees
5:01PM INF connection `MYSQLLOCAL` has been set in /Users/atwong/.sling/env.yaml. Please test with `sling conns test MYSQLLOCAL`
atwong@Albert-CelerData test_db % sling conns test MYSQLLOCAL
5:01PM INF success!
atwong@Albert-CelerData Downloads % sling conns set STARROCKSLOCAL url=starrocks://root:@localhost:9030/albert
11:55AM INF connection `STARROCKSLOCAL` has been set in /Users/atwong/.sling/env.yaml. Please test with `sling conns test STARROCKSLOCAL`
atwong@Albert-CelerData Downloads % sling conns test STARROCKSLOCAL
11:55AM INF success!

Sling execution

atwong@Albert-CelerData test_db % sling run --src-conn MYSQLLOCAL --src-stream employees.employees --tgt-conn STARROCKSLOCAL --tgt-object albert.employees --tgt-options '{ table_keys: { primary: [ emp_no ], hash: [ emp_no ] } }'
alberttwong commented 9 months ago

postgres example

Download sample data

atwong@Albert-CelerData sandbox % wget https://www.postgresqltutorial.com/wp-content/uploads/2019/05/dvdrental.zip
--2024-02-07 17:44:55--  https://www.postgresqltutorial.com/wp-content/uploads/2019/05/dvdrental.zip
Resolving www.postgresqltutorial.com (www.postgresqltutorial.com)... 104.21.2.174, 172.67.129.129
Connecting to www.postgresqltutorial.com (www.postgresqltutorial.com)|104.21.2.174|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 550906 (538K) [application/zip]
Saving to: ‘dvdrental.zip’

dvdrental.zip                                      100%[=============================================================================================================>] 537.99K  --.-KB/s    in 0.03s

2024-02-07 17:44:55 (19.6 MB/s) - ‘dvdrental.zip’ saved [550906/550906]

Source database data load

atwong@Albert-CelerData dvdrental % docker run -itd -e POSTGRES_USER=postgres -e POSTGRES_PASSWORD=postgres -p 5432:5432 --name postgresql postgres:latest
1f64e50aefb2cabf0d34b592226ac8950e1adb422021629e97069b775d0f45b8
atwong@Albert-CelerData dvdrental % PGPASSWORD=postgres psql -U postgres -h localhost
psql (14.10 (Homebrew), server 16.1 (Debian 16.1-1.pgdg120+1))
WARNING: psql major version 14, server major version 16.
         Some psql features might not work.
Type "help" for help.

postgres=# create database dvdrental;
CREATE DATABASE
postgres=# exit
atwong@Albert-CelerData dvdrental % pg_restore -U postgres -d dvdrental -h localhost ./dvdrental.tar
Password:
atwong@Albert-CelerData dvdrental % PGPASSWORD=postgres psql -U postgres -h localhost
psql (14.10 (Homebrew), server 16.1 (Debian 16.1-1.pgdg120+1))
WARNING: psql major version 14, server major version 16.
         Some psql features might not work.
Type "help" for help.

postgres=# \c dvdrental
psql (14.10 (Homebrew), server 16.1 (Debian 16.1-1.pgdg120+1))
WARNING: psql major version 14, server major version 16.
         Some psql features might not work.
You are now connected to database "dvdrental" as user "postgres".
dvdrental=# \dt
             List of relations
 Schema |     Name      | Type  |  Owner
--------+---------------+-------+----------
 public | actor         | table | postgres
 public | address       | table | postgres
 public | category      | table | postgres
 public | city          | table | postgres
 public | country       | table | postgres
 public | customer      | table | postgres
 public | film          | table | postgres
 public | film_actor    | table | postgres
 public | film_category | table | postgres
 public | inventory     | table | postgres
 public | language      | table | postgres
 public | payment       | table | postgres
 public | rental        | table | postgres
 public | staff         | table | postgres
 public | store         | table | postgres
(15 rows)
dvdrental=# \d+ public.staff
                                                                       Table "public.staff"
   Column    |            Type             | Collation | Nullable |                 Default                 | Storage  | Compression | Stats target | Description
-------------+-----------------------------+-----------+----------+-----------------------------------------+----------+-------------+--------------+-------------
 staff_id    | integer                     |           | not null | nextval('staff_staff_id_seq'::regclass) | plain    |             |              |
 first_name  | character varying(45)       |           | not null |                                         | extended |             |              |
 last_name   | character varying(45)       |           | not null |                                         | extended |             |              |
 address_id  | smallint                    |           | not null |                                         | plain    |             |              |
 email       | character varying(50)       |           |          |                                         | extended |             |              |
 store_id    | smallint                    |           | not null |                                         | plain    |             |              |
 active      | boolean                     |           | not null | true                                    | plain    |             |              |
 username    | character varying(16)       |           | not null |                                         | extended |             |              |
 password    | character varying(40)       |           |          |                                         | extended |             |              |
 last_update | timestamp without time zone |           | not null | now()                                   | plain    |             |              |
 picture     | bytea                       |           |          |                                         | extended |             |              |
Indexes:
    "staff_pkey" PRIMARY KEY, btree (staff_id)
Foreign-key constraints:
    "staff_address_id_fkey" FOREIGN KEY (address_id) REFERENCES address(address_id) ON UPDATE CASCADE ON DELETE RESTRICT
Referenced by:
    TABLE "payment" CONSTRAINT "payment_staff_id_fkey" FOREIGN KEY (staff_id) REFERENCES staff(staff_id) ON UPDATE CASCADE ON DELETE RESTRICT
    TABLE "rental" CONSTRAINT "rental_staff_id_key" FOREIGN KEY (staff_id) REFERENCES staff(staff_id)
    TABLE "store" CONSTRAINT "store_manager_staff_id_fkey" FOREIGN KEY (manager_staff_id) REFERENCES staff(staff_id) ON UPDATE CASCADE ON DELETE RESTRICT
Triggers:
    last_updated BEFORE UPDATE ON staff FOR EACH ROW EXECUTE FUNCTION last_updated()
Access method: heap

dvdrental=#

Target database setup

atwong@Albert-CelerData ~ % mysql -P 9030 -h 127.0.0.1 -u root --prompt="StarRocks > "
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 14
Server version: 5.1.0 3.2.2-269e832

Copyright (c) 2000, 2024, Oracle and/or its affiliates.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

StarRocks > create database albert;
Query OK, 0 rows affected (0.02 sec)

Sling setup

atwong@Albert-CelerData dvdrental % sling conns set POSTGRESLOCAL url="postgresql://postgres:postgres@localhost:5432/dvdrental?sslmode=disable"
5:55PM INF connection `POSTGRESLOCAL` has been set in /Users/atwong/.sling/env.yaml. Please test with `sling conns test POSTGRESLOCAL`
atwong@Albert-CelerData dvdrental % sling conns test POSTGRESLOCAL
5:55PM INF success!

Sling execution to StarRocks primary key table

sling run -d --src-conn postgreslocal --src-stream public.staff --tgt-conn starrockslocal --tgt-object 'albert.staff' --tgt-options '{ table_keys: { primary: [ staff_id ], hash: [ staff_id ] } }'

Sling execution to StarRocks duplicate key table

sling run -d --src-conn postgreslocal --src-stream public.staff --tgt-conn starrockslocal --tgt-object 'albert.staff' --tgt-options '{ table_keys: { duplicate: [ staff_id ], hash: [ staff_id ] } }'
alberttwong commented 9 months ago

Example of doing the T part of ELT with StarRocks. https://github.com/slingdata-io/sling-cli/discussions/148

alberttwong commented 8 months ago

@flarco can we merge these examples into the docs?

flarco commented 8 months ago

Will do

flarco commented 8 months ago

See here: https://docs.slingdata.io/sling-cli/run/examples/additional-examples. Closing