Closed alberttwong closed 8 months ago
Download sample data
wget https://github.com/datacharmer/test_db/releases/download/v1.0.7/test_db-1.0.7.tar.gz
atwong@Albert-CelerData sandbox % gunzip test_db-1.0.7.tar.gz
atwong@Albert-CelerData sandbox % tar xvf test_db-1.0.7.tar
x test_db/Changelog
x test_db/README.md
x test_db/employees.sql
x test_db/employees_partitioned.sql
x test_db/employees_partitioned_5.1.sql
x test_db/images/employees.gif
x test_db/images/employees.jpg
x test_db/images/employees.png
x test_db/load_departments.dump
x test_db/load_dept_emp.dump
x test_db/load_dept_manager.dump
x test_db/load_employees.dump
x test_db/load_salaries1.dump
x test_db/load_salaries2.dump
x test_db/load_salaries3.dump
x test_db/load_titles.dump
x test_db/objects.sql
x test_db/sakila/README.md
x test_db/sakila/sakila-mv-data.sql
x test_db/sakila/sakila-mv-schema.sql
x test_db/show_elapsed.sql
x test_db/sql_test.sh
x test_db/test_employees_md5.sql
x test_db/test_employees_sha.sql
x test_db/test_versions.sh
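The gunzip and tar steps above can also be done in one pass. A minimal sketch, assuming a tar that supports `-z` (GNU tar and bsdtar both do); `extract_tgz` is a hypothetical helper name, not part of test_db:

```shell
# Extract a gzip-compressed tarball in one step.
# Usage: extract_tgz <archive.tar.gz> [destination-dir]
extract_tgz() {
  archive="$1"
  dest="${2:-.}"
  if [ ! -f "$archive" ]; then
    echo "archive not found: $archive" >&2
    return 1
  fi
  # -x extract, -z gunzip on the fly, -C unpack into the destination
  mkdir -p "$dest" && tar -xzf "$archive" -C "$dest"
}

# Example (not run here): extract_tgz test_db-1.0.7.tar.gz
```

This leaves the original .tar.gz in place, unlike gunzip, which replaces it with the .tar.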
Source database data load
docker run --rm -p 9030:9030 -p 8030:8030 -p 8040:8040 -it starrocks/allin1-ubuntu
docker container run -d --name=LocalMySQLDB -p 3306:3306 -e MYSQL_ROOT_PASSWORD=password mysql
mysql -P 3306 -h 127.0.0.1 -u root -p --prompt="mysql > " < ./employees.sql
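The MySQL container takes a little while to accept connections after `docker container run`, so the load can fail if it is started right away. A rough sketch of waiting first, assuming `mysqladmin` is installed alongside the `mysql` client; `wait_for` and `load_employees` are hypothetical helper names, and the retry counts are arbitrary:

```shell
# Retry a command until it succeeds or the attempts run out.
# Usage: wait_for <max_attempts> <delay_seconds> <command> [args...]
wait_for() {
  attempts="$1"; delay="$2"; shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Load employees.sql once the server answers. Run this from inside test_db/,
# because employees.sql sources the load_*.dump files by relative path.
load_employees() {
  wait_for 30 2 mysqladmin ping -h 127.0.0.1 -P 3306 -u root -ppassword || return 1
  mysql -h 127.0.0.1 -P 3306 -u root -ppassword < ./employees.sql
}

# Example (not run here): cd test_db && load_employees
```

Passing the password on the command line avoids the interactive prompt from `-p`, at the cost of exposing it in the process list, which is probably acceptable for a throwaway local container.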
Target database setup
atwong@Albert-CelerData ~ % mysql -P 9030 -h 127.0.0.1 -u root --prompt="StarRocks > "
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 14
Server version: 5.1.0 3.2.2-269e832
Copyright (c) 2000, 2024, Oracle and/or its affiliates.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
StarRocks > create database albert;
Query OK, 0 rows affected (0.02 sec)
Sling setup
atwong@Albert-CelerData test_db % sling conns set MYSQLLOCAL url=mysql://root:password@localhost:3306/employees
5:01PM INF connection `MYSQLLOCAL` has been set in /Users/atwong/.sling/env.yaml. Please test with `sling conns test MYSQLLOCAL`
atwong@Albert-CelerData test_db % sling conns test MYSQLLOCAL
5:01PM INF success!
atwong@Albert-CelerData Downloads % sling conns set STARROCKSLOCAL url=starrocks://root:@localhost:9030/albert
11:55AM INF connection `STARROCKSLOCAL` has been set in /Users/atwong/.sling/env.yaml. Please test with `sling conns test STARROCKSLOCAL`
atwong@Albert-CelerData Downloads % sling conns test STARROCKSLOCAL
11:55AM INF success!
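As an alternative to `sling conns set`, the same two connections can be supplied as environment variables, which is handy for scripts and CI. This is a sketch that assumes Sling reads connection URLs from environment variables named after the connection; the URLs are the ones used above:

```shell
# Assumption: Sling picks up connection URLs from environment variables
# named after the connection, in addition to ~/.sling/env.yaml.
export MYSQLLOCAL='mysql://root:password@localhost:3306/employees'
export STARROCKSLOCAL='starrocks://root:@localhost:9030/albert'

# Then test as before (not run here):
#   sling conns test MYSQLLOCAL
#   sling conns test STARROCKSLOCAL
```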
Sling execution
atwong@Albert-CelerData test_db % sling run --src-conn MYSQLLOCAL --src-stream employees.employees --tgt-conn STARROCKSLOCAL --tgt-object albert.employees --tgt-options '{ table_keys: { primary: [ emp_no ], hash: [ emp_no ] } }'
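After the run, a quick sanity check is to compare row counts on both sides through the mysql client, since StarRocks speaks the MySQL protocol on port 9030. A minimal sketch, assuming both containers from above are still running; `count_rows` and `compare_counts` are hypothetical helper names:

```shell
# Print COUNT(*) for a table through the mysql client.
# Usage: count_rows <port> <password-or-empty> <db.table>
count_rows() {
  port="$1"; pass="$2"; table="$3"
  # Only pass -p when a password is set (StarRocks root has none here).
  if [ -n "$pass" ]; then set -- "-p$pass"; else set --; fi
  # -N drops the column header, -B gives plain tab-separated output
  mysql -h 127.0.0.1 -P "$port" -u root "$@" -N -B \
    -e "SELECT COUNT(*) FROM $table"
}

# Compare the MySQL source table with the StarRocks target table.
# Returns 0 if the counts match, 1 if they differ, 2 if a query failed.
compare_counts() {
  src=$(count_rows 3306 password employees.employees) || return 2
  tgt=$(count_rows 9030 "" albert.employees) || return 2
  echo "source=$src target=$tgt"
  [ "$src" = "$tgt" ]
}

# Example (not run here): compare_counts && echo "row counts match"
```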
PostgreSQL example
Download sample data
atwong@Albert-CelerData sandbox % wget https://www.postgresqltutorial.com/wp-content/uploads/2019/05/dvdrental.zip
--2024-02-07 17:44:55-- https://www.postgresqltutorial.com/wp-content/uploads/2019/05/dvdrental.zip
Resolving www.postgresqltutorial.com (www.postgresqltutorial.com)... 104.21.2.174, 172.67.129.129
Connecting to www.postgresqltutorial.com (www.postgresqltutorial.com)|104.21.2.174|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 550906 (538K) [application/zip]
Saving to: ‘dvdrental.zip’
dvdrental.zip 100%[=============================================================================================================>] 537.99K --.-KB/s in 0.03s
2024-02-07 17:44:55 (19.6 MB/s) - ‘dvdrental.zip’ saved [550906/550906]
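The transcript goes straight from the downloaded dvdrental.zip to restoring ./dvdrental.tar, so the archive has to be unzipped in between. A minimal sketch of that missing step, assuming `unzip` is installed; `unpack_dvdrental` is a hypothetical helper name:

```shell
# Unpack dvdrental.zip into the current directory and confirm that the
# dvdrental.tar used by pg_restore is there afterwards.
# Usage: unpack_dvdrental [zipfile]
unpack_dvdrental() {
  zipfile="${1:-dvdrental.zip}"
  [ -f "$zipfile" ] || { echo "missing $zipfile" >&2; return 1; }
  # -o overwrites an existing dvdrental.tar without prompting
  unzip -o "$zipfile" && [ -f dvdrental.tar ]
}

# Example (not run here): unpack_dvdrental dvdrental.zip
```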
Source database data load
atwong@Albert-CelerData dvdrental % docker run -itd -e POSTGRES_USER=postgres -e POSTGRES_PASSWORD=postgres -p 5432:5432 --name postgresql postgres:latest
1f64e50aefb2cabf0d34b592226ac8950e1adb422021629e97069b775d0f45b8
atwong@Albert-CelerData dvdrental % PGPASSWORD=postgres psql -U postgres -h localhost
psql (14.10 (Homebrew), server 16.1 (Debian 16.1-1.pgdg120+1))
WARNING: psql major version 14, server major version 16.
Some psql features might not work.
Type "help" for help.
postgres=# create database dvdrental;
CREATE DATABASE
postgres=# exit
atwong@Albert-CelerData dvdrental % pg_restore -U postgres -d dvdrental -h localhost ./dvdrental.tar
Password:
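The create-database and restore steps above can be scripted without the password prompt by setting PGPASSWORD for pg_restore as well, the same way the psql calls do. A rough sketch using the credentials from the docker run command; `restore_dvdrental` is a hypothetical helper name:

```shell
# Create the dvdrental database and restore the dump non-interactively.
# Usage: restore_dvdrental [path-to-dvdrental.tar]
restore_dvdrental() {
  dump="${1:-./dvdrental.tar}"
  # PGPASSWORD is read by both psql and pg_restore, so neither prompts.
  PGPASSWORD=postgres psql -U postgres -h localhost \
    -c 'CREATE DATABASE dvdrental' || return 1
  PGPASSWORD=postgres pg_restore -U postgres -h localhost -d dvdrental "$dump"
}

# Example (not run here): restore_dvdrental ./dvdrental.tar
```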
atwong@Albert-CelerData dvdrental % PGPASSWORD=postgres psql -U postgres -h localhost
psql (14.10 (Homebrew), server 16.1 (Debian 16.1-1.pgdg120+1))
WARNING: psql major version 14, server major version 16.
Some psql features might not work.
Type "help" for help.
postgres=# \c dvdrental
psql (14.10 (Homebrew), server 16.1 (Debian 16.1-1.pgdg120+1))
WARNING: psql major version 14, server major version 16.
Some psql features might not work.
You are now connected to database "dvdrental" as user "postgres".
dvdrental=# \dt
List of relations
Schema | Name | Type | Owner
--------+---------------+-------+----------
public | actor | table | postgres
public | address | table | postgres
public | category | table | postgres
public | city | table | postgres
public | country | table | postgres
public | customer | table | postgres
public | film | table | postgres
public | film_actor | table | postgres
public | film_category | table | postgres
public | inventory | table | postgres
public | language | table | postgres
public | payment | table | postgres
public | rental | table | postgres
public | staff | table | postgres
public | store | table | postgres
(15 rows)
dvdrental=# \d+ public.staff
Table "public.staff"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
-------------+-----------------------------+-----------+----------+-----------------------------------------+----------+-------------+--------------+-------------
staff_id | integer | | not null | nextval('staff_staff_id_seq'::regclass) | plain | | |
first_name | character varying(45) | | not null | | extended | | |
last_name | character varying(45) | | not null | | extended | | |
address_id | smallint | | not null | | plain | | |
email | character varying(50) | | | | extended | | |
store_id | smallint | | not null | | plain | | |
active | boolean | | not null | true | plain | | |
username | character varying(16) | | not null | | extended | | |
password | character varying(40) | | | | extended | | |
last_update | timestamp without time zone | | not null | now() | plain | | |
picture | bytea | | | | extended | | |
Indexes:
"staff_pkey" PRIMARY KEY, btree (staff_id)
Foreign-key constraints:
"staff_address_id_fkey" FOREIGN KEY (address_id) REFERENCES address(address_id) ON UPDATE CASCADE ON DELETE RESTRICT
Referenced by:
TABLE "payment" CONSTRAINT "payment_staff_id_fkey" FOREIGN KEY (staff_id) REFERENCES staff(staff_id) ON UPDATE CASCADE ON DELETE RESTRICT
TABLE "rental" CONSTRAINT "rental_staff_id_key" FOREIGN KEY (staff_id) REFERENCES staff(staff_id)
TABLE "store" CONSTRAINT "store_manager_staff_id_fkey" FOREIGN KEY (manager_staff_id) REFERENCES staff(staff_id) ON UPDATE CASCADE ON DELETE RESTRICT
Triggers:
last_updated BEFORE UPDATE ON staff FOR EACH ROW EXECUTE FUNCTION last_updated()
Access method: heap
dvdrental=#
Target database setup
atwong@Albert-CelerData ~ % mysql -P 9030 -h 127.0.0.1 -u root --prompt="StarRocks > "
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 14
Server version: 5.1.0 3.2.2-269e832
Copyright (c) 2000, 2024, Oracle and/or its affiliates.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
StarRocks > create database albert;
Query OK, 0 rows affected (0.02 sec)
Sling setup
atwong@Albert-CelerData dvdrental % sling conns set POSTGRESLOCAL url="postgresql://postgres:postgres@localhost:5432/dvdrental?sslmode=disable"
5:55PM INF connection `POSTGRESLOCAL` has been set in /Users/atwong/.sling/env.yaml. Please test with `sling conns test POSTGRESLOCAL`
atwong@Albert-CelerData dvdrental % sling conns test POSTGRESLOCAL
5:55PM INF success!
Sling execution to StarRocks primary key table
sling run -d --src-conn POSTGRESLOCAL --src-stream public.staff --tgt-conn STARROCKSLOCAL --tgt-object 'albert.staff' --tgt-options '{ table_keys: { primary: [ staff_id ], hash: [ staff_id ] } }'
Sling execution to StarRocks duplicate key table
sling run -d --src-conn POSTGRESLOCAL --src-stream public.staff --tgt-conn STARROCKSLOCAL --tgt-object 'albert.staff' --tgt-options '{ table_keys: { duplicate: [ staff_id ], hash: [ staff_id ] } }'
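To confirm which table type a run actually produced, the generated DDL can be inspected in StarRocks. A minimal sketch, assuming the StarRocks container from above is still listening on 9030; `show_table_ddl` is a hypothetical helper name. The output should contain either a PRIMARY KEY or a DUPLICATE KEY clause, depending on which of the two commands ran last:

```shell
# Print the CREATE TABLE statement StarRocks holds for a table.
# Usage: show_table_ddl [db.table]
show_table_ddl() {
  # \G prints the result vertically, which keeps the long DDL readable
  mysql -h 127.0.0.1 -P 9030 -u root -e "SHOW CREATE TABLE ${1:-albert.staff}\G"
}

# Example (not run here): show_table_ddl albert.staff
```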
For an example of doing the transform (T) step of ELT in StarRocks, see https://github.com/slingdata-io/sling-cli/discussions/148
@flarco can we merge these examples into the docs?
Will do
See here: https://docs.slingdata.io/sling-cli/run/examples/additional-examples. Closing
Setup
Importing CSV: example1.csv
Importing JSON: example2.json
Importing Parquet