Hello @cosmok
I'm not familiar with Redshift, but you may need to write the database password in ~/.config/digdag/config (when you use local mode). If you use server mode, you need to set up secrets.
For example, in ~/.config/digdag/config:
secrets.pg.password = your_postgresql_password_here
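(In server mode, something like the following would register the secret on the server instead. This is a sketch assuming the digdag secrets client command; check digdag secrets --help for the exact flags in your version. pg_test is a placeholder project name.)
digdag secrets --project pg_test --set pg.password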
digdag init -t postgresql pg_test
2016-12-14 12:52:37 +0900: Digdag v0.8.22
Creating pg_test/queries/create_src_table.sql
Creating pg_test/queries/insert_data_to_src_table.sql
Creating pg_test/queries/summarize_src_table.sql
Creating pg_test/pg_test.dig
Creating pg_test/.gitignore
Modify the connection part:
_export:
  pg:
    user: digdag
    host: 127.0.0.1
    database: digdag_test
digdag run pg_test.dig -a
2016-12-14 12:56:12 +0900: Digdag v0.8.22
2016-12-14 12:56:14 +0900 [WARN] (main): Reusing the last session time 2016-12-14T00:00:00+00:00.
2016-12-14 12:56:14 +0900 [INFO] (main): Using session /private/tmp/pg_test/.digdag/status/20161214T000000+0000.
2016-12-14 12:56:14 +0900 [INFO] (main): Starting a new session project id=1 workflow name=pg_test session_time=2016-12-14T00:00:00+00:00
2016-12-14 12:56:15 +0900 [INFO] (0017@+pg_test+caution): echo>: This workflow can work with PostgreSQL running in your local machine. If you already set up PostgreSQL, please create database and user described in the above configuration
This workflow can work with PostgreSQL running in your local machine. If you already set up PostgreSQL, please create database and user described in the above configuration
2016-12-14 12:56:16 +0900 [INFO] (0017@+pg_test+create_src_table): pg>: queries/create_src_table.sql
2016-12-14 12:56:17 +0900 [INFO] (0017@+pg_test+create_src_table): pg>: queries/create_src_table.sql
2016-12-14 12:56:18 +0900 [INFO] (0017@+pg_test+insert_data_to_src_table): pg>: queries/insert_data_to_src_table.sql
2016-12-14 12:56:18 +0900 [INFO] (0017@+pg_test+insert_data_to_src_table): pg>: queries/insert_data_to_src_table.sql
2016-12-14 12:56:19 +0900 [INFO] (0017@+pg_test+summarize_src_table): pg>: queries/summarize_src_table.sql
2016-12-14 12:56:20 +0900 [INFO] (0017@+pg_test+summarize_src_table): pg>: queries/summarize_src_table.sql
Success. Task state is saved at /private/tmp/pg_test/.digdag/status/20161214T000000+0000 directory.
* Use --session <daily | hourly | "yyyy-MM-dd[ HH:mm:ss]"> to not reuse the last session time.
* Use --rerun, --start +NAME, or --goal +NAME argument to rerun skipped tasks.
@cosmok Can I ask you to show me the error message, just in case? As @hiroyuki-sato said, your Redshift password needs to be stored in the config file as a secret. (Thanks @hiroyuki-sato)
BTW, the pg operator enables strict_transaction mode by default, which uses SELECT FOR UPDATE to prevent duplicated INSERT INTO statements, just in case. But Redshift doesn't support SELECT FOR UPDATE, so I think you need to disable strict_transaction mode like this:
strict_transaction: false
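For example, the option can go under the pg section of _export so it applies to every pg> task in the workflow (a sketch; the host and database values below are placeholders):
_export:
  pg:
    host: your-cluster.example.redshift.amazonaws.com
    database: your_database
    user: your_user
    strict_transaction: false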
Well, I should've added this option to the document... I'll do that.
FYI: we're developing redshift and redshift_load operators now, which will support Redshift properly.
@hiroyuki-sato adding it to the config file worked, thanks!
But could you please tell me how the queries in the template file are executed? I have a template file of the following format:
CREATE TEMP TABLE staging_accounting_interval ON COMMIT DROP AS SELECT *..;
BEGIN TRANSACTION;
-- trying to use the temp table
END TRANSACTION;
DROP TABLE staging_accounting_interval;
but I am getting the following error:
ERROR: relation "staging_accounting_interval" does not exist
@komamitsu exact error:
error:
* +usage_calculate+create_temp_interval_table:
Failed to connect to the database
A more specific message like 'password is not valid' would be more helpful.
Thanks for the strict_transaction tip.
I am looking forward to using the redshift loader!
@cosmok
I changed the order of BEGIN TRANSACTION and CREATE TEMP.
Maybe you don't need DROP TABLE, because the table is dropped automatically after END TRANSACTION (that's what ON COMMIT DROP does).
BEGIN TRANSACTION;
CREATE TEMP TABLE staging_accounting_interval ON COMMIT DROP AS SELECT *..;
-- trying to use the temp table
END TRANSACTION;
-- DROP TABLE staging_accounting_interval;
@hiroyuki-sato changing the order as suggested by you doesn't make a difference. I am not sure if each query gets a different session?
Hmm...
First of all, I checked: a CREATE TEMP TABLE ... ON COMMIT DROP table is dropped immediately if I don't use BEGIN TRANSACTION.
digdag_test=# BEGIN TRANSACTION;
BEGIN
digdag_test=# CREATE TEMP TABLE staging_accounting_interval ON COMMIT DROP AS SELECT * from summarized_data;
SELECT 5
Checking the table list, I found staging_accounting_interval:
digdag_test=# \dt
List of relations
Schema | Name | Type | Owner
-----------+-----------------------------+-------+--------
pg_temp_3 | staging_accounting_interval | table | hsato
public | __digdag_status | table | digdag
public | example_access_logs | table | digdag
public | summarized_data | table | digdag
(4 rows)
Commit the transaction:
digdag_test=# commit transaction ;
COMMIT
After checking the table list again, staging_accounting_interval was dropped:
digdag_test=# \dt
List of relations
Schema | Name | Type | Owner
--------+---------------------+-------+--------
public | __digdag_status | table | digdag
public | example_access_logs | table | digdag
public | summarized_data | table | digdag
(3 rows)
Next, run the same CREATE TEMP TABLE without BEGIN TRANSACTION:
digdag_test=# CREATE TEMP TABLE staging_accounting_interval ON COMMIT DROP AS SELECT * from summarized_data;
SELECT 5
Check the table list; the staging_accounting_interval table has already been dropped:
digdag_test=# \dt
List of relations
Schema | Name | Type | Owner
--------+---------------------+-------+--------
public | __digdag_status | table | digdag
public | example_access_logs | table | digdag
public | summarized_data | table | digdag
(3 rows)
And it worked properly in my environment.
timezone: UTC

_export:
  pg:
    user: digdag
    host: 127.0.0.1
    database: digdag_test

+caution:
  echo>: This workflow can work with PostgreSQL running in your local machine.
    If you already set up PostgreSQL, please create database and user described in the above configuration

+query_test:
  pg>: queries/test.sql
queries/test.sql
BEGIN TRANSACTION;
CREATE TEMP TABLE staging_accounting_interval ON COMMIT DROP AS SELECT * from summarized_data;
-- trying to use the temp table
END TRANSACTION;
-- DROP TABLE staging_accounting_interval;
digdag run -a pg_test
2016-12-14 14:53:00 +0900: Digdag v0.8.22
2016-12-14 14:53:02 +0900 [WARN] (main): Reusing the last session time 2016-12-14T00:00:00+00:00.
2016-12-14 14:53:02 +0900 [INFO] (main): Using session /private/tmp/pg_test/.digdag/status/20161214T000000+0000.
2016-12-14 14:53:02 +0900 [INFO] (main): Starting a new session project id=1 workflow name=pg_test session_time=2016-12-14T00:00:00+00:00
2016-12-14 14:53:03 +0900 [INFO] (0017@+pg_test+caution): echo>: This workflow can work with PostgreSQL running in your local machine. If you already set up PostgreSQL, please create database and user described in the above configuration
This workflow can work with PostgreSQL running in your local machine. If you already set up PostgreSQL, please create database and user described in the above configuration
2016-12-14 14:53:04 +0900 [INFO] (0017@+pg_test+query_test): pg>: queries/test.sql
2016-12-14 14:53:05 +0900 [INFO] (0017@+pg_test+query_test): pg>: queries/test.sql
Success. Task state is saved at /private/tmp/pg_test/.digdag/status/20161214T000000+0000 directory.
* Use --session <daily | hourly | "yyyy-MM-dd[ HH:mm:ss]"> to not reuse the last session time.
* Use --rerun, --start +NAME, or --goal +NAME argument to rerun skipped tasks.
I commented out DROP TABLE staging_accounting_interval; by the way.
And I tested on PostgreSQL 9.6.1.
If you can't solve this issue, can I ask you to show me the error message and configuration?
@hiroyuki-sato The error hasn't gone away. BTW, the same script works when I feed it to the psql CLI, so it must be how the pg operator operates, or it could be a Redshift thing.
Config:
+create_temp_interval_table:
  pg>: ./queries/create_temp_interval_table.sql
./queries/create_temp_interval_table.sql
BEGIN TRANSACTION;
CREATE TEMP TABLE staging_accounting_interval AS SELECT * FROM ...;
DELETE FROM accounting_interval USING staging_accounting_interval WHERE ...;
INSERT INTO accounting_interval SELECT * FROM staging_accounting_interval;
END TRANSACTION;
Error:
2016-12-15 09:51:37 +1100: Digdag v0.8.22
2016-12-15 09:51:40 +1100 [WARN] (main): Reusing the last session time 2016-12-14T00:00:00+11:00.
2016-12-15 09:51:40 +1100 [INFO] (main): Using session /home/k7/workspace/digdag/billing/.digdag/status/20161214T000000+1100.
2016-12-15 09:51:40 +1100 [INFO] (main): Starting a new session project id=1 workflow name=usage_calculate_load_to_rds session_time=2016-12-14T00:00:00+11:00
2016-12-15 09:51:41 +1100 [INFO] (0017@+usage_calculate_load_to_rds+create_temp_interval_table): pg>: ./queries/create_temp_interval_table.sql
2016-12-15 09:51:42 +1100 [INFO] (0017@+usage_calculate_load_to_rds+create_temp_interval_table): pg>: ./queries/create_temp_interval_table.sql
2016-12-15 09:51:43 +1100 [ERROR] (0017@+usage_calculate_load_to_rds+create_temp_interval_table): Configuration error at task +usage_calculate_load_to_rds+create_temp_interval_table: Given query is invalid (config)
> ERROR: relation "staging_accounting_interval" does not exist (psql)
2016-12-15 09:51:43 +1100 [INFO] (0017@+usage_calculate_load_to_rds^failure-alert): type: notify
error:
* +usage_calculate_load_to_rds+create_temp_interval_table: Given query is invalid
What happens when the file contains only a single statement? Did you try other queries, such as SELECT 1? Did it work? What other things did you try, what didn't work, and what messages did you get?
Here is the location where the error is happening: https://github.com/treasure-data/digdag/blob/9ebb1efcc426915adaba33496f4f8dbedfa045dd/digdag-standards/src/main/java/io/digdag/standards/operator/jdbc/AbstractJdbcOperator.java#L115-L118 where getting results from https://github.com/treasure-data/digdag/blob/9ebb1efcc426915adaba33496f4f8dbedfa045dd/digdag-standards/src/main/java/io/digdag/standards/operator/jdbc/AbstractJdbcConnection.java#L58-L75
@frsyuki
The comment of java.sql.Connection#prepareStatement says:
This method is optimized for handling parametric SQL statements that benefit from precompilation. If the driver supports precompilation, the method prepareStatement will send the statement to the database for precompilation.
PostgreSQL's prepared statement looks like a server-side object: https://www.postgresql.org/docs/current/static/sql-prepare.html. So in the case of this issue, validation of the query fails since staging_accounting_interval doesn't exist on the server side: https://github.com/treasure-data/digdag/blob/master/digdag-standards/src/main/java/io/digdag/standards/operator/jdbc/AbstractJdbcOperator.java#L115-L118
Also, the preflight validation doesn't seem so useful. Do you think it's okay to remove that?
^ I reproduced this issue with PostgreSQL and confirmed that removing the preflight validation of the SQL resolved it.
Ah, now it makes sense. To summarize: a combination of multiple statements that includes a TEMP table inside a pg template SQL file will error.
Replacing java.sql.Connection#prepareStatement with nativeSQL would be a better solution; that's client-side validation. I'll examine it.
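For reference, a minimal sketch of the difference between the two approaches (illustrative only, not digdag's actual code; the JDBC URL, credentials, and table names are placeholders):
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class ValidationSketch {
    public static void main(String[] args) throws SQLException {
        String sql = "INSERT INTO accounting_interval "
                + "SELECT * FROM staging_accounting_interval";
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://127.0.0.1:5432/digdag_test", "digdag", "secret")) {
            // Server-side validation: the driver may send the statement to the
            // server for precompilation, where a TEMP table created by an earlier
            // statement in the same file doesn't exist yet, so this can fail with:
            // ERROR: relation "staging_accounting_interval" does not exist
            try (PreparedStatement ps = conn.prepareStatement(sql)) {
                // used for validation only; never executed
            }
            // Client-side validation: nativeSQL only translates the SQL text
            // locally (e.g. JDBC escape syntax) without a server round trip.
            System.out.println(conn.nativeSQL(sql));
        }
    }
}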
Thanks everyone, I am now closing this issue as all issues are resolved.
@cosmok Thanks for the report!
@komamitsu that's a very good catch! @cosmok thank you for reporting it 👍
I get 'Failed to connect to the database' while running the workflow below. I can connect to Redshift via psql, BTW.
timezone: Australia/Sydney

_export:
  pg:
    host: host
    port: 5439
    database: dbname
    user: username
    password: password
    ssl: false
+create_temp_interval_table:
  pg>: queries/create_temp_interval_table.sql