spring-cloud / spring-cloud-dataflow

Microservices-based streaming and batch data processing for Cloud Foundry and Kubernetes
https://dataflow.spring.io
Apache License 2.0

Batch and Task Prefix Env Variable cannot be overridden #5828

Closed jnt0r closed 2 months ago

jnt0r commented 4 months ago

Description: I have some Boot 3 tasks that I want to launch in my Spring Cloud Data Flow instance. My Data Flow instance uses a different database schema than the default schema of the database user; that works correctly. When I launch my tasks, I need to pass them the other database schema name. As far as I can tell, I should override the spring.cloud.task.tableprefix and spring.batch.jdbc.table-prefix properties so that they contain my schema name followed by the BOOT3 task or batch prefix. But when I override these environment variables in the "launch task" tab, they get overridden by Data Flow and keep the default Boot 3 value without my schema prefix. I therefore need to set the environment variable SPRING_APPLICATION_JSON with my desired configuration.

Is this the intended behaviour of Data Flow?

Is there any other possible way to automatically distribute the used database schema of dataflow to every launched task?
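For reference, the SPRING_APPLICATION_JSON workaround described above could look like this (a sketch; the property keys are the ones mentioned above, and MYSCHEMA is a placeholder for the actual schema name):

```json
{
  "spring": {
    "cloud": { "task": { "tableprefix": "MYSCHEMA.BOOT3_TASK_" } },
    "batch": { "jdbc": { "table-prefix": "MYSCHEMA.BOOT3_BATCH_" } }
  }
}
```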

Release versions: 2.11.3

corneil commented 4 months ago

@jnt0r Unfortunately we do not provide for overriding the prefixes in the first iteration of support for both Spring Boot 2.x and Spring Boot 3.x workloads.

If the tables are in a non-default schema, you can configure the connection or the user accordingly. If you used a custom prefix as well, you can use views or Oracle synonyms.

MySQL and MariaDB don't support schemas, only databases, which can be selected in the connection path.

Spring Boot Properties

When starting the Spring Cloud Data Flow server, the Skipper server, and any Task / Batch application, add the property:

PostgreSQL: spring.datasource.hikari.connection-init-sql="SET search_path TO myschema"

DB2 and Oracle: spring.datasource.hikari.connection-init-sql="SET SCHEMA OURSCHEMA"
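When launching a task through Data Flow, the same property can also be passed per application using the app.&lt;appname&gt;. prefix (a sketch; mytask is a hypothetical task application name and myschema a placeholder schema):

```properties
app.mytask.spring.datasource.hikari.connection-init-sql=SET search_path TO myschema
```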

MS SQL Server only supports changing the default schema for the user permanently.

JDBC Connection Properties

The first suggestion is to change the database user used by Data Flow and Skipper so that its default schema is the same as your schema. With both PostgreSQL and DB2, add the following to the JDBC connection properties:

If it is the first connection property: ?currentSchema=OUR_SCHEMA_NAME

If there are already connection properties: &currentSchema=OUR_SCHEMA_NAME
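Put together, the full JDBC URL might look like this (a sketch with hypothetical host and database names):

```properties
# No existing connection properties:
jdbc:postgresql://db.example.com:5432/dataflow?currentSchema=OUR_SCHEMA_NAME
# Appended to existing connection properties:
jdbc:postgresql://db.example.com:5432/dataflow?sslmode=require&currentSchema=OUR_SCHEMA_NAME
```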

Permanent Database configuration

With MS SQL Server:

ALTER USER <dataflow-user> WITH DEFAULT_SCHEMA = OUR_SCHEMA_NAME;

In Oracle you can create a trigger:

CREATE OR REPLACE TRIGGER LOGON_TRG 
  AFTER LOGON ON SCHEMA
BEGIN
     EXECUTE IMMEDIATE 'ALTER SESSION SET CURRENT_SCHEMA = OUR_SCHEMA_NAME';
EXCEPTION 
  when others 
    then null; -- prevent a login failure due to an exception
END;

Views to existing tables

Oracle provides synonyms, for example:

CREATE SYNONYM TASK_EXECUTION FOR OURSCHEMA.TASK_EXECUTION;

For other databases, the best you can do is replace the default tables with views of the tables in your schema:

Spring Boot 2.x tables for Task and Batch

DROP TABLE BATCH_JOB_EXECUTION_CONTEXT;
DROP TABLE BATCH_STEP_EXECUTION_CONTEXT;
DROP TABLE BATCH_STEP_EXECUTION;
DROP TABLE BATCH_JOB_EXECUTION_PARAMS;
DROP TABLE BATCH_JOB_EXECUTION;
DROP TABLE BATCH_JOB_INSTANCE;
DROP TABLE TASK_TASK_BATCH;
DROP TABLE TASK_EXECUTION_METADATA;
DROP TABLE TASK_EXECUTION_PARAMS;
DROP TABLE TASK_EXECUTION;

CREATE VIEW TASK_EXECUTION AS SELECT * FROM OURSCHEMA.TASK_EXECUTION;
CREATE VIEW TASK_EXECUTION_PARAMS AS SELECT * FROM OURSCHEMA.TASK_EXECUTION_PARAMS;
CREATE VIEW TASK_EXECUTION_METADATA AS SELECT * FROM OURSCHEMA.TASK_EXECUTION_METADATA;
CREATE VIEW TASK_TASK_BATCH AS SELECT * FROM OURSCHEMA.TASK_TASK_BATCH;                         
-- For PostgreSQL: ALTER SEQUENCE ... RESTART WITH requires a literal value,
-- so copy the current value with setval instead
SELECT setval('TASK_SEQ', (SELECT last_value FROM OURSCHEMA.TASK_SEQ));
-- For MariaDB: look up the current value of the source sequence first,
-- e.g. SELECT next_not_cached_value FROM OURSCHEMA.TASK_SEQ;
-- then restart with that value as a literal
ALTER SEQUENCE TASK_SEQ RESTART WITH <value-from-select>;

CREATE VIEW TASK_LOCK AS SELECT * FROM OURSCHEMA.TASK_LOCK;
CREATE VIEW BATCH_JOB_INSTANCE AS SELECT * FROM OURSCHEMA.BATCH_JOB_INSTANCE;
CREATE VIEW BATCH_JOB_EXECUTION AS SELECT * FROM OURSCHEMA.BATCH_JOB_EXECUTION;
CREATE VIEW BATCH_JOB_EXECUTION_PARAMS AS SELECT * FROM OURSCHEMA.BATCH_JOB_EXECUTION_PARAMS;
CREATE VIEW BATCH_STEP_EXECUTION AS SELECT * FROM OURSCHEMA.BATCH_STEP_EXECUTION;
CREATE VIEW BATCH_STEP_EXECUTION_CONTEXT AS SELECT * FROM OURSCHEMA.BATCH_STEP_EXECUTION_CONTEXT;
CREATE VIEW BATCH_JOB_EXECUTION_CONTEXT AS SELECT * FROM OURSCHEMA.BATCH_JOB_EXECUTION_CONTEXT;

-- Apply same pattern as for TASK_SEQ  
CREATE SEQUENCE BATCH_STEP_EXECUTION_SEQ;
CREATE SEQUENCE BATCH_JOB_EXECUTION_SEQ;
CREATE SEQUENCE BATCH_JOB_SEQ;
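Applying the same initialization pattern to the batch sequences might look like this (a sketch for PostgreSQL, where the restart value has to be copied with setval because ALTER SEQUENCE ... RESTART WITH only accepts a literal; assumes the source sequences live in OURSCHEMA):

```sql
-- Copy the current values from the source schema's sequences (PostgreSQL)
SELECT setval('BATCH_STEP_EXECUTION_SEQ', (SELECT last_value FROM OURSCHEMA.BATCH_STEP_EXECUTION_SEQ));
SELECT setval('BATCH_JOB_EXECUTION_SEQ', (SELECT last_value FROM OURSCHEMA.BATCH_JOB_EXECUTION_SEQ));
SELECT setval('BATCH_JOB_SEQ', (SELECT last_value FROM OURSCHEMA.BATCH_JOB_SEQ));
```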

Spring Boot 3.x tables for Task and Batch

DROP TABLE BOOT3_BATCH_JOB_EXECUTION_CONTEXT;
DROP TABLE BOOT3_BATCH_STEP_EXECUTION_CONTEXT;
DROP TABLE BOOT3_BATCH_STEP_EXECUTION;
DROP TABLE BOOT3_BATCH_JOB_EXECUTION_PARAMS;
DROP TABLE BOOT3_BATCH_JOB_EXECUTION;
DROP TABLE BOOT3_BATCH_JOB_INSTANCE;
DROP TABLE BOOT3_TASK_TASK_BATCH;
DROP TABLE BOOT3_TASK_EXECUTION_METADATA;
DROP TABLE BOOT3_TASK_EXECUTION_PARAMS;
DROP TABLE BOOT3_TASK_EXECUTION;

CREATE VIEW BOOT3_TASK_EXECUTION AS SELECT * FROM OURSCHEMA.BOOT3_TASK_EXECUTION;
CREATE VIEW BOOT3_TASK_EXECUTION_PARAMS AS SELECT * FROM OURSCHEMA.BOOT3_TASK_EXECUTION_PARAMS;
CREATE VIEW BOOT3_TASK_EXECUTION_METADATA AS SELECT * FROM OURSCHEMA.BOOT3_TASK_EXECUTION_METADATA;
CREATE VIEW BOOT3_TASK_TASK_BATCH AS SELECT * FROM OURSCHEMA.BOOT3_TASK_TASK_BATCH;                         
-- For PostgreSQL: ALTER SEQUENCE ... RESTART WITH requires a literal value,
-- so copy the current value with setval instead
SELECT setval('BOOT3_TASK_SEQ', (SELECT last_value FROM OURSCHEMA.BOOT3_TASK_SEQ));
-- For MariaDB: look up the current value of the source sequence first,
-- e.g. SELECT next_not_cached_value FROM OURSCHEMA.BOOT3_TASK_SEQ;
-- then restart with that value as a literal
ALTER SEQUENCE BOOT3_TASK_SEQ RESTART WITH <value-from-select>;

CREATE VIEW BOOT3_TASK_LOCK AS SELECT * FROM OURSCHEMA.BOOT3_TASK_LOCK;
CREATE VIEW BOOT3_BATCH_JOB_INSTANCE AS SELECT * FROM OURSCHEMA.BOOT3_BATCH_JOB_INSTANCE;
CREATE VIEW BOOT3_BATCH_JOB_EXECUTION AS SELECT * FROM OURSCHEMA.BOOT3_BATCH_JOB_EXECUTION;
CREATE VIEW BOOT3_BATCH_JOB_EXECUTION_PARAMS AS SELECT * FROM OURSCHEMA.BOOT3_BATCH_JOB_EXECUTION_PARAMS;
CREATE VIEW BOOT3_BATCH_STEP_EXECUTION AS SELECT * FROM OURSCHEMA.BOOT3_BATCH_STEP_EXECUTION;
CREATE VIEW BOOT3_BATCH_STEP_EXECUTION_CONTEXT AS SELECT * FROM OURSCHEMA.BOOT3_BATCH_STEP_EXECUTION_CONTEXT;
CREATE VIEW BOOT3_BATCH_JOB_EXECUTION_CONTEXT AS SELECT * FROM OURSCHEMA.BOOT3_BATCH_JOB_EXECUTION_CONTEXT;

-- Apply same pattern to initialize as for BOOT3_TASK_SEQ  
CREATE SEQUENCE BOOT3_BATCH_STEP_EXECUTION_SEQ;
CREATE SEQUENCE BOOT3_BATCH_JOB_EXECUTION_SEQ;
CREATE SEQUENCE BOOT3_BATCH_JOB_SEQ;
cppwfs commented 4 months ago

Do you want your task apps to access another database when doing their work, but still record the task/batch data where Data Flow can see it, so that it can report the state of the app execution?

cppwfs commented 2 months ago

Closing due to inactivity. If closed in error please let us know.