Closed skostrewa closed 3 weeks ago
I can't reproduce on my mac. Where is it getting raw_bc
from? the env.yaml
?
It's kinda strange if it's picking up from the env.yaml
at all, since it needs to be explicitly defined (and you are not). You'd need to do custom_schema_name : ${custom_schema_name}
to pick up from variables in env.yaml
. See https://docs.slingdata.io/sling-cli/run/configuration/variables#replication, this part:
env:
# ${path_prefix} pulls from environment variables in sling process or env
path_prefix: '${path_prefix}' # From env.yaml (not in Environment)
start_date: '${START_DATE}' # From Environment
end_date: '${END_DATE}' # From Environment
See https://github.com/slingdata-io/sling-cli/blob/main/core/env/envfile.go#L146-L157
the variable
keys from env.yaml
are simply loaded into the process' environment variables, so it's available at runtime, as an environment variable. It shouldn't be automatically injected into the replication "memory space" if that makes sense. You should have to explicitly define it.
Yeah, i forgot to mention that raw_bc
is indeed defined in the env.yml
.
Im not sure if i understand you correctly:
It's kinda strange if it's picking up from the env.yaml at all, since it needs to be explicitly defined (and you are not).
you mean the schema raw_bc
? this is set at the connection-level in my env.yml
connections:
...
DWH:
type: postgres
host: localhost
user: ...
password: ...
port: 5432
database: dwh
sslmode: disable
schema: raw_bc
...
replication.yml
(db to db - everything working as expected):
---
source: BusinessCentral
target: DWH
defaults:
mode: full-refresh
object: '{target_schema}.{stream_table}' # changing '{target_schema}' will affect the schema being created -> 'abc' means all streams are inserted into schema 'abc'
target_options:
pre_sql: 'DROP TABLE IF EXISTS "{target_schema}"."{target_table}" CASCADE;'
streams:
Company:
dbo."{Mandant}$G_L Account${GUID}":
object: '"G_L Account"'
...
replication.yml
(xlsx to db - everything works as expected, too):
source: LOCAL
target: DWH
# note the variable being declared at the top and used in defaults.object
# again changing this string here or set a string under defaults.object will affect the resulting schema
env:
custom_schema_name: raw_xls
defaults:
mode: full-refresh
# again one could change '{custom_schema_name}' here and the streams would be affected
object: "{custom_schema_name}.{stream_file_name}"
source_options:
format: xlsx
target_options:
pre_sql: 'DROP TABLE IF EXISTS "{custom_schema_name}"."{stream_file_name}" CASCADE;'
streams:
"file://D:/Data/Excel/stammdaten/Kontenplan.xlsx":
source_options:
sheet: "Kontenplan Umsetzung!I2:J"
columns:
'*': string
replication.yml
(csv to db - does not work as expected):
source: LOCAL
target: DWH
env:
custom_schema_name: raw_mig_board
defaults:
mode: full-refresh
# even though very similar the schema will always be 'raw_bc', not what is defined at the top or put here
object: "{custom_schema_name}.{target_table}"
source_options:
format: csv
target_options:
pre_sql: 'DROP TABLE IF EXISTS "{custom_schema_name}"."{target_table}" CASCADE;'
streams:
"file://D:/Data/Board/Cubes/V0050001.csv":
object: cube_kostenrechnung_ist
columns:
monat: string(10)
You'd need to do
custom_schema_name : ${custom_schema_name}
to pick up from variables inenv.yaml
.
i don't want to pass a variable from my env.yml
to the replication.yml
. I want to define a variable in my replication.yml
and use it. i want to mimic the behaviour of the seconds replication (setting the schema to be precise) to the third.
Im confused how sling behaves so differently when there are virtually no changes. i hope this message is helpful and you can shed some light.
Ah ok, it's the connection schema
, I was off-track then. Ok, probably some bug.
Oh you know what, your object is cube_kostenrechnung_ist
doesn't have a schema. That's why, sling fills that in.
Did you mean to override that default object "{custom_schema_name}.{target_table}"
?
Maybe not a bug then, haha
I, of course, did not mean to override the default object. Thank you very much for your quick support!
Issue Description
Description of the issue: I have two replications. One for xlsx-files and one for csv-files. My replication for .xlsx-files works fine. When adapting this replication to .csv-files, i noticed that renaming the schema does not work. It seems that when working with csv-files defaults.object is not being evaluated.
Sling version (
sling --version
): 1.2.23.dev (2024-10-28)Operating System (
linux
,mac
,windows
): Windows 11XLSX Replication Configuration:
-d
): sling executes the pre-sql statement as expected and continues to work with the custom schema in this case.2024-10-30 10:28:42 INF Sling Replication Completed in 2s | LOCAL -> DWH | 1 Successes | 0 Failures
-d
): dispite only minor differences in the replication.yml, after the pre-sql is being executed with the correct values (custom-schema which is passed as a variable), the schema from the env.yml being used - even though when ingesting xlsx files i was able to change that schema via the variable defined in the replication.yml. Regardless of what i do, when ingesting csv-files sling takes the schema from the env.yml.2024-10-30 10:28:42 INF Sling Replication Completed in 2s | LOCAL -> DWH | 1 Successes | 0 Failures
PS D:\linkFISH\Projects\sling\base_ingestion> sling run -r .\migration_cubes.yml -d 2024-10-30 10:29:53 DBG opened "file" connection (conn-file-w7n) 2024-10-30 10:29:53 INF Sling Replication | LOCAL -> DWH | file://D:/Data/Board/Cubes/V0050001.csv 2024-10-30 10:29:53 DBG Sling version: 1.2.23.dev (2024-10-28) (windows amd64) 2024-10-30 10:29:53 DBG type is file-db 2024-10-30 10:29:53 DBG using: {"columns":null,"mode":"full-refresh","transforms":null} 2024-10-30 10:29:53 DBG using source options: {"trim_space":false,"empty_as_null":true,"header":true,"fields_per_rec":-1,"compression":"auto","format":"csv","null_if":"NULL","datetime_form at":"AUTO","skip_blank_lines":false,"max_decimals":-1} 2024-10-30 10:29:53 DBG using target options: {"datetime_format":"auto","file_max_rows":0,"max_decimals":-1,"use_bulk":true,"add_new_columns":true,"adjust_column_type":false,"column_casing ":"source","pre_sql":"DROP TABLE IF EXISTS \"raw_mig_board\".\"cube_kostenrechnung_ist\" CASCADE;"} 2024-10-30 10:29:53 DBG opened "postgres" connection (conn-postgres-feg) 2024-10-30 10:29:53 INF connecting to target database (postgres) 2024-10-30 10:29:53 INF reading from source file system (file) 2024-10-30 10:29:53 DBG opened "file" connection (conn-file-IZQ) 2024-10-30 10:29:53 DBG reading single datastream from file://D:/Data/Board/Cubes/V0050001.csv [format=csv] 2024-10-30 10:29:53 DBG merging csv readers of 1 files [concurrency=3] from file://D:/Data/Board/Cubes/V0050001.csv 2024-10-30 10:29:53 DBG processing reader from file://D:/Data/Board/Cubes/V0050001.csv 2024-10-30 10:29:53 DBG delimiter auto-detected: "\t" 2024-10-30 10:29:54 INF writing to target database [mode: full-refresh] 2024-10-30 10:29:54 INF executing pre-sql 2024-10-30 10:29:54 DBG DROP TABLE IF EXISTS "raw_mig_board"."cube_kostenrechnung_ist" CASCADE 2024-10-30 10:29:54 DBG drop table if exists "raw_bc"."cube_kostenrechnung_ist" 2024-10-30 10:29:54 DBG table "raw_bc"."cube_kostenrechnung_ist" dropped 2024-10-30 10:29:54 DBG create table if not exists "raw_bc"."cube_kostenrechnung_ist" ("kostenstelle_ukc" text, "konto" text, "buchungsart" text, "monat" bigint, "betragkostenrechnung" numeric, "_sling_loaded_at" timestamptz) 2024-10-30 10:29:54 INF created table "raw_bc"."cube_kostenrechnung_ist" 2024-10-30 10:29:54 INF streaming data (direct insert) 2s 237,716 125876 r/s 15 MB | 37% MEM | 73% CPU 2024-10-30 10:29:57 DBG select count(*) cnt from "raw_bc"."cube_kostenrechnung_ist" 3s 364,193 204984 r/s 24 MB | 37% MEM | 55% CPU 2024-10-30 10:29:57 INF inserted 446649 rows into raw_bc."cube_kostenrechnung_ist" in 3 secs [118,059 r/s] 2024-10-30 10:29:57 INF execution succeeded
2024-10-30 10:29:58 INF Sling Replication Completed in 4s | LOCAL -> DWH | 1 Successes | 0 Failures