Open chuwy opened 6 years ago
I think this entails adding more logic to eer which is not the direction we want to take right now.
What's wrong with rdb loader failing because it was given wrong targets?
I think this entails adding more logic to eer which is not the direction we want to take right now.
I agree. Just wanted to raise it, so eventually similar logic (with versions that will be old on that moment) could be implemented in Snowplow CLI.
What's wrong with rdb loader failing because it was given wrong targets?
RDB Loader requires roleArn
and fails to read configuration when anything is invalid. When it fails to read configuration it fails to instantiate interpreter. When it fails to instantiate interpreter it cannot publish logs in right place and users see just cryptic EER exception in stdout:
uri:classloader:/emr-etl-runner/lib/snowplow-emr-etl-runner/emr_job.rb:586:in `run'
uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_reference.rb:43:in `send_to'
uri:classloader:/gems/contracts-0.11.0/lib/contracts/call_with.rb:76:in `call_with'
uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_handler.rb:138:in `block in redefine_method'
uri:classloader:/emr-etl-runner/lib/snowplow-emr-etl-runner/runner.rb:103:in `run'
uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_reference.rb:43:in `send_to'
uri:classloader:/gems/contracts-0.11.0/lib/contracts/call_with.rb:76:in `call_with'
uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_handler.rb:138:in `block in redefine_method'
uri:classloader:/emr-etl-runner/bin/snowplow-emr-etl-runner:41:in `<main>'
org/jruby/RubyKernel.java:979:in `load'
uri:classloader:/META-INF/main.rb:1:in `<main>'
org/jruby/RubyKernel.java:961:in `require'
uri:classloader:/META-INF/main.rb:1:in `(root)'
uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/rubygems/core_ext/kernel_r
Which means they have to dig into EMR step logs.
This issue will potentially hit any storage config with schema bump anywhere except addition. For example in 0.14.0 we allowed users to have password
in form of:
"password": {
"ec2ParameterStore": {
"parameterName": "snowplow.rdbloader.redshift.password"
}
}
If RDB Loader 0.12/0.13 will encounter this configuration - it will successfully validate it by Iglu, but fail to parse configuration. Though same 2-1-0
with old string-password
will be successfully parsed.
Not trying convince anyone, just describing how broad the problem is.
So, it should have follwing logic:
Version X of Snowplow CLI has matrix of schema versions supported to all known RDB Loaders up to X release date:
Consumer | Supported Schemas |
---|---|
Loader 0.13 | [1-0-?, 1-1-?] |
Loader 0.14 | [1-0-?, 1-1-?, 2-0-?] |
If user is trying to pass:
1-0-0
or 1-0-3
to Loader 0.13 - fine.1-2-0
to Loader 0.13 - warn that it might cause problems (as with above example)2-0-0
to Loader 0.13 - it will cause problems. 2-0-0
cannot be parsed by definition.2-1-0
to Loader 0.15 - it might cause problems. We don't know yet if 0.15 support it or too oldThis actually not a EER/CLI logic. Consumer/producer compatiblity should be a part of some core Iglu library and CLI should just use it.
Thanks for this @chuwy.
This is very minor, but EmrEtlRunner happily passes configs with
iglu:com.snowplowanalytics.snowplow.storage/redshift_config/jsonschema/1-0-0
andiglu:com.snowplowanalytics.snowplow.storage/postgresql_config/jsonschema/1-0-0
to RDB Loader. These configs were used by StorageLoader and don't containroleArn
therefore cannot be used by RDB Loader.I think current EER needs to exit immediately when sees these configs.