snowplow / emr-etl-runner

Run Snowplow's enrichments on Amazon Elastic MapReduce with minimum fuss
0 stars 5 forks source link

EmrEtlRunner: refuse to pass outdated storage target configurations to RDB Loader #35

Open chuwy opened 6 years ago

chuwy commented 6 years ago

This is very minor, but EmrEtlRunner happily passes configs with iglu:com.snowplowanalytics.snowplow.storage/redshift_config/jsonschema/1-0-0 and iglu:com.snowplowanalytics.snowplow.storage/postgresql_config/jsonschema/1-0-0 to RDB Loader. These configs were used by StorageLoader and don't contain roleArn therefore cannot be used by RDB Loader.

I think current EER needs to exit immediately when sees these configs.

BenFradet commented 6 years ago

I think this entails adding more logic to eer which is not the direction we want to take right now.

What's wrong with rdb loader failing because it was given wrong targets?

chuwy commented 6 years ago

I think this entails adding more logic to eer which is not the direction we want to take right now.

I agree. Just wanted to raise it, so eventually similar logic (with versions that will be old on that moment) could be implemented in Snowplow CLI.

What's wrong with rdb loader failing because it was given wrong targets?

RDB Loader requires roleArn and fails to read configuration when anything is invalid. When it fails to read configuration it fails to instantiate interpreter. When it fails to instantiate interpreter it cannot publish logs in right place and users see just cryptic EER exception in stdout:

    uri:classloader:/emr-etl-runner/lib/snowplow-emr-etl-runner/emr_job.rb:586:in `run'
    uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_reference.rb:43:in `send_to'
    uri:classloader:/gems/contracts-0.11.0/lib/contracts/call_with.rb:76:in `call_with'
    uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_handler.rb:138:in `block in redefine_method'
    uri:classloader:/emr-etl-runner/lib/snowplow-emr-etl-runner/runner.rb:103:in `run'
    uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_reference.rb:43:in `send_to'
    uri:classloader:/gems/contracts-0.11.0/lib/contracts/call_with.rb:76:in `call_with'
    uri:classloader:/gems/contracts-0.11.0/lib/contracts/method_handler.rb:138:in `block in redefine_method'
    uri:classloader:/emr-etl-runner/bin/snowplow-emr-etl-runner:41:in `<main>'
    org/jruby/RubyKernel.java:979:in `load'
    uri:classloader:/META-INF/main.rb:1:in `<main>'
    org/jruby/RubyKernel.java:961:in `require'
    uri:classloader:/META-INF/main.rb:1:in `(root)'
    uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/rubygems/core_ext/kernel_r

Which means they have to dig into EMR step logs.

chuwy commented 6 years ago

This issue will potentially hit any storage config with schema bump anywhere except addition. For example in 0.14.0 we allowed users to have password in form of:

"password": {
    "ec2ParameterStore": {
        "parameterName": "snowplow.rdbloader.redshift.password"
    }
}

If RDB Loader 0.12/0.13 will encounter this configuration - it will successfully validate it by Iglu, but fail to parse configuration. Though same 2-1-0 with old string-password will be successfully parsed.

Not trying convince anyone, just describing how broad the problem is.

So, it should have follwing logic:

Version X of Snowplow CLI has matrix of schema versions supported to all known RDB Loaders up to X release date:

Consumer Supported Schemas
Loader 0.13 [1-0-?, 1-1-?]
Loader 0.14 [1-0-?, 1-1-?, 2-0-?]

If user is trying to pass:

This actually not a EER/CLI logic. Consumer/producer compatiblity should be a part of some core Iglu library and CLI should just use it.

alexanderdean commented 6 years ago

Thanks for this @chuwy.