snowplow / dataflow-runner

Run templatable playbooks of Hadoop/Spark/et al jobs on Amazon EMR
http://snowplowanalytics.com

Remove unsupported field when specifying "gp2" volume type #44

Closed ungn closed 3 years ago

ungn commented 6 years ago

When the "gp2" volume type is specified in the EBS configuration and the "iops" field is filled in, the following exception is returned with a 400 status code:

ValidationException: IOPS setting is not supported for volume type.

This exception does not appear to originate within dataflow-runner. It most likely comes from AWS EMR when it attempts to run the jobflow.

After excluding the "iops" field from the test config and running the tests locally, the "Iops" value appears to default to zero:

[...]
VolumeSpecification: {
    Iops: 0,
    SizeInGB: 10,
    VolumeType: "gp2"
},
[...]

So I believe that simply removing the field from the config will still return the same error, because IOPS will default to zero instead of being excluded completely (it is an optional field).

Suggested change: update dataflow-runner so that it does not populate "iops" for "gp2" volume types (or does not default it to zero when it is absent from the config; I imagine the same would apply to any other optional fields).
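To illustrate the zero-value default described above, here is a minimal Go sketch. The struct names and JSON keys ("iops", "sizeInGB", "volumeType") are hypothetical and are not taken from dataflow-runner's actual config schema; the point is only that a plain integer field cannot distinguish "absent" from 0, whereas a pointer field can.

```go
package main

import "encoding/json"

// valueVolume is a hypothetical config struct with a plain integer field.
// If "iops" is absent from the JSON, Iops silently becomes 0.
type valueVolume struct {
	Iops       int64  `json:"iops"`
	SizeInGB   int64  `json:"sizeInGB"`
	VolumeType string `json:"volumeType"`
}

// pointerVolume is the same hypothetical shape with a pointer field.
// If "iops" is absent from the JSON, Iops stays nil.
type pointerVolume struct {
	Iops       *int64 `json:"iops"`
	SizeInGB   int64  `json:"sizeInGB"`
	VolumeType string `json:"volumeType"`
}

// decodeValue parses raw JSON into the plain-integer variant.
func decodeValue(raw string) (valueVolume, error) {
	var v valueVolume
	err := json.Unmarshal([]byte(raw), &v)
	return v, err
}

// decodePointer parses raw JSON into the pointer variant.
func decodePointer(raw string) (pointerVolume, error) {
	var p pointerVolume
	err := json.Unmarshal([]byte(raw), &p)
	return p, err
}
```

With an input such as `{"sizeInGB": 10, "volumeType": "gp2"}`, decodeValue yields Iops equal to 0, while decodePointer leaves Iops as nil, so only the second form lets later code tell that the field was never set.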

alexanderdean commented 6 years ago

Isn't this just user error? Where is the bug in Dataflow Runner?

ungn commented 6 years ago

Bug

This line always explicitly sets the "Iops" field on the emr.VolumeSpecification struct, even though this field is not required.

Reasoning

The cluster config is still valid when this field is omitted from the cluster config JSON, because config.VolumeSpecification.Iops then defaults to a zero value.

With this line omitted and "Iops" set to zero, I believe running it will return the same error: "IOPS setting is not supported for volume type". The field should be omitted from the emr.VolumeSpecification struct on creation.
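A minimal sketch of what omitting the field on creation could look like. The types below are local, hypothetical stand-ins: volumeSpec imitates the pointer-field shape of the SDK's emr.VolumeSpecification, and volumeConfig imitates the config struct. Neither is dataflow-runner's actual code. The rule used here (skip "Iops" for "gp2" or when the value is zero) is an assumption, since the thread has not yet confirmed how AWS treats an explicit zero.

```go
package main

// volumeSpec is a hypothetical stand-in for the request struct.
// Pointer fields let an optional value be left out entirely (nil).
type volumeSpec struct {
	Iops       *int64
	SizeInGB   *int64
	VolumeType *string
}

// volumeConfig is a hypothetical stand-in for the parsed cluster config,
// where a missing "iops" entry shows up as the zero value.
type volumeConfig struct {
	Iops       int64
	SizeInGB   int64
	VolumeType string
}

// buildVolumeSpec copies the config into the request struct but only sets
// Iops when the volume type is not "gp2" and a non-zero value was given.
func buildVolumeSpec(c volumeConfig) volumeSpec {
	spec := volumeSpec{
		SizeInGB:   &c.SizeInGB,
		VolumeType: &c.VolumeType,
	}
	if c.VolumeType != "gp2" && c.Iops != 0 {
		spec.Iops = &c.Iops
	}
	return spec
}
```

For example, buildVolumeSpec(volumeConfig{SizeInGB: 10, VolumeType: "gp2"}) returns a spec whose Iops is nil rather than a pointer to 0, so the field would not be sent at all.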

Testing

What's unclear is the AWS behaviour if a zero value is passed in - this hasn't been tested yet. If a "gp2" EBS volume can be attached when the VolumeSpecification "Iops" field is zero, then nothing needs to be changed in dataflow-runner.

alexanderdean commented 6 years ago

Ah, thanks for clarifying @ungn !