Open chuwy opened 7 years ago
I'm not sure this is a dataflow-runner issue (quite likely it isn't), but I want to explore it further.
I tried to run a simple Spark job with the following cluster specification:
{
  "schema": "iglu:com.snowplowanalytics.dataflowrunner/ClusterConfig/avro/1-1-0",
  "data": {
    "name": "dataflow-runner - snowflake transformer",
    "logUri": "s3://mylogs/logs/",
    "region": "us-east-1",
    "credentials": {
      "accessKeyId": "env",
      "secretAccessKey": "env"
    },
    "roles": {
      "jobflow": "EMR_EC2_DefaultRole",
      "service": "EMR_DefaultRole"
    },
    "ec2": {
      "amiVersion": "5.5.0",
      "keyName": "mykey",
      "location": {
        "classic": {
          "availabilityZone": "us-east-1a"
        }
      },
      "instances": {
        "master": { "type": "m1.medium" },
        "core": { "type": "m1.medium", "count": 1 },
        "task": { "type": "m1.medium", "count": 0, "bid": "0.015" }
      }
    },
    "tags": [ ],
    "bootstrapActionConfigs": [ ],
    "configurations": [
      {
        "classification": "core-site",
        "properties": {
          "Io.file.buffer.size": "65536"
        }
      },
      {
        "classification": "mapred-site",
        "properties": {
          "Mapreduce.user.classpath.first": "true"
        }
      }
    ],
    "applications": [ "Hadoop", "Spark" ]
  }
}
However, no matter what I submitted, the job hung in the RUNNING state without producing any output. To make it work I made two changes:
- Changed location to {"vpc": null}
- Changed the instance type to m2.xlarge
I'm not sure which of the above made the difference, or whether the cause was an EMR misconfiguration or something in dataflow-runner, but it may be worth raising.
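For anyone who wants to reproduce this, here is a rough sketch of how the two changes could be applied to the specification programmatically, so each one can be tested on its own. The helper name `apply_workaround` and its parameters are hypothetical, and the issue does not say which instance groups were switched to m2.xlarge, so the sketch assumes all of them.

```python
import copy
import json


def apply_workaround(cluster_config, new_location=True, instance_type="m2.xlarge"):
    """Return a patched copy of a dataflow-runner cluster config.

    Hypothetical helper: it only edits the JSON structure shown in this issue
    and does not call dataflow-runner or EMR.
    """
    patched = copy.deepcopy(cluster_config)
    ec2 = patched["data"]["ec2"]

    # Change 1: replace the "classic" location with {"vpc": null}.
    if new_location:
        ec2["location"] = {"vpc": None}

    # Change 2 (assumption): switch every instance group to the larger type,
    # since the issue does not state which groups were actually changed.
    if instance_type is not None:
        for group in ec2["instances"].values():
            group["type"] = instance_type

    return patched


if __name__ == "__main__":
    # Trimmed-down stand-in for the specification above.
    original = {
        "data": {
            "ec2": {
                "location": {"classic": {"availabilityZone": "us-east-1a"}},
                "instances": {
                    "master": {"type": "m1.medium"},
                    "core": {"type": "m1.medium", "count": 1},
                    "task": {"type": "m1.medium", "count": 0, "bid": "0.015"},
                },
            }
        }
    }
    print(json.dumps(apply_workaround(original), indent=2))
```

Calling it with `new_location=False` or `instance_type=None` applies only one of the two changes, which might help narrow down which one actually unblocked the job.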