peterknife / boto

Automatically exported from code.google.com/p/boto

emr.describe_jobflow produces invalid results when a BootstrapAction is present. #460

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Start a new EMR workflow containing bootstrap actions.
    from boto.emr.connection import EmrConnection
    from boto.emr.bootstrap_action import BootstrapAction

    emr = EmrConnection()
    # run_jobflow expects a list of BootstrapAction objects.
    EMR_BOOTSTRAP_ACTIONS = [BootstrapAction('Add 10G Swap',
                                             's3://elasticmapreduce/bootstrap-actions/add-swap',
                                             ['10000'])]
    # Keep the returned job flow ID for the describe_jobflow call in step 3.
    jobflowid = emr.run_jobflow(name=EMR_JOBFLOW_NAME,
                                log_uri=EMR_LOG_URI,
                                ec2_keyname=EC2_KEYNAME,
                                availability_zone=EMR_AVAILABILITY_ZONE,
                                master_instance_type=EMR_MASTER_TYPE,
                                slave_instance_type=EMR_SLAVE_TYPE,
                                num_instances=EMR_NUM_SLAVES,
                                action_on_failure=EMR_ACTION_ON_FAILURE,
                                keep_alive=EMR_KEEP_ALIVE,
                                enable_debugging=EMR_ENABLE_DEBUGGING,
                                hadoop_version=EMR_HADOOP_VERSION,
                                steps=[],
                                bootstrap_actions=EMR_BOOTSTRAP_ACTIONS)

2. Schedule some jobs
3. Call describe_jobflow
    jf = emr.describe_jobflow(jobflowid)
4. print jf.steps

What is the expected output? What do you see instead?
jf.steps should be a list of the job flow's steps.  Instead it is None.

What version of the product are you using? On what operating system?
Reproduces in both HEAD and boto-2.0b2

Please provide any additional information below.

I forked boto on github and my fix is available here: 
    http://github.com/rodo2/boto

The XML parser seems to get confused by the 'BootstrapActions' tag.  I believe 
this is because it finds a 'member' tag inside and assumes it is in 'Steps'.

My fix works around the issue by parsing the elements inside the 
BootstrapAction tag (which is nice to have anyway).  I haven't really worked 
with SAX parsers before, so there may be a better fix...
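
To illustrate the kind of confusion I suspect, here is a minimal sketch using Python's standard xml.sax module. It is not boto's actual parser: the handler names (FlatHandler, PathAwareHandler), the count_steps helper, and the simplified sample XML are all made up for illustration, and the real EMR response has more fields than this.

```python
import xml.sax

# Hypothetical, simplified response; not the real EMR schema.
SAMPLE = b"""<JobFlow>
  <Steps>
    <member><Name>step-1</Name></member>
    <member><Name>step-2</Name></member>
  </Steps>
  <BootstrapActions>
    <member><Name>Add 10G Swap</Name></member>
  </BootstrapActions>
</JobFlow>"""


class FlatHandler(xml.sax.ContentHandler):
    """Counts every <member> as a step, ignoring where it appears."""

    def __init__(self):
        xml.sax.ContentHandler.__init__(self)
        self.steps = 0

    def startElement(self, name, attrs):
        if name == "member":
            self.steps += 1


class PathAwareHandler(xml.sax.ContentHandler):
    """Counts a <member> as a step only when its parent is <Steps>."""

    def __init__(self):
        xml.sax.ContentHandler.__init__(self)
        self.path = []  # stack of currently open element names
        self.steps = 0

    def startElement(self, name, attrs):
        if name == "member" and self.path and self.path[-1] == "Steps":
            self.steps += 1
        self.path.append(name)

    def endElement(self, name):
        self.path.pop()


def count_steps(handler_cls, data=SAMPLE):
    """Parse the sample with the given handler class and return its count."""
    handler = handler_cls()
    xml.sax.parseString(data, handler)
    return handler.steps
```

On this sample, the flat handler counts the bootstrap action's 'member' as a third step, while the path-aware handler only counts the two under 'Steps'. Tracking the stack of open elements is one way a SAX handler can tell the two kinds of 'member' apart.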

Original issue reported on code.google.com by rodo%rod...@gtempaccount.com on 1 Oct 2010 at 7:51

GoogleCodeExporter commented 9 years ago
Thanks for the patch.  Could you create a pull request on github?  That makes 
it a little easier to track.  Thanks.

Original comment by Mitch.Ga...@gmail.com on 1 Oct 2010 at 8:19

GoogleCodeExporter commented 9 years ago
Pull request should be there now.

Original comment by rodo%rod...@gtempaccount.com on 1 Oct 2010 at 8:37

GoogleCodeExporter commented 9 years ago
I wonder if it would be possible to see a sample of the XML response from EMR.  
The AWS docs don't really seem to have good sample requests/responses and I'm 
having a little trouble getting my head around what's coming back from AWS.  
You should be able to do something like this:

>>> import boto
>>> boto.set_stream_logger('emr')
>>> emr = boto.connect_emr(debug=2)

and then go ahead and make your call.  The requests and responses should print 
out to your terminal.

Original comment by Mitch.Ga...@gmail.com on 1 Oct 2010 at 12:00

GoogleCodeExporter commented 9 years ago
I've attached responses from two EMR clusters.
The first was started with no bootstrap actions, the second with the 'add-swap' 
action shown above.

The interesting piece looks something like this:
<BootstrapActions>
  <member>
    <BootstrapActionConfig>
      <ScriptBootstrapAction>
        <Args>
          <member>10000</member>
        </Args>
        <Path>s3://elasticmapreduce/bootstrap-actions/add-swap</Path>
      </ScriptBootstrapAction>
      <Name>Add 10G Swap</Name>
    </BootstrapActionConfig>
  </member>
</BootstrapActions>
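
As a rough sketch of how the fields in this fragment could be pulled out, the following uses the standard library's xml.etree.ElementTree rather than boto's SAX-based parser. The function name parse_bootstrap_actions and the dict layout it returns are illustrative assumptions, not part of boto, and it assumes the fragment has no XML namespace, as pasted above.

```python
import xml.etree.ElementTree as ET

FRAGMENT = """<BootstrapActions>
  <member>
    <BootstrapActionConfig>
      <ScriptBootstrapAction>
        <Args>
          <member>10000</member>
        </Args>
        <Path>s3://elasticmapreduce/bootstrap-actions/add-swap</Path>
      </ScriptBootstrapAction>
      <Name>Add 10G Swap</Name>
    </BootstrapActionConfig>
  </member>
</BootstrapActions>"""


def parse_bootstrap_actions(xml_text):
    """Return a list of dicts with the name, path, and args of each action."""
    root = ET.fromstring(xml_text)
    actions = []
    # Only direct <member> children of the root are actions. The <member>
    # elements under <Args> are arguments and are collected separately.
    for member in root.findall("member"):
        config = member.find("BootstrapActionConfig")
        if config is None:
            continue
        actions.append({
            "name": config.findtext("Name"),
            "path": config.findtext("ScriptBootstrapAction/Path"),
            "args": [arg.text for arg in
                     config.findall("ScriptBootstrapAction/Args/member")],
        })
    return actions
```

Note that 'member' appears at two levels here (once per action and once per argument), so any parser has to use the enclosing element to decide what a given 'member' means.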

Original comment by rodo%rod...@gtempaccount.com on 1 Oct 2010 at 1:09
