kislyuk / aegea

Amazon Web Services Operator Interface
Apache License 2.0

error on `aegea batch submit`: command field cannot have empty strings #49

Closed. jamestwebber closed this issue 5 years ago.

jamestwebber commented 5 years ago

It seems that a recent change has broken our existing workflow. We have scripts that build up an `aegea batch submit` command and then use subprocess to run it, like so:

aegea batch submit --queue aegea_batch --vcpus 16 --memory 64000 --ecr-image aligner --storage /mnt=500 --command 'PATH=$HOME/anaconda/bin:$PATH; cd utilities; git pull; git checkout master; python setup.py install; python -m utilities.alignment.run_star_and_htseq --taxon mm10-plus --num_partitions 1 --partition_id 0 --s3_input_path s3://czb-seqbot/fastqs/190906_A00111_0366_AHNKGFDSXX/ --s3_output_path s3://czb_maca/Plate_seq/parabiosis/190906_A00111_0366_AHNKGFSDSXX/mm10/'
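For scripts like this, here is a minimal sketch of one way to drive the CLI from subprocess: build an argument list rather than a single string, so the multi-step `--command` payload reaches aegea as one argument with its quoting intact. The helper names (`build_submit_argv`, `as_shell_line`, `submit`) are hypothetical; only the `aegea batch submit` flags come from the command above. The code avoids anything newer than Python 3.5, the interpreter version that appears in the traceback further down.

```python
import shlex
import subprocess
from typing import List


def build_submit_argv(queue: str, vcpus: int, memory_mb: int, ecr_image: str,
                      storage: str, shell_command: str) -> List[str]:
    """Return an argv list for `aegea batch submit` (hypothetical helper)."""
    return [
        "aegea", "batch", "submit",
        "--queue", queue,
        "--vcpus", str(vcpus),
        "--memory", str(memory_mb),
        "--ecr-image", ecr_image,
        "--storage", storage,
        # The whole shell pipeline travels as ONE list element, so the local
        # shell never sees (or mangles) the `;`, `$` and quotes inside it.
        "--command", shell_command,
    ]


def as_shell_line(argv: List[str]) -> str:
    """Render argv as a copy-pasteable shell line, for logging or bug reports."""
    return " ".join(shlex.quote(arg) for arg in argv)


def submit(argv: List[str]) -> str:
    """Run the command without a shell; raises CalledProcessError on failure."""
    result = subprocess.run(argv, check=True, stdout=subprocess.PIPE,
                            universal_newlines=True)
    return result.stdout


if __name__ == "__main__":
    steps = ["PATH=$HOME/anaconda/bin:$PATH", "cd utilities", "git pull"]
    argv = build_submit_argv("aegea_batch", 16, 64000, "aligner", "/mnt=500",
                             "; ".join(steps))
    # Only prints the equivalent shell line; call submit(argv) to actually run it.
    print(as_shell_line(argv))
```

Because there is no `shell=True`, the `;`-separated steps are presumably interpreted only inside the job's container, not on the submitting machine.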

Which now raises the error:

botocore.errorfactory.ClientException: An error occurred (ClientException) when calling the SubmitJob operation: Error executing request, Exception : Command field cannot have empty strings, RequestId: b1aa9dd3-b9cb-4c7d-9e52-72a043b3519e

This happens with v2.6.4 and v2.6.5. As a quick fix I suggested downgrading to a known working version of aegea, and they reported success with v2.3.6. We could try bisecting between those versions, but hopefully you have a better idea of what happened. My guess is that something is off in how aegea builds the batch command, perhaps because of how complex our command is?

kislyuk commented 5 years ago

Thanks, looking. Yes, this stems from a change to aegea batch that I introduced in a recent refactor.
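As background on the error itself: AWS Batch's SubmitJob takes the container command as a list of strings (the `containerOverrides={"command": [...]}` argument of boto3's `submit_job`), and the message above says none of those strings may be empty. The thread does not say what the refactor changed, so the sketch below is only a hypothetical client-side guard. `check_batch_command` and `wrap_in_bash` are illustrative names, and the `bash -c` wrapping is an assumption, not aegea's documented behavior. The last lines show one generic way an empty element can appear.

```python
from typing import List, Sequence


def check_batch_command(command: Sequence[str]) -> List[str]:
    """Reject a container command list that contains empty strings.

    This only mirrors the constraint named in the SubmitJob error message;
    it is a hypothetical client-side guard, not aegea's code.
    """
    empty_at = [i for i, part in enumerate(command) if part == ""]
    if empty_at:
        raise ValueError("command has empty string(s) at index {}".format(empty_at))
    return list(command)


def wrap_in_bash(script: str) -> List[str]:
    """Wrap a shell script as a single `bash -c` argument (an assumed convention)."""
    return check_batch_command(["/bin/bash", "-c", script])


# A naive split on "; " leaves a trailing empty element when the script ends
# with a separator, which is one way an empty string can sneak into the list.
naive = "git pull; git checkout master; ".split("; ")
# naive == ['git pull', 'git checkout master', '']
```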

kislyuk commented 5 years ago

Fixed in v2.6.6, please test. Sorry about the disruption.

jamestwebber commented 5 years ago

Hm, I think a couple of other things changed that I'll need to debug. When I tested a job, I got this error:

Traceback (most recent call last):
  File "/usr/local/bin/aegea", line 23, in <module>
    aegea.main()
  File "/usr/local/lib/python3.5/dist-packages/aegea/__init__.py", line 89, in main
    result = parsed_args.entry_point(parsed_args)
  File "/usr/local/lib/python3.5/dist-packages/aegea/ebs.py", line 65, in create
    return attach(parser_attach.parse_args([res["VolumeId"]], namespace=args))
  File "/usr/local/lib/python3.5/dist-packages/aegea/ebs.py", line 139, in attach
    logger.info("Formatting %s (%s)", args.volume_id, find_devnode(args.volume_id))
  File "/usr/local/lib/python3.5/dist-packages/aegea/ebs.py", line 109, in find_devnode
    raise Exception("Could not find devnode for {}".format(volume_id))
Exception: Could not find devnode for vol-073d4a4404fbac5d1
Detaching EBS volume
usage: aegea ebs detach [-h] [--max-col-width MAX_COL_WIDTH] [--json]
                        [--log-level {WARNING,CRITICAL,DEBUG,INFO,ERROR}]
                        [--unmount] [--delete] [--force] [--dry-run]
                        volume_id
aegea ebs detach: error: the following arguments are required: volume_id

The command I ran was:

aegea batch submit --queue aegea_batch --vcpus 16 --memory 64000 --ecr-image aligner --storage /mnt=500 --command 'PATH=$HOME/anaconda/bin:$PATH; cd utilities; git pull; git checkout master; python setup.py install; python -m utilities.alignment.run_star_and_htseq --taxon mm10-plus --num_partitions 100 --partition_id 0 --s3_input_path s3://czb-seqbot/fastqs/190906_A00111_0366_AHNKGFDSXX/ --s3_output_path s3://czb-maca/Plate_seq/parabiosis/190906_A00111_0366_AHNKGFDSXX/mm10/'

Not sure if you have access to our logs, but the job ID is 2281ef07-34d9-4143-8913-45da0152f124.
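The second half of that output is argparse's standard error for a missing required positional argument, which suggests the cleanup step invoked the `aegea ebs detach` parser without a volume ID. A minimal sketch, assuming a parser shaped like the usage line above (the function names are hypothetical and only a subset of the flags is reproduced), contrasts passing the ID explicitly, as `ebs.py` line 65 does for `attach`, with omitting it:

```python
import argparse


def make_detach_parser() -> argparse.ArgumentParser:
    """Hypothetical stand-in for the `aegea ebs detach` parser (subset of flags)."""
    parser = argparse.ArgumentParser(prog="aegea ebs detach")
    parser.add_argument("volume_id")  # required positional, as in the usage line
    parser.add_argument("--unmount", action="store_true")
    parser.add_argument("--delete", action="store_true")
    parser.add_argument("--force", action="store_true")
    return parser


def detach_args_for(volume_id: str, namespace: argparse.Namespace) -> argparse.Namespace:
    """Parse detach args with the volume ID passed explicitly.

    Reusing an existing namespace keeps attributes (e.g. log level) that were
    set earlier, mirroring parse_args([...], namespace=args) in the traceback.
    """
    return make_detach_parser().parse_args([volume_id], namespace=namespace)


# Calling make_detach_parser().parse_args([]) instead prints
# "error: the following arguments are required: volume_id" and exits with status 2.
```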

kislyuk commented 5 years ago

That is a different issue, discussed in #47. The short-term fix is to use m5/r5/c5 family instances. I'm working on a longer-term solution.
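A hedged sketch of why a device-node lookup can depend on instance family. This is an assumption about the cause, not the content of #47 or of aegea's `find_devnode`: on NVMe-based (Nitro) instances such as m5/r5/c5, udev typically exposes each EBS volume as a by-id symlink named after the volume ID without its hyphen (for the volume in the traceback, `/dev/disk/by-id/nvme-Amazon_Elastic_Block_Store_vol073d4a4404fbac5d1`), while on older Xen instances a volume attached as `/dev/sdf` usually appears as `/dev/xvdf`. Code that checks only one naming scheme fails on the other. All helper names below are hypothetical:

```python
import os
from typing import Optional

BY_ID_DIR = "/dev/disk/by-id"


def nvme_by_id_path(volume_id: str, by_id_dir: str = BY_ID_DIR) -> str:
    """Expected udev symlink for an EBS volume on an NVMe (Nitro) instance."""
    # e.g. vol-073d4a4404fbac5d1 -> nvme-Amazon_Elastic_Block_Store_vol073d4a4404fbac5d1
    name = "nvme-Amazon_Elastic_Block_Store_" + volume_id.replace("-", "")
    return os.path.join(by_id_dir, name)


def xen_devnode(attachment_device: str) -> str:
    """Map an attachment name like /dev/sdf to the /dev/xvdf node Xen usually creates."""
    if attachment_device.startswith("/dev/sd"):
        return "/dev/xvd" + attachment_device[len("/dev/sd"):]
    return attachment_device


def find_devnode_sketch(volume_id: str, attachment_device: str,
                        by_id_dir: str = BY_ID_DIR) -> Optional[str]:
    """Return the block device for a volume, or None if no naming scheme matches."""
    # 1. NVMe: resolve the by-id symlink to the real node, e.g. /dev/nvme1n1.
    link = nvme_by_id_path(volume_id, by_id_dir)
    if os.path.exists(link):
        return os.path.realpath(link)
    # 2. Xen: try the attachment name as given, then its xvd form.
    for candidate in (attachment_device, xen_devnode(attachment_device)):
        if os.path.exists(candidate):
            return candidate
    return None
```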

kislyuk commented 5 years ago

The issue discussed in #47 should be fixed in v2.6.8 (on all instance types); please test.