nchammas / flintrock

A command-line tool for launching Apache Spark clusters.
Apache License 2.0

Flintrock doesn't support more than 200 instances #193

Open douglaz opened 7 years ago

douglaz commented 7 years ago

I don't have the exact error right now (it happens while calling the AWS filters API). But if you try, I bet you won't be able to do it.

nchammas commented 7 years ago

I've had people launch 200+ node clusters with Flintrock before and we've fixed the initial set of issues that cropped up there (#78, #81). Perhaps this is a regression though.

Sorry about the hanging PRs btw. I will try to get a few of them updated or merged this weekend.

douglaz commented 7 years ago

@nchammas, here is the error:

2017-03-29 17:28:14,038 - flintrock.ec2 - INFO - Requesting 201 spot instances at a max price of $0.03...
2017-03-29 17:28:14,871 - flintrock.ec2 - INFO - 0 of 201 instances granted. Waiting...
2017-03-29 17:28:46,026 - flintrock.ec2 - INFO - All 201 instances granted.
An error occurred (FilterLimitExceeded) when calling the DescribeInstances operation: The maximum number of filter values specified on a single call is 200
Traceback (most recent call last):
  File "/home/allan/mail-ignition/core/tools/flintrock/flintrock/ec2.py", line 755, in _create_instances
    {'Name': 'instance-id', 'Values': [r['InstanceId'] for r in spot_requests]}
  File "/usr/local/lib/python3.4/dist-packages/boto3/resources/collection.py", line 83, in __iter__
    for page in self.pages():
  File "/usr/local/lib/python3.4/dist-packages/boto3/resources/collection.py", line 166, in pages
    for page in pages:
  File "/usr/local/lib/python3.4/dist-packages/botocore/paginate.py", line 102, in __iter__
    response = self._make_request(current_kwargs)
  File "/usr/local/lib/python3.4/dist-packages/botocore/paginate.py", line 174, in _make_request
    return self._method(**current_kwargs)
  File "/usr/local/lib/python3.4/dist-packages/botocore/client.py", line 251, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/usr/local/lib/python3.4/dist-packages/botocore/client.py", line 537, in _make_api_call
    raise ClientError(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (FilterLimitExceeded) when calling the DescribeInstances operation: The maximum number of filter values specified on a single call is 200

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/allan/mail-ignition/core/tools/flintrock/flintrock/ec2.py", line 911, in launch
    user_data=user_data)
  File "/home/allan/mail-ignition/core/tools/flintrock/flintrock/ec2.py", line 804, in _create_instances
    {'Name': 'instance-id', 'Values': instance_ids}
  File "/usr/local/lib/python3.4/dist-packages/boto3/resources/collection.py", line 83, in __iter__
    for page in self.pages():
  File "/usr/local/lib/python3.4/dist-packages/boto3/resources/collection.py", line 166, in pages
    for page in pages:
  File "/usr/local/lib/python3.4/dist-packages/botocore/paginate.py", line 102, in __iter__
    response = self._make_request(current_kwargs)
  File "/usr/local/lib/python3.4/dist-packages/botocore/paginate.py", line 174, in _make_request
    return self._method(**current_kwargs)
  File "/usr/local/lib/python3.4/dist-packages/botocore/client.py", line 251, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/usr/local/lib/python3.4/dist-packages/botocore/client.py", line 537, in _make_api_call
    raise ClientError(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (FilterLimitExceeded) when calling the DescribeInstances operation: The maximum number of filter values specified on a single call is 200

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/allan/mail-ignition/core/tools/flintrock/standalone.py", line 11, in <module>
    sys.exit(main())
  File "/home/allan/mail-ignition/core/tools/flintrock/flintrock/flintrock.py", line 1121, in main
    cli(obj={})
  File "/usr/local/lib/python3.4/dist-packages/click/core.py", line 716, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.4/dist-packages/click/core.py", line 696, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.4/dist-packages/click/core.py", line 1060, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.4/dist-packages/click/core.py", line 889, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.4/dist-packages/click/core.py", line 534, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/lib/python3.4/dist-packages/click/decorators.py", line 17, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/home/allan/mail-ignition/core/tools/flintrock/flintrock/flintrock.py", line 395, in launch
    tags=ec2_tags)
  File "/home/allan/mail-ignition/core/tools/flintrock/flintrock/ec2.py", line 53, in wrapper
    res = func(*args, **kwargs)
  File "/home/allan/mail-ignition/core/tools/flintrock/flintrock/ec2.py", line 962, in launch
    cleanup_instances = cluster_instances
UnboundLocalError: local variable 'cluster_instances' referenced before assignment

nchammas commented 7 years ago

Looks like we have to change the logic here and in other places where we call filter() to either batch requests so that no request includes more than 200 filter values, or use a different kind of filter that doesn't require enumerating so many values in the first place.
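
For illustration, the batching option might look roughly like the sketch below. It is not Flintrock's actual code: `chunked` and `filter_instances_in_batches` are hypothetical helper names, and the limit of 200 is taken from the FilterLimitExceeded message in the traceback above. The `collection` argument is assumed to be something like boto3's `ec2.instances`, whose `filter(Filters=[...])` call is the one that fails in the traceback.

```python
# Limit quoted in the FilterLimitExceeded error message above.
MAX_FILTER_VALUES = 200


def chunked(values, size=MAX_FILTER_VALUES):
    """Yield consecutive slices of `values`, each with at most `size` items."""
    for start in range(0, len(values), size):
        yield values[start:start + size]


def filter_instances_in_batches(collection, instance_ids):
    """Hypothetical helper: look up instances by ID without exceeding the filter limit.

    `collection` is assumed to expose filter(Filters=[...]) the way a boto3
    resource collection such as ec2.instances does.
    """
    found = []
    for batch in chunked(list(instance_ids)):
        # Each request carries at most MAX_FILTER_VALUES instance IDs.
        found.extend(
            collection.filter(
                Filters=[{'Name': 'instance-id', 'Values': batch}]))
    return found
```

With 201 instance IDs, as in the failing launch above, this would issue two requests (200 IDs, then 1) instead of a single request with 201 filter values. The trade-off is more API calls for large clusters; a filter that does not enumerate instance IDs at all would avoid that, but it is not sketched here.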