ray-project / ray

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0

[aws] unspecified boto dependency breaks autoscaler examples #12195

Closed: richardliaw closed this issue 4 years ago

richardliaw commented 4 years ago

What is the problem?

In many of our examples, boto3 is pinned to 1.4.8. However, with this version of boto3, the autoscaler throws an AttributeError on describe_instance_types without any hint as to what the user should do next.
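
One way the missing hint could be supplied is an early capability check on the client. This is a minimal sketch, not Ray's actual code: `require_describe_instance_types` and the two stand-in client classes are hypothetical names used for illustration, and the upgrade command in the message is a suggestion rather than a verified minimum version.

```python
def require_describe_instance_types(ec2_client):
    """Return the bound method, or fail with an actionable message.

    Hypothetical helper: botocore clients raise AttributeError for
    operations they do not know, so hasattr() is False on old versions.
    """
    if not hasattr(ec2_client, "describe_instance_types"):
        raise RuntimeError(
            "The installed boto3/botocore does not provide "
            "EC2.describe_instance_types. Try upgrading with "
            "`pip install -U boto3`.")
    return ec2_client.describe_instance_types


class OldEC2Client:
    """Stand-in for an EC2 client from an old boto3 (assumption)."""


class NewEC2Client:
    """Stand-in for an EC2 client that supports the call (assumption)."""

    def describe_instance_types(self):
        return {"InstanceTypes": []}


if __name__ == "__main__":
    try:
        require_describe_instance_types(OldEC2Client())
    except RuntimeError as err:
        print(err)
```

With a check like this, the user sees an upgrade suggestion instead of a bare AttributeError from deep inside botocore.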

I'm not sure why we can't just load a copy of these instance types in the repo, to be honest.
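
For illustration, a checked-in copy of the instance types might look roughly like the sketch below. The table name, its layout, and `lookup_resources` are all hypothetical, and the two entries are examples only; a real table would need to cover every instance type and be kept up to date.

```python
# Hypothetical static table; entries are illustrative, not exhaustive.
STATIC_INSTANCE_RESOURCES = {
    "m5.large": {"CPU": 2, "memory_mib": 8192},
    "p3.2xlarge": {"CPU": 8, "GPU": 1, "memory_mib": 62464},
}


def lookup_resources(instance_type, table=STATIC_INSTANCE_RESOURCES):
    """Return a copy of the known resources, or {} for unknown types."""
    return dict(table.get(instance_type, {}))


if __name__ == "__main__":
    print(lookup_resources("m5.large"))
    print(lookup_resources("not-a-real-type"))
```

Returning an empty dict for unknown types keeps the lookup non-fatal, at the cost of silently missing resources for instance types added after the table was last refreshed.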

cc @ericl @AmeerHajAli

Ray version and other system information (Python version, TensorFlow version, OS): master, 3.7

(base) ➜  tune git:(fix-kubernetes-dep) ✗ pip list | grep boto
boto                          2.49.0
boto3                         1.4.8
botocore                      1.8.50

Reproduction (REQUIRED)

Please provide a script that can be run to reproduce the issue. The script should have no external library dependencies (i.e., use fake or mock data / environments):

(base) ➜  tune git:(fix-kubernetes-dep) ✗ ray up $CFG -y
Cluster: basic

Checking AWS environment settings
Traceback (most recent call last):
  File "/Users/rliaw/miniconda3/bin/ray", line 8, in <module>
    sys.exit(main())
  File "/Users/rliaw/miniconda3/lib/python3.7/site-packages/ray/scripts/scripts.py", line 1471, in main
    return cli()
  File "/Users/rliaw/miniconda3/lib/python3.7/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/Users/rliaw/miniconda3/lib/python3.7/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/Users/rliaw/miniconda3/lib/python3.7/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/rliaw/miniconda3/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/rliaw/miniconda3/lib/python3.7/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/Users/rliaw/miniconda3/lib/python3.7/site-packages/ray/scripts/scripts.py", line 860, in up
    use_login_shells=use_login_shells)
  File "/Users/rliaw/miniconda3/lib/python3.7/site-packages/ray/autoscaler/_private/commands.py", line 213, in create_or_update_cluster
    config = _bootstrap_config(config, no_config_cache=no_config_cache)
  File "/Users/rliaw/miniconda3/lib/python3.7/site-packages/ray/autoscaler/_private/commands.py", line 274, in _bootstrap_config
    config = provider_cls.fillout_available_node_types_resources(config)
  File "/Users/rliaw/miniconda3/lib/python3.7/site-packages/ray/autoscaler/_private/aws/node_provider.py", line 487, in fillout_available_node_types_resources
    cluster_config["provider"].get("aws_credentials"))
  File "/Users/rliaw/miniconda3/lib/python3.7/site-packages/ray/autoscaler/_private/aws/node_provider.py", line 76, in list_ec2_instances
    instance_types = ec2.describe_instance_types()
  File "/Users/rliaw/miniconda3/lib/python3.7/site-packages/botocore/client.py", line 565, in __getattr__
    self.__class__.__name__, item)
AttributeError: 'EC2' object has no attribute 'describe_instance_types'

AmeerHajAli commented 4 years ago

I am not sure that storing the table is the best solution, because we would have to keep updating it. I think a quick workaround here is to make this a non-breaking feature: keep the call in a try/except, as it was previously, and log the failure with logger.exception().
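
The workaround described above could be sketched roughly as follows. This is an assumption about the shape of the fix, not the code from the eventual PR: `list_instance_types_safely` and the stand-in client classes are hypothetical, and only `describe_instance_types` and `logger.exception()` come from the thread.

```python
import logging

logger = logging.getLogger(__name__)


def list_instance_types_safely(ec2_client):
    """Return the describe_instance_types response, or None on failure."""
    try:
        return ec2_client.describe_instance_types()
    except Exception:
        # logger.exception records the traceback but lets `ray up` continue.
        logger.exception(
            "Could not fetch EC2 instance types; continuing without "
            "auto-filled node resources. Upgrading boto3 may fix this.")
        return None


class OldEC2Client:
    """Stand-in for a client lacking describe_instance_types (assumption)."""


class NewEC2Client:
    """Stand-in for a client that supports the call (assumption)."""

    def describe_instance_types(self):
        return {"InstanceTypes": [{"InstanceType": "m5.large"}]}


if __name__ == "__main__":
    logging.basicConfig(level=logging.ERROR)
    print(list_instance_types_safely(OldEC2Client()))
    print(list_instance_types_safely(NewEC2Client()))
```

Callers would then need to treat a None result as "resources unknown" and skip the auto-fill step rather than abort.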

richardliaw commented 4 years ago

Yeah, that would be great. @AmeerHajAli, can you open a PR and cc me?