skypilot-org / skypilot

SkyPilot: Run AI and batch jobs on any infra (Kubernetes or 12+ clouds). Get unified execution, cost savings, and high GPU availability via a simple interface.
https://skypilot.readthedocs.io
Apache License 2.0
6.48k stars 462 forks

AWS: Support for EC2 Launch Templates #2700

Open mmcclean-aws opened 11 months ago

mmcclean-aws commented 11 months ago

Does SkyPilot support EC2 Launch Templates? When launching a multi-node cluster, the network interfaces need to be configured properly. An easy way to do this is to pass in an EC2 Launch Template config.

Details of this feature are found here: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-launch-templates.html

concretevitamin commented 11 months ago

Not yet; definitely something great to have. @dongreenberg has mentioned this could potentially speed up provisioning as well.

Under the hood, SkyPilot uses the boto3 SDK: https://github.com/skypilot-org/skypilot/blob/master/sky/provision/aws/instance.py#L139. It should be possible to plumb a launch template config from a frontend spec (maybe ~/.sky/config.yaml) through to here.
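
A rough sketch of what that plumbing could look like, not SkyPilot's actual code: the helper name `build_run_instances_kwargs` and the shape of the `launch_template` dict are hypothetical. The only real API assumed here is the `LaunchTemplate` parameter of boto3's EC2 `run_instances` / `create_instances`, which takes either `LaunchTemplateId` or `LaunchTemplateName` (not both) plus an optional `Version`.

```python
from typing import Any, Dict, Optional


def build_run_instances_kwargs(
    node_config: Dict[str, Any],
    count: int,
    launch_template: Optional[Dict[str, str]] = None,
) -> Dict[str, Any]:
    """Merge an optional launch template reference into run_instances kwargs.

    `launch_template` is a hypothetical dict read from user config, e.g.
    {"LaunchTemplateName": "my-template", "Version": "$Latest"}.
    """
    kwargs = dict(node_config)  # copy so the caller's dict is not mutated
    kwargs["MinCount"] = count
    kwargs["MaxCount"] = count
    if launch_template:
        has_id = "LaunchTemplateId" in launch_template
        has_name = "LaunchTemplateName" in launch_template
        # EC2 expects exactly one of the two identifiers.
        if has_id == has_name:
            raise ValueError(
                "Specify exactly one of LaunchTemplateId or LaunchTemplateName"
            )
        kwargs["LaunchTemplate"] = dict(launch_template)
    return kwargs


# Usage (requires boto3 and AWS credentials, so it is left commented out):
#   import boto3
#   ec2 = boto3.resource("ec2", region_name="us-east-1")
#   ec2.create_instances(**build_run_instances_kwargs(
#       {"InstanceType": "trn1.32xlarge"}, 2,
#       {"LaunchTemplateName": "my-template", "Version": "$Latest"}))
```

Values passed explicitly in the request take precedence over the corresponding values in the launch template, so the node config would still need to avoid clobbering the template's network settings.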

mmcclean-aws commented 11 months ago

Yes, ideally it should be a parameter that can be added to the config YAML file.

mmcclean-aws commented 11 months ago

Here is an example of how a trn1.32xlarge needs to be set up for multi-instance training with EFA: https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/setup-trn1-multi-node-execution.html
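
To give a feel for the kind of network configuration a launch template would carry, here is a minimal sketch that builds one EFA interface per network card. The helper name, the subnet and security group IDs, and the interface count of 8 are illustrative assumptions; the linked guide is the authority on the exact values for trn1.32xlarge. The dict keys are those accepted under `NetworkInterfaces` in the `LaunchTemplateData` of boto3's `create_launch_template`.

```python
from typing import Any, Dict, List


def efa_network_interfaces(
    subnet_id: str,
    security_group_ids: List[str],
    num_interfaces: int,
) -> List[Dict[str, Any]]:
    """Build one EFA network interface spec per network card."""
    interfaces = []
    for card in range(num_interfaces):
        interfaces.append({
            "NetworkCardIndex": card,
            # Assumption: the primary interface uses device index 0 and the
            # interfaces on the remaining cards use device index 1.
            "DeviceIndex": 0 if card == 0 else 1,
            "InterfaceType": "efa",
            "SubnetId": subnet_id,
            "Groups": list(security_group_ids),
            "DeleteOnTermination": True,
        })
    return interfaces


# Placeholder IDs; 8 interfaces is an assumed count for illustration.
launch_template_data = {
    "InstanceType": "trn1.32xlarge",
    "NetworkInterfaces": efa_network_interfaces(
        "subnet-0123456789abcdef0", ["sg-0123456789abcdef0"], 8
    ),
}

# Creating the template needs boto3 and AWS credentials:
#   import boto3
#   boto3.client("ec2").create_launch_template(
#       LaunchTemplateName="trn1-efa", LaunchTemplateData=launch_template_data)
```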

github-actions[bot] commented 7 months ago

This issue is stale because it has been open 120 days with no activity. Remove stale label or comment or this will be closed in 10 days.

github-actions[bot] commented 6 months ago

This issue was closed because it has been stalled for 10 days with no activity.