Open danielvegamyhre opened 7 months ago
/assign
@uroy-personal are you still working on these? If not I am going to unassign them so someone else can work on them.
Hi @danielvegamyhre,
Yes, I am on it. I need to add the example here, right?
https://github.com/kubernetes-sigs/jobset/blob/main/docs/concepts/README.md
Also, please advise on what content (example YAML) to put there. I hope to finish all the open tasks assigned to me by this weekend.
Yes, you can reference some examples in the examples/ directory to help you get started.
Also, it would be nice if the provisioning step showed example commands for all three major cloud providers (AWS, GCP, Azure).
Thanks. I am working on it. Hope to raise the PR in the next few days.
@uroy-personal Just following up, are you still working on this?
Yes @danielvegamyhre, I am on it. I made the changes but found that the README page above has been removed. I will complete it within this week for sure! Thanks.
Good morning @danielvegamyhre, I have started the ball rolling here. So far I have added the examples from examples/ to the site's concepts page. Where can I get the example commands for the cloud providers (GCP, AWS, and Azure)? Please help. I will update the PR again.
It seems this issue needs GPU access. Is there a way to get GPU access, @danielvegamyhre?
/unassign
@uroy-personal To make this easier, let's not include the steps to provision GPU nodes on each cloud provider. Instead, let's just use a generic placeholder nodeSelector (e.g. your.cloud.provider.com/gpu-type) to indicate to the user that it should be replaced.
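For illustration, a rough sketch of what such a manifest could look like. This is not a tested example: the JobSet name, the container image, the label value, and the replica counts are all hypothetical placeholders, and the `nvidia.com/gpu` resource name assumes the NVIDIA device plugin is installed on the cluster.

```yaml
# Sketch only: names, image, and counts below are hypothetical placeholders.
apiVersion: jobset.x-k8s.io/v1alpha2
kind: JobSet
metadata:
  name: gpu-training-example          # hypothetical name
spec:
  replicatedJobs:
  - name: workers
    replicas: 1
    template:
      spec:
        parallelism: 2
        completions: 2
        backoffLimit: 0
        template:
          spec:
            restartPolicy: Never
            nodeSelector:
              # Placeholder: replace the key and value with the GPU node label
              # used by your cloud provider.
              your.cloud.provider.com/gpu-type: your-gpu-type
            containers:
            - name: trainer
              # Hypothetical image: replace with your own training image.
              image: your-registry/your-training-image:latest
              resources:
                limits:
                  # Assumes the NVIDIA device plugin exposes this resource.
                  nvidia.com/gpu: 1
```

The placeholder nodeSelector keeps the example provider-agnostic: users only need to swap in their own node label instead of following provider-specific provisioning steps.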
Thanks @danielvegamyhre, I will have a look and get back to you as soon as possible!
/assign
I currently have a GPU environment. The GPU card is not up to date, but I can try it and see.
/assign
What would you like to be added: A comprehensive example showing how to run a training workload on GPUs using JobSet. We could have one example per major cloud provider.
Why is this needed: We need more concrete examples to reduce friction of user onboarding. Right now we mostly have toy examples with sleep containers to demonstrate functionality of different features.