Closed sheevy closed 1 year ago
Thank you for creating this PR! I will review this tomorrow.
@terrytangyuan Can you approve CI?
Looks like CI failed.
@sheevy Can you address the error in CI?
Do you have any hints where to look? I couldn't see anything in the logs which would point in the right direction
CI says we need to regenerate the manifests. Did you run make generate locally?
I have not. Thanks for the hint, I will check that tomorrow.
Also, you can reproduce the error locally with make verify-generate.
Also, can you update the PR title with "MPICH support"?
Thanks @tenzen-y. I will check on Monday.
I think I've implemented all the comments and suggestions. I'm happy for it to get re-tested. Please let me know if you have any further feedback.
Hey @terrytangyuan, can you approve another run? Fixes for all suggestions are in.
Sure
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: alculquicondor
This is meant to be a continuation of https://github.com/kubeflow/mpi-operator/pull/478. The original PR was stale, and master has moved a lot, so it was easier to just create a new PR. I hope that is ok.
These are meant to be the same changes as in https://github.com/kubeflow/mpi-operator/pull/478, but rebased on top of current master. The main problem with the previous PR was that SlotsPerWorker relied on an environment variable to control the number of slots, and no such variable exists for MPICH. The suggested solution was to write the number of slots per worker into the hostfile. This PR does not implement that, because it was already done in https://github.com/kubeflow/mpi-operator/pull/523.
I hope that's the correct understanding.
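For anyone following along, here is a rough sketch of what "slots in the hostfile" looks like. It is not the operator's actual code: buildHostfile, the implementation strings, and the worker host names are hypothetical and exist only for illustration. The only thing it relies on is the hostfile syntax itself, namely that Open MPI takes "host slots=N" while MPICH's Hydra launcher takes "host:N".

```go
package main

import (
	"fmt"
	"strings"
)

// buildHostfile is a hypothetical helper, not taken from mpi-operator.
// It writes one line per worker host and encodes the slot count in the
// syntax expected by the chosen MPI implementation.
func buildHostfile(impl string, hosts []string, slotsPerWorker int) string {
	var b strings.Builder
	for _, h := range hosts {
		switch impl {
		case "mpich":
			// Hydra (MPICH) hostfile syntax: host:number_of_processes
			fmt.Fprintf(&b, "%s:%d\n", h, slotsPerWorker)
		default:
			// Open MPI hostfile syntax: host slots=N
			fmt.Fprintf(&b, "%s slots=%d\n", h, slotsPerWorker)
		}
	}
	return b.String()
}

func main() {
	hosts := []string{"worker-0", "worker-1"} // illustrative host names
	fmt.Print(buildHostfile("mpich", hosts, 2))
	fmt.Print(buildHostfile("openmpi", hosts, 2))
}
```

Because the slot count sits in the hostfile, no implementation-specific environment variable is needed to pass it along.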