This moves our CI over to the standard PyTorch CI job template (doc).
The general advantage is that we can more easily add features or options in a maintained way. A specific reason is that I was not able to ssh-debug on our old CI, and @seemethere mentioned that the 'generic workflow' is where the CI SSH support lives.
SSH
Use SSH the same way as in pytorch/pytorch CI.
Artifacts Uploading
The job.dump_folder for each test is uniquely named and bundled into an outputs.zip, which can be downloaded from the GitHub Actions UI. The archive includes:
profiles
checkpoints
flight recorder comm_dumps (if any hang happened during CI)
To implement the artifacts upload, the following changes are made to test_runner.py:
job.dump_folder is set to a unique subfolder for each test, so test outputs don't overwrite each other
the root for job.dump_folder is artifacts-to-be-uploaded
the 'default' configuration gets tested explicitly with its own dump_dir specified
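The per-test folder naming described above can be sketched roughly as follows. This is a minimal illustration, not the actual test_runner.py code: the helper names `dump_folder_for` and `build_overrides` are hypothetical, and passing the folder as a `--job.dump_folder=...` command-line override is an assumption about how the setting reaches each test run.

```python
import os

# Root folder named in the description above; everything under it gets uploaded.
ARTIFACTS_ROOT = "artifacts-to-be-uploaded"


def dump_folder_for(test_name: str, root: str = ARTIFACTS_ROOT) -> str:
    """Return a dump folder unique to this test (hypothetical helper)."""
    return os.path.join(root, test_name)


def build_overrides(test_name: str, extra_args=()) -> list:
    """Build the argument list for one test run (hypothetical helper).

    Assumption: job.dump_folder can be overridden with a CLI flag of this
    form. Adjust to however the real runner passes config overrides.
    """
    return [f"--job.dump_folder={dump_folder_for(test_name)}", *extra_args]


if __name__ == "__main__":
    # The 'default' configuration gets its own folder, like every other test.
    for name in ("default", "some_other_test"):
        print(name, build_overrides(name))
```

Keying the subfolder on the test name is what keeps outputs from different tests from overwriting each other once they are collected under the shared root.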
Stack from ghstack (oldest at bottom):
318