runpod / runpodctl

🧰 | RunPod CLI for pod management
https://www.runpod.io/
GNU General Public License v3.0
257 stars, 37 forks

Added script for PyTorch issues debugging #137

Open kodxana opened 7 months ago

kodxana commented 7 months ago

Added a new command: gpu-test. Currently it includes one of my scripts for testing PyTorch (the PyTorch.go file). It saves debug information to /workspace/gpu_diagnostics.json.

What the script does: gathers information about the host, including:

Runs a test on all attached GPUs to make sure PyTorch fully utilizes CUDA. Logs error messages to the .json file for tech support to review.
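For readers who want a feel for the flow described above, here is a minimal Python sketch. It is not the script from this PR. The function names (`collect_host_info`, `test_gpus`, `run_diagnostics`), the JSON field names, and the choice of a small matrix multiply as the GPU test are all assumptions made for illustration. Only the default output path comes from the description.

```python
import json
import platform
from pathlib import Path


def collect_host_info():
    """Return basic host facts from the standard library (illustrative fields only)."""
    return {
        "hostname": platform.node(),
        "os": platform.platform(),
        "python": platform.python_version(),
    }


def test_gpus():
    """Try a small matrix multiply on every GPU PyTorch can see and record the outcome."""
    report = {"torch_available": False, "cuda_available": False, "gpus": [], "errors": []}
    try:
        import torch
    except Exception as exc:  # torch missing or failing to load is itself a finding
        report["errors"].append(f"torch import failed: {exc}")
        return report

    report["torch_available"] = True
    report["cuda_available"] = bool(torch.cuda.is_available())
    if not report["cuda_available"]:
        report["errors"].append("CUDA is not available to PyTorch")
        return report

    for index in range(torch.cuda.device_count()):
        entry = {"index": index, "name": torch.cuda.get_device_name(index), "ok": False}
        try:
            device = torch.device(f"cuda:{index}")
            a = torch.rand(256, 256, device=device)
            b = torch.rand(256, 256, device=device)
            (a @ b).sum().item()  # .item() copies the result back, forcing the kernel to finish
            entry["ok"] = True
        except Exception as exc:
            entry["error"] = str(exc)
            report["errors"].append(f"GPU {index}: {exc}")
        report["gpus"].append(entry)
    return report


def run_diagnostics(output_path="/workspace/gpu_diagnostics.json"):
    """Write the combined report as JSON and return it."""
    report = {"host": collect_host_info(), "pytorch": test_gpus()}
    Path(output_path).write_text(json.dumps(report, indent=2))
    return report
```

Calling `run_diagnostics()` on a pod would write the report to the default path. Passing another path makes it possible to try the sketch on a machine that has no /workspace directory.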

DireLines commented 7 months ago

What will happen if people try to run this on the client? Is there anything telling them that it is a diagnostic to be run specifically through web terminal on the pod? I think somebody will try to run it locally and be confused by the output if not

kodxana commented 7 months ago

> What will happen if people try to run this on the client? Is there anything telling them that it is a diagnostic to be run specifically through web terminal on the pod? I think somebody will try to run it locally and be confused by the output if not

Most of the output in the JSON file would be empty, though the script would still try to run the PyTorch test.
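One possible way to address the confusion raised above would be to warn before running when the command does not appear to be inside a pod. The Python sketch below is only an illustration and is not part of this PR. It assumes that pods expose a `RUNPOD_POD_ID` environment variable, and the helper names `running_on_pod` and `locality_warning` are hypothetical.

```python
import os


def running_on_pod(environ=None):
    """Guess whether we are inside a pod (assumption: RUNPOD_POD_ID is set there)."""
    env = os.environ if environ is None else environ
    return bool(env.get("RUNPOD_POD_ID"))


def locality_warning(environ=None):
    """Return a warning string when not on a pod, otherwise None."""
    if running_on_pod(environ):
        return None
    return (
        "gpu-test is a diagnostic meant to be run on the pod (for example through "
        "the web terminal). Run locally, most of the report will be empty."
    )
```

With a check like this the command could print the warning and continue, so the PyTorch test would still run for anyone who wants it locally.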