google / xpk

xpk (Accelerated Processing Kit, pronounced x-p-k) is a software tool that helps Cloud developers orchestrate training jobs on accelerators such as TPUs and GPUs on GKE.
Apache License 2.0
69 stars, 17 forks

Add logic to fail Pathways jobs on user code errors. #152

Closed: RoshaniN closed this 3 months ago

RoshaniN commented 3 months ago

Fixes / Features

Testing / Documentation

Tested locally with changes

[Screenshot: 2024-06-05 at 11:49:05 AM]

In the screenshot, roshanin-pw-test3 runs correct user code and roshanin-pw-test4 contains a typo in its user code.
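For readers following along, here is a minimal sketch of the general idea behind the test above: run the user's command, and report the job as failed when that command exits with a non-zero code. This is not the code in this PR. The function names `run_user_command` and `job_status` and the status strings are hypothetical, and the sketch assumes a user code error (such as a typo) surfaces as a non-zero exit code.

```python
import subprocess
import sys
from typing import Sequence


def run_user_command(command: Sequence[str]) -> int:
    """Run the user's command and return its exit code (hypothetical helper)."""
    # check=False: do not raise on failure; we want to inspect the exit code.
    completed = subprocess.run(list(command), check=False)
    return completed.returncode


def job_status(exit_code: int) -> str:
    """Map an exit code to an illustrative job status string."""
    return "Completed" if exit_code == 0 else "Failed"


# Correct user code: the interpreter exits with code 0.
good_exit = run_user_command([sys.executable, "-c", "print('ok')"])

# User code with a typo ("pritn"): Python raises NameError and exits non-zero.
bad_exit = run_user_command([sys.executable, "-c", "pritn('ok')"])
```

With these two runs, `job_status(good_exit)` gives "Completed" and `job_status(bad_exit)` gives "Failed", which mirrors the two test cases described above (correct user code versus user code with a typo).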

RoshaniN commented 3 months ago

Thanks. A "for future" comment: let's revisit all the xpk args that Pathways does not support, such as --debug-dump-gcs and --scheduler, and make sure we have bugs filed for all of those that could be supported with some work.

Yes, thanks for noticing this. I have a backlog item to evaluate these args for Pathways: b/342189448