guidebooks/store
The home for importable Guidebooks
1 star, 10 forks
issues
fix: add 'app.kubernetes.io/managed-by: codeflare' label to custodian
#741
starpit
closed
1 year ago
0
feat: improve custodian support for torchx, use smaller base image
#740
starpit
closed
1 year ago
0
fix: logs custodian should pull from kubectl logs, not ray job logs
#739
starpit
closed
1 year ago
0
fix: logs custodian has errors with tee'ing to file
#738
starpit
closed
1 year ago
0
feat: rename self-destruct to logs; and increase ttl timeout on its job
#737
starpit
closed
1 year ago
0
fix: final Succeeded message not shown in ray jobs
#736
starpit
closed
1 year ago
0
fix: further improvements to ray log streaming
#735
starpit
closed
1 year ago
0
fix: ray logs not smooth
#734
starpit
closed
1 year ago
0
feat: avoid websocat in ml/ray/run/logs
#733
starpit
closed
1 year ago
0
fix: websocat ray log streaming can be simplified
#732
starpit
closed
1 year ago
0
fix: decrease epochs from 5 to 2 for getting started ray example
#731
starpit
closed
1 year ago
0
fix: ray labels were using /name should use /instance
#730
starpit
closed
1 year ago
0
fix: vmstat data lacks pod/ prefix on pod name
#729
starpit
closed
1 year ago
0
fix: ray jobs emit job env.json only after job is running
#728
starpit
closed
1 year ago
0
fix: improve messaging of torchx wait-till-running
#727
starpit
closed
1 year ago
0
fix: pod-memory stream lacked pod/ prefix for hostname
#726
starpit
closed
1 year ago
0
fix: torchx wait-till-running was not waiting till *all* workers were running
#725
starpit
closed
1 year ago
0
fix: torchx env isn't written out till the job is already running
#724
starpit
closed
1 year ago
0
fix: capture job env vars for torchx runs
#723
starpit
closed
1 year ago
0
fix: torchx captured logs may not include Succeeded/Failed events
#722
starpit
closed
1 year ago
0
fix: syntax error in code block in torchx status poller
#721
starpit
closed
1 year ago
0
fix: torchx exit handlers were not right
#720
starpit
closed
1 year ago
0
fix: small refinements to torchx logs
#719
starpit
closed
1 year ago
0
fix: remove leftover 'set -x' from debugging
#718
starpit
closed
1 year ago
0
fix: torchx job status file needs to use tee -a to append
#717
starpit
closed
1 year ago
0
fix: improved event handling for torchx exit
#716
starpit
closed
1 year ago
0
fix: improve torchx status events to show Job status
#715
starpit
closed
1 year ago
0
fix: torchx jobs lacked kube event stream
#714
starpit
closed
1 year ago
0
fix: torchx script logic fails if python prefix is not python3
#713
starpit
closed
1 year ago
0
fix: clean up content and coloring of helm install output
#712
starpit
closed
1 year ago
0
fix: torchx cli install fails on zsh
#711
starpit
closed
1 year ago
0
fix: sed RE error can occur in torchx log streamer
#710
starpit
closed
1 year ago
0
fix: pass through guidebook env vars to torchx
#709
starpit
closed
1 year ago
0
fix: ml/torchx/run may fail for users with long user names
#708
starpit
closed
1 year ago
0
fix: torchx log streamer would fail if lines contained control chars
#707
starpit
closed
1 year ago
0
fix: update to official torchx 0.5.0 release
#706
starpit
closed
1 year ago
0
fix: don't fail if we can't hack uid-range
#705
starpit
closed
1 year ago
0
fix: in CI, don't try to use ssh git cloning for workdir
#704
starpit
closed
1 year ago
0
feat: add support for workdir being a github https:// url
#703
starpit
closed
1 year ago
0
fix: ml/torchx/run fails if main python file is not 'main.py'
#702
starpit
closed
1 year ago
0
fix: another fix for relative workdir
#701
starpit
closed
1 year ago
0
fix: further improvements to helm install with relative workdir
#700
starpit
closed
1 year ago
0
fix: improved support for installing and running torchx on 3.9.6 on m…
#699
starpit
closed
1 year ago
0
fix: force vmstat timestamps to use UTC timezone
#698
starpit
closed
1 year ago
0
fix: capture env.json in log aggregation
#697
starpit
closed
1 year ago
0
fix: another fix to improve syntactic conformance of gpu utilization …
#696
starpit
closed
1 year ago
0
fix: gpu stream displays temps with % unit
#695
starpit
closed
1 year ago
0
fix: update gpu utilization stream to conform to vmstat and events log structure
#694
starpit
closed
1 year ago
0
fix: kubectl linux-arm64 installs arm32 binary
#693
starpit
closed
1 year ago
0
fix: bump to madwizard@8 to adopt shell.stdin convention
#692
starpit
closed
1 year ago
0