kubeflow/training-operator
Distributed ML Training and Fine-Tuning on Kubernetes
https://www.kubeflow.org/docs/components/training
Apache License 2.0
1.51k stars
657 forks
issues
Newest
[GSOC] Project 7 Tracking Issue: Automate docs generation for Training-operator Python SDK
#2156
shivas1516
opened
4 days ago
1
Improve Training Operator release process
#2155
andreyvelich
opened
5 days ago
2
Add Changelog for Training Operator v1.8.0-rc.1
#2154
andreyvelich
closed
5 days ago
2
Update release document
#2153
andreyvelich
closed
5 days ago
4
Release Training Operator Image for v1.8.0-rc.1
#2152
andreyvelich
closed
5 days ago
3
Release Training SDK 1.8.0rc1
#2151
andreyvelich
closed
5 days ago
4
Remove support for MXJob
#2150
tariq-hasan
opened
6 days ago
5
Bump scikit-learn from 1.0.1 to 1.5.0 in /examples/xgboost/lightgbm-dist
#2149
dependabot[bot]
closed
1 week ago
5
update volcano to v1.9.0
#2148
lowang-bh
closed
2 weeks ago
2
Cherry pick: [SDK] Sync Transformers version for train API (#2146)
#2147
andreyvelich
closed
2 weeks ago
2
[SDK] Sync Transformers version for train API
#2146
andreyvelich
closed
2 weeks ago
2
Tracking Issue: Integrate JAX in Kubeflow Training Operator
#2145
sandipanpanda
opened
2 weeks ago
0
[SDK] Explain Python version support cycle
#2144
andreyvelich
closed
2 weeks ago
2
TfJob creation failed due to webhook validation failure
#2143
nagar-ajay
opened
2 weeks ago
0
Update Slack Invitation
#2142
andreyvelich
closed
2 weeks ago
2
spatial dataset training functions
#2141
Jo316
opened
3 weeks ago
1
Automated cherry pick of #2109: changed package name to flake8 to fix pip install
#2140
tenzen-y
closed
2 weeks ago
3
Automated cherry pick of #2079: fix volcano podgroup update issue #2130: Refine the integration tests for the immutable PyTorchJob
#2139
tenzen-y
closed
2 weeks ago
3
Automated cherry pick of #2105: Support Python 3.11 and Drop Python 3.7 #2122: Fix Incorrect Events in get_job_logs API
#2138
tenzen-y
closed
2 weeks ago
2
Automated cherry pick of #2105: Support Python 3.11 and Drop Python 3.7 #2109: changed package name to flake8 to fix pip install #2122: Fix Incorrect Events in get_job_logs API
#2137
tenzen-y
closed
3 weeks ago
5
Automated cherry pick of #2079: fix volcano podgroup update issue
#2136
tenzen-y
closed
3 weeks ago
6
Cherry pick for 1.8
#2135
tenzen-y
closed
3 weeks ago
3
Automated cherry pick of #2105: Support Python 3.11 and Drop Python 3.7 #2122: Fix Incorrect Events in get_job_logs API #2079: fix volcano podgroup update issue #2130: Refine the integration tests for the immutable PyTorchJob
#2134
tenzen-y
closed
3 weeks ago
4
Automated cherry pick of #2105, #2109, #2122, 2079, and #2130 to v1.8
#2133
tenzen-y
closed
3 weeks ago
4
Automated cherry pick of #2105, #2109, #2122, 2079, and #2130 to v1.8
#2132
tenzen-y
closed
3 weeks ago
4
Automated Documentation Generation for Python SDKs Proposal [GSOC]
#2131
shivas1516
opened
1 month ago
3
Refine the integration tests for the immutable PyTorchJob queueName
#2130
tenzen-y
closed
1 month ago
3
Add GitHub Issue Template
#2129
andreyvelich
closed
1 month ago
2
Update the images to the latest tag in master branch
#2128
johnugeorge
closed
1 month ago
2
The actual default RestartPolicy of PyTorch is inconsistent with its description in the CRD
#2127
Eslody
opened
1 month ago
0
PyTorchJobClient not found
#2126
thatsdone
closed
1 month ago
3
JAX Integration Enhancement Proposal
#2125
sandipanpanda
opened
1 month ago
7
Worker failed without exit code
#2124
w1uo01
opened
1 month ago
0
Updated Github Action Workflows as per issue #2117
#2123
hkiiita
closed
1 month ago
5
[SDK] Fix Incorrect Events in get_job_logs API
#2122
andreyvelich
closed
1 month ago
2
updated action workflows as per issue #2117 Signed-off-by: Hemant Ku…
#2121
hkiiita
closed
1 month ago
1
updated 3rd party workflows as per #2117
#2120
hkiiita
closed
1 month ago
1
Support ARM64 platform in TensorFlow examples
#2119
akhilsaivenkata
closed
1 month ago
6
mpijob will stuck if LastReconcileTime is updated in 1 second
#2118
shadowdsp
opened
1 month ago
0
Update third party workflows in the gh actions
#2117
tenzen-y
closed
1 month ago
4
ARM64 supported in PyTorch examples
#2116
danielsuh05
closed
1 month ago
4
ARM64 supported in PyTorch examples
#2115
danielsuh05
closed
1 month ago
2
Feat: Support ARM64 platform in XGBoost examples
#2114
tico88612
closed
1 month ago
2
Support ARM64 platform in XGBoost examples
#2113
tenzen-y
closed
1 month ago
2
Support ARM64 platform in TensorFlow examples
#2112
tenzen-y
closed
1 month ago
4
Support ARM64 platform in PyTorch examples
#2111
tenzen-y
closed
1 month ago
4
Vulnerability - CVE-2023-44487
#2110
ChristianZaccaria
opened
1 month ago
3
changed package name to flake8 to fix pytests pip install
#2109
ChristopheBrown
closed
1 month ago
4
docs: Uploading updated diagram
#2108
franciscojavierarceo
opened
1 month ago
2
Automated cherry pick of #2106: Add cherry-pick script
#2107
tenzen-y
opened
1 month ago
3