horovod
/
horovod
Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.
http://horovod.ai
Other
14.06k stars
2.22k forks
issues
How to set timeout status when Missing Rank?
#4050
fuhailin
opened
6 days ago
0
BFloat16 support for NCCL and Tensorflow, Pytorch
#4049
hanzhi713
opened
1 week ago
0
A fatal error has been detected by the Java Runtime Environment
#4048
Parvez-Khan-1
opened
1 week ago
3
support bfloat16
#4047
gl-001
opened
1 week ago
5
Horovod with Spark - Job Not Distributing Across Worker Nodes
#4046
omarmujahidgithub
opened
2 weeks ago
2
adapt for rocm6
#4044
fsx950223
opened
4 weeks ago
0
NVIDIA CUDA TOOLKIT version to run Horovod in Conda Environment
#4043
ppandit95
opened
1 month ago
1
Environment crashes because it seems to be overriding built in modules
#4042
mtrattner
opened
1 month ago
0
Install Horovod in Apple M1 Pro
#4041
saniyahvira
opened
1 month ago
0
Replace tf.train.SessionRunHook by tf.compat.v1.train.SessionRunHook ?
#4040
whatdhack
opened
1 month ago
0
v0.28.1 Version Mismatch with TF 2.12.0. Works with v0.28.0
#4039
liamaltarac
opened
2 months ago
0
Can Horovod process more shards than workers?
#4038
dr-graviton
opened
2 months ago
0
Remove MXNet tests from CI
#4037
EnricoMi
opened
3 months ago
2
Pin codeql version
#4036
EnricoMi
closed
3 months ago
1
Move to v6.17.0 queue with CUDA 12.4
#4035
EnricoMi
opened
3 months ago
3
Use docker compose V2 to cache downloaded packages
#4034
EnricoMi
opened
3 months ago
2
Fix MXNet MNIST download url
#4033
EnricoMi
closed
3 months ago
3
Resolve TF saved model not portable issue with tf.keras.optimizers
#4031
supercharleszhu
opened
3 months ago
4
Model parallelisation
#4030
ezhilmathik
opened
3 months ago
0
Enable build with OneCCL 2024 version
#4029
LuFinch
opened
3 months ago
0
Tensorflow Saved model not portable with latest tf.keras.optimizers
#4028
supercharleszhu
opened
3 months ago
0
Early Stopping tf.keras Crashes
#4027
AllardJM
opened
3 months ago
0
pass `*args, **kwargs` to `Optimizer.zero_grad`
#4026
njzjz
opened
3 months ago
2
Horovod + Deepspeed : Device mismatch error
#4023
PurvangL
closed
4 months ago
0
Unexpected Worker Failure when using Elastic Horovod + Process Sets
#4021
Pranavug
opened
4 months ago
0
Horovod with TensorFlow crashed
#4020
mythZhu
opened
4 months ago
0
Unable to run Horovod Pytorch on AMD AMI100 GPUs
#4019
kf-cuanschutz
closed
4 months ago
2
The program blocks hvd.init().
#4018
divmid
opened
5 months ago
1
Can I call horovod training process in proc = subprocess.Popen(command, shell=True, cwd=cwd) using command
#4017
bit-pku-zdf
opened
5 months ago
0
Do not skip entire require_list, only cffi when not build action
#4016
EnricoMi
opened
5 months ago
1
Stop specific worker in Horovod Elastic
#4015
mozizhao
opened
5 months ago
0
Use pytorch from pip installed but get "#error You need C++17 to compile PyTorch" when installing horovod
#4014
pcjiang1998
closed
5 months ago
2
Error install horovod with python 3.11.5 on macOS 11.3.1
#4013
DriverSong
opened
6 months ago
0
Error install Horovod with python-3.11.5 on macos 11.3.1
#4012
DriverSong
closed
6 months ago
1
Move from `docker-compose` to `docker compose`
#4011
EnricoMi
closed
5 months ago
3
[fix]func return no values, and catch SystemExit
#4010
gl-001
closed
5 months ago
1
AttributeError: module 'horovod.torch' has no attribute 'init'
#4009
Cow-Kite
opened
6 months ago
0
ipv6 address family
#4008
NEWPLAN
opened
6 months ago
0
set c++ std to 17
#4006
harisankar95
closed
7 months ago
0
[Volcano] Error using Horovod with Volcano cluster
#4005
SimZhou
closed
7 months ago
5
Prevent excessive parallel jobs during compilation
#4004
njzjz
opened
7 months ago
3
No module named 'packaging' when installing Horovod
#4003
flixxox
opened
7 months ago
9
Bump docker/setup-buildx-action from 2 to 3
#4002
dependabot[bot]
opened
7 months ago
1
Bump buildkite/trigger-pipeline-action from 1.3.1 to 2.0.0
#4001
dependabot[bot]
opened
7 months ago
1
Bump docker/login-action from 2 to 3
#4000
dependabot[bot]
opened
7 months ago
1
Bump docker/build-push-action from 3 to 5
#3999
dependabot[bot]
opened
7 months ago
1
Compile using C++17 when using PyTorch 2.1
#3998
thomas-bouvier
opened
7 months ago
14
Horovod 0.28.1 incompatibility with PyTorch 2.1.0
#3996
rithwik-db
opened
8 months ago
2
Fix DistributedOptimizer bug for tensorflow
#3995
Chenjingliang1
opened
8 months ago
1
tensorflow hvd.DistributedOptimizer bug
#3994
Chenjingliang1
opened
8 months ago
0