ray-project/ray_lightning
PyTorch Lightning Distributed Accelerators using Ray
Apache License 2.0 · 211 stars · 34 forks
Issues
#209 Upgrade to PTL 1.7.4 / bolts 0.5.0 · krfricke · closed 1 year ago · 0 comments
#208 Move state dict to cpu before converting to state stream · MarkusSpanring · closed 1 year ago · 1 comment
#207 Teardown after trainer.fit() takes exceptionally long when using RayStrategy with large models · MarkusSpanring · closed 1 year ago · 1 comment
#206 Bump pytorch-lightning from 1.6.4 to 1.7.4 · dependabot[bot] · closed 1 year ago · 1 comment
#205 Error when using WandbLogger · KwanWaiChung · opened 1 year ago · 1 comment
#204 Bump pytorch-lightning from 1.6.4 to 1.7.3 · dependabot[bot] · closed 1 year ago · 1 comment
#203 Cast gpu_id to int · m-lyon · closed 1 year ago · 1 comment
#202 [Code] best_model_path in ModelCheckpointCallback (rank 0 and driver node) · chongxiaoc · opened 1 year ago · 0 comments
#201 Bump version for development 0.4.0 · JiahaoYao · opened 1 year ago · 0 comments
#200 Bump pytorch-lightning from 1.6.4 to 1.7.2 · dependabot[bot] · closed 1 year ago · 1 comment
#199 dummy update · JiahaoYao · closed 1 year ago · 0 comments
#198 Update protobuf requirement from <=3.20.1 to <4.21.6 · dependabot[bot] · closed 1 year ago · 1 comment
#197 Bump pytorch-lightning from 1.6.4 to 1.7.1 · dependabot[bot] · closed 1 year ago · 1 comment
#196 support pytorch lightning 1.7 · JiahaoYao · opened 1 year ago · 10 comments
#195 [Ray lightning 1.6] update the change according to the comment in #163 · JiahaoYao · closed 1 year ago · 0 comments
#194 [Code] add pytorch-lightning compatibility for 1.7.x · JiahaoYao · opened 1 year ago · 6 comments
#193 Bump pytorch-lightning from 1.6.4 to 1.7.0 · dependabot[bot] · closed 1 year ago · 1 comment
#192 Error in RayStrategy.root_device when using multi GPU node · m-lyon · opened 1 year ago · 20 comments
#191 adding the version in `__init__` · JiahaoYao · opened 1 year ago · 0 comments
#190 fix issue #189 · JiahaoYao · closed 1 year ago · 2 comments
#189 AttributeError: 'AcceleratorConnector' object has no attribute 'strategy' · m-lyon · closed 1 year ago · 6 comments
#188 Fix docs formatting · JiahaoYao · closed 1 year ago · 1 comment
#187 Update protobuf requirement from <=3.20.1 to <4.21.5 · dependabot[bot] · closed 1 year ago · 1 comment
#186 `ray_lightning` checkpoint dir not saving the checkpoint · JiahaoYao · opened 1 year ago · 0 comments
#185 Update protobuf requirement from <=3.20.1 to <4.21.4 · dependabot[bot] · closed 1 year ago · 1 comment
#184 Distributed training performance slowdown when resuming from a checkpoint · subhashbylaiah · opened 1 year ago · 5 comments
#183 Bump pytorch-lightning from 1.5.9 to 1.6.5 · dependabot[bot] · closed 1 year ago · 1 comment
#182 `ray_horovod` multi pid process in the `run` · JiahaoYao · opened 1 year ago · 2 comments
#181 `ray_horovod` leaks gpu memory on the `cuda:0` · JiahaoYao · opened 1 year ago · 2 comments
#180 `ray_ddp` issue of `Leaking Caffe2 thread-pool after fork. (function pthreadpool)` · JiahaoYao · opened 1 year ago · 0 comments
#179 `ray_ddp` gpu issue · JiahaoYao · opened 1 year ago · 3 comments
#178 tune test: do we need to count the head node cpu? · JiahaoYao · closed 1 year ago · 1 comment
#177 `ray_ddp` showing no use of gpu · JiahaoYao · closed 1 year ago · 2 comments
#176 `ray_ddp` the progressive bar is broken · JiahaoYao · opened 2 years ago · 1 comment
#175 `ray_ddp` global and local rank · JiahaoYao · closed 1 year ago · 1 comment
#174 ray ddp fails with 2 gpu workers · JiahaoYao · closed 2 years ago · 10 comments
#173 `shard-ddp` test of system exit · JiahaoYao · opened 2 years ago · 0 comments
#172 warning in the ci test (change the deprecated api) · JiahaoYao · opened 2 years ago · 1 comment
#171 torch remove the checkpoint when `is_global_zero` is not set? (multi-worker setting) · JiahaoYao · opened 2 years ago · 0 comments
#170 log is changed in the new version of pytorch lightning · JiahaoYao · closed 2 years ago · 1 comment
#169 change the `checkpoint_callback=True` · JiahaoYao · opened 2 years ago · 0 comments
#168 warning from the horovod trainer · JiahaoYao · closed 2 years ago · 3 comments
#167 horovod lightning integration missing the log dir · JiahaoYao · opened 2 years ago · 0 comments
#166 Update protobuf requirement from <=3.20.1 to <4.21.3 · dependabot[bot] · closed 1 year ago · 1 comment
#165 horovod installation issue · JiahaoYao · opened 2 years ago · 1 comment
#164 LightningCLI and RayPlugin compatibility · mauvilsa · closed 1 year ago · 5 comments
#163 Support PyTorch Lightning 1.6 · JiahaoYao · closed 1 year ago · 17 comments
#162 the training results can be pulled to the main process · JiahaoYao · closed 2 years ago · 0 comments
#161 [raystrategy] multi-stragy in the worker is not consistent · JiahaoYao · closed 2 years ago · 2 comments
#160 trainer is not consistent during the `ray_ddp` · JiahaoYao · closed 2 years ago · 1 comment