ray-project / ray_lightning

Pytorch Lightning Distributed Accelerators using Ray
Apache License 2.0

AttributeError: 'AcceleratorConnector' object has no attribute 'strategy' #189

Closed: m-lyon closed this issue 2 years ago

m-lyon commented 2 years ago

Using the toy example code found here,

I get the following error:

(train_mnist pid=36277) Traceback (most recent call last):
(train_mnist pid=36277)   File "/home/matt/anaconda3/envs/torch/lib/python3.8/site-packages/ray/tune/function_runner.py", line 277, in run
(train_mnist pid=36277)     self._entrypoint()
(train_mnist pid=36277)   File "/home/matt/anaconda3/envs/torch/lib/python3.8/site-packages/ray/tune/function_runner.py", line 349, in entrypoint
(train_mnist pid=36277)     return self._trainable_func(
(train_mnist pid=36277)   File "/home/matt/anaconda3/envs/torch/lib/python3.8/site-packages/ray/util/tracing/tracing_helper.py", line 462, in _resume_span
(train_mnist pid=36277)     return method(self, *_args, **_kwargs)
(train_mnist pid=36277)   File "/home/matt/anaconda3/envs/torch/lib/python3.8/site-packages/ray/tune/function_runner.py", line 645, in _trainable_func
(train_mnist pid=36277)     output = fn()
(train_mnist pid=36277)   File "/home/matt/Dev/git/phd-torchscripts/PhD/torchscripts/pcconv/tests/raytune_test.py", line 19, in train_mnist
(train_mnist pid=36277)     trainer = pl.Trainer(
(train_mnist pid=36277)   File "/home/matt/anaconda3/envs/torch/lib/python3.8/site-packages/pytorch_lightning/utilities/argparse.py", line 339, in insert_env_defaults
(train_mnist pid=36277)     return fn(self, **kwargs)
(train_mnist pid=36277)   File "/home/matt/anaconda3/envs/torch/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 486, in __init__
(train_mnist pid=36277)     self._accelerator_connector = AcceleratorConnector(
(train_mnist pid=36277)   File "/home/matt/anaconda3/envs/torch/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/accelerator_connector.py", line 204, in __init__
(train_mnist pid=36277)     self._init_strategy()
(train_mnist pid=36277)   File "/home/matt/anaconda3/envs/torch/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/accelerator_connector.py", line 634, in _init_strategy
(train_mnist pid=36277)     raise RuntimeError(f"{self.strategy} is not valid type: {self.strategy}")
(train_mnist pid=36277) AttributeError: 'AcceleratorConnector' object has no attribute 'strategy'

Not really sure what I'm doing incorrectly. I have the following versions: torch==1.11.0, pytorch-lightning==1.6.0, ray-lightning==0.3.0, and ray==1.13.0.

amogkam commented 2 years ago

Hey @m-lyon, can you use PyTorch Lightning 1.6.4 instead of 1.6.0?

m-lyon commented 2 years ago

@amogkam I tried with pytorch-lightning==1.6.4 and got the same error.

amogkam commented 2 years ago

@JiahaoYao do you think you can take a look? Thanks!

JiahaoYao commented 2 years ago

Sure.


JiahaoYao commented 2 years ago

Hi @m-lyon, I am about to look into your issue. If you have some time, please also try the example here: https://github.com/ray-project/ray_lightning/blob/main/ray_lightning/examples/ray_ddp_tune.py. My guess is that the example in the README is not the latest.

Stay tuned; I will let you know once I figure out the issue.

JiahaoYao commented 2 years ago

Hi @m-lyon (cc @amogkam), the solution is to change

        strategy=[RayStrategy(num_workers=4, use_gpu=False)])

to

        strategy=RayStrategy(num_workers=4, use_gpu=False))

Then you can get the tune result:

== Status ==
Current time: 2022-08-01 20:45:17 (running for 00:00:22.83)
Memory usage on this node: 6.0/186.6 GiB
Using FIFO scheduling algorithm.
Resources requested: 0/48 CPUs, 0/4 GPUs, 0.0/120.4 GiB heap, 0.0/55.59 GiB objects
Current best trial: cc8ad_00001 with loss=0.14701077342033386 and parameters={'layer_1': 64, 'layer_2': 64, 'lr': 0.019097374626280045, 'batch_size': 64}
Result logdir: /home/ubuntu/ray_results/tune_mnist
Number of trials: 2/2 (2 TERMINATED)
+-------------------------+------------+------------------+--------------+-----------+-----------+-----------+--------+------------------+----------+----------+
| Trial name              | status     | loc              |   batch_size |   layer_1 |   layer_2 |        lr |   iter |   total time (s) |     loss |      acc |
|-------------------------+------------+------------------+--------------+-----------+-----------+-----------+--------+------------------+----------+----------|
| train_mnist_cc8ad_00000 | TERMINATED | 10.0.2.151:44841 |           64 |        64 |        64 | 0.0422218 |      4 |          17.1973 | 0.27206  | 0.925576 |
| train_mnist_cc8ad_00001 | TERMINATED | 10.0.2.151:44880 |           64 |        64 |        64 | 0.0190974 |      4 |          16.6583 | 0.147011 | 0.957442 |
+-------------------------+------------+------------------+--------------+-----------+-----------+-----------+--------+------------------+----------+----------+

Thank you again @m-lyon for reporting this! I am going to file a PR to fix the README.
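For anyone hitting the same error: Lightning's `AcceleratorConnector` expects `strategy` to be a single `Strategy` instance (or a registered string name), so wrapping `RayStrategy(...)` in a list makes the type check fail before `self.strategy` is ever set, which surfaces as the `AttributeError` above. A minimal sketch of that validation, using stub classes (the real classes live in `pytorch_lightning.strategies` and `ray_lightning`; this is an illustration, not Lightning's actual code):

```python
# Stand-ins for pytorch_lightning.strategies.Strategy and
# ray_lightning.RayStrategy (assumed, not imported here).
class Strategy:
    pass

class RayStrategy(Strategy):
    def __init__(self, num_workers=1, use_gpu=False):
        self.num_workers = num_workers
        self.use_gpu = use_gpu

def init_strategy(strategy):
    """Sketch of the connector's check: accept a single Strategy
    instance or a string name, reject anything else (e.g. a list)."""
    if isinstance(strategy, (str, Strategy)):
        return strategy
    raise TypeError(f"{strategy!r} is not a valid strategy type")

# Correct: pass the strategy instance directly.
init_strategy(RayStrategy(num_workers=4, use_gpu=False))

# Incorrect: wrapping it in a list is rejected.
try:
    init_strategy([RayStrategy(num_workers=4, use_gpu=False)])
except TypeError as err:
    print("rejected:", err)
```

The same rule applies in the real `pl.Trainer(...)` call: `strategy=RayStrategy(...)`, never `strategy=[RayStrategy(...)]`.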