m-lyon closed this issue 2 years ago.
Hey @m-lyon, can you use PyTorch Lightning 1.6.4 instead of 1.6.0?
@amogkam I tried with pytorch-lightning==1.6.4 and the same error occurs.
@JiahaoYao do you think you can take a look? Thanks!
sure
Hi @m-lyon, I am about to look into your issue. If you have some time, please also try the example here: https://github.com/ray-project/ray_lightning/blob/main/ray_lightning/examples/ray_ddp_tune.py. My guess is that the example in the README is not the latest.
Stay tuned, I will let you know once I figure out the issue.
Hi @m-lyon (cc @amogkam), the solution is to change
`strategy=[RayStrategy(num_workers=4, use_gpu=False)])`
to
`strategy=RayStrategy(num_workers=4, use_gpu=False))`
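In other words, the fix just drops the list wrapper: the `strategy` keyword expects a single strategy object, not a list of them. A minimal stand-in sketch of why the list form fails (toy `Strategy`/`Trainer` classes, NOT the real pytorch_lightning or ray_lightning API, so it runs without either installed):

```python
# Toy stand-ins illustrating the single-object-vs-list mistake from the thread.
class Strategy:
    pass


class RayStrategy(Strategy):
    def __init__(self, num_workers, use_gpu):
        self.num_workers = num_workers
        self.use_gpu = use_gpu


class Trainer:
    def __init__(self, strategy):
        # Lightning-style validation: strategy must be a Strategy instance.
        if not isinstance(strategy, Strategy):
            raise TypeError(
                f"strategy must be a Strategy, got {type(strategy).__name__}"
            )
        self.strategy = strategy


# Wrong: wrapping the strategy in a list fails the isinstance check.
try:
    Trainer(strategy=[RayStrategy(num_workers=4, use_gpu=False)])
except TypeError as exc:
    print("error:", exc)

# Right: pass the strategy object directly.
trainer = Trainer(strategy=RayStrategy(num_workers=4, use_gpu=False))
print("ok, num_workers =", trainer.strategy.num_workers)
```

The real `Trainer` performs its own (more involved) argument validation, but the shape of the mistake is the same: a list is not a strategy.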
Then you can get the tune result:
```
== Status ==
Current time: 2022-08-01 20:45:17 (running for 00:00:22.83)
Memory usage on this node: 6.0/186.6 GiB
Using FIFO scheduling algorithm.
Resources requested: 0/48 CPUs, 0/4 GPUs, 0.0/120.4 GiB heap, 0.0/55.59 GiB objects
Current best trial: cc8ad_00001 with loss=0.14701077342033386 and parameters={'layer_1': 64, 'layer_2': 64, 'lr': 0.019097374626280045, 'batch_size': 64}
Result logdir: /home/ubuntu/ray_results/tune_mnist
Number of trials: 2/2 (2 TERMINATED)
+-------------------------+------------+------------------+--------------+-----------+-----------+-----------+--------+------------------+----------+----------+
| Trial name              | status     | loc              |   batch_size |   layer_1 |   layer_2 |        lr |   iter |   total time (s) |     loss |      acc |
|-------------------------+------------+------------------+--------------+-----------+-----------+-----------+--------+------------------+----------+----------|
| train_mnist_cc8ad_00000 | TERMINATED | 10.0.2.151:44841 |           64 |        64 |        64 | 0.0422218 |      4 |          17.1973 | 0.27206  | 0.925576 |
| train_mnist_cc8ad_00001 | TERMINATED | 10.0.2.151:44880 |           64 |        64 |        64 | 0.0190974 |      4 |          16.6583 | 0.147011 | 0.957442 |
+-------------------------+------------+------------------+--------------+-----------+-----------+-----------+--------+------------------+----------+----------+
```
Thank you @m-lyon again for reporting this; I am going to file a PR to fix the README!
Using the toy example code found here,
I get the following error:
Not really sure what I'm doing incorrectly. I have the following versions: `torch==1.11.0`, `pytorch-lightning==1.6.0`, `ray-lightning==0.3.0`, and `ray==1.13.0`.