The MWE trains a dummy model with the Lightning framework and logs metrics with the `DVCLiveLogger`. I ran `python test.py` without logging in to DVC Studio, and the whole training took 37.90 seconds to complete. Then I ran `dvc studio login` and ran `python test.py` again. This time the training took 32.86 MINUTES to complete, and the output changed slightly:
GPU available: False, used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
| Name | Type | Params
---------------------------------
0 | layer | Linear | 4
---------------------------------
4 Trainable params
0 Non-trainable params
4 Total params
0.000 Total estimated model params size (MB)
Epoch 17: 97%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ | 31/32 [00:20<00:00, 1.53it/s, v_num=_run]
WARNING:dvc_studio_client:Failed to post to Studio: {"code": 429, "detail": "You have exceeded your rate limits."}
WARNING:dvclive:`post_to_studio` `data` failed.
Epoch 18: 97%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ | 31/32 [00:18<00:00, 1.67it/s, v_num=_run]
WARNING:dvc_studio_client:Failed to post to Studio: {"code": 429, "detail": "You have exceeded your rate limits."}
WARNING:dvclive:`post_to_studio` `data` failed.
Epoch 99: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 32/32 [00:19<00:00, 1.62it/s, v_num=_run]
`Trainer.fit` stopped: `max_epochs=100` reached.
Epoch 99: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 32/32 [00:19<00:00, 1.62it/s, v_num=_run]
Note the intermittent "You have exceeded your rate limits" warnings. From what I can tell, training while logged in to DVC Studio is consistently slow, even before the first rate limit warning appears. If I run `dvc studio logout` again, the training speed goes back to normal (about 40 seconds per run).
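As a rough sanity check on these numbers, the slowdown can be broken down per training step. The sketch below uses only figures reported here (100 epochs, 32 batches per epoch, and the two wall-clock times). It assumes the extra time is spread evenly over all steps, which I have not verified, and it says nothing about where inside DVCLive the time is actually spent.

```python
# Back-of-the-envelope estimate using only numbers reported in this issue.
EPOCHS = 100             # from `max_epochs=100` in the log
BATCHES_PER_EPOCH = 32   # from the `32/32` progress bar

baseline_s = 37.90            # wall-clock time without a Studio login
with_studio_s = 32.86 * 60    # wall-clock time after `dvc studio login`

total_steps = EPOCHS * BATCHES_PER_EPOCH
extra_s = with_studio_s - baseline_s

# Assumption: the extra time is spread evenly over all training steps.
overhead_per_step_s = extra_s / total_steps
time_per_step_s = with_studio_s / total_steps

print(f"total steps: {total_steps}")
print(f"slowdown factor: {with_studio_s / baseline_s:.1f}x")
print(f"extra time per step: {overhead_per_step_s:.2f} s")
print(f"implied throughput: {1 / time_per_step_s:.2f} it/s")
```

Under that assumption the run is roughly 52 times slower, with about 0.6 seconds of extra time per step. The implied throughput of about 1.6 it/s matches the rate shown in the progress bar, which would be consistent with a fixed delay on every logged step rather than an occasional stall.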
I really like DVC for its data versioning and pipeline management capabilities, and I would like to use DVC Studio for live metrics monitoring, since it understands when I associate different pipeline stages with different `dvclive/` outputs. So it would be very nice if we could figure out what is causing the training delay here. :)
Consider this example project complete with MWE and conda environment.