Closed AlexandreKempf closed 5 months ago
All modified and coverable lines are covered by tests :white_check_mark:
Comparison is base (
9eb04c2
) 95.53% compared to head (bcf6c04
) 95.55%.
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
LGTM! Do you think there are any additional tests that would be helpful to add?
Sure! I took a bad habit the last few years not to write tests. It needs to become a habit once again. Sorry for that, I'll add them :)
@shcheklein
I updated the description to explain the problem a bit better.
I created a new test in the lightning framework since it was problematic.
The new test uses a thread and a sleep function, but I tried to apply what you suggested in the CPU monitoring PR about time.sleep
in tests. I also used @pytest.mark.timeout(3)
to ensure the test doesn't hang forever, but fails instead if it reaches a 3s duration.
Let me know what you think :)
@daavoo it would be great if you can take a look :)
thanks for the update @AlexandreKempf ! can we add a test / change this test a bit to have more datapoints with a different cadence of updates. Correct me folks, if I'm wrong but we keep counter per metric path, right? some metric can be updated a few times before the next step (and even before sync), some once. And we need in all cases make sure that on sync
we send "delta" properly.
@AlexandreKempf Are you ready to merge this one?
Following a bug detected during this feature development.
Using the
step
value to send the latest data to Studio can lead to weird behavior because thestep
is poorly defined in some loggers (eg. pytorch lightning logger). Because the step definition is poorly defined in lightning we used a hack to ensure thelog_metrics
calls by the lightning trainer were correct. But callinglog_metrics
from outside the lightning trainer (a separate thread for instance) leads to data not being sent to Studio or duplicates data.This PR introduces a counter for each metric that increments when Studio receives the data points. Instead of using the
step
property as a proxy for which data has been sent to studio, we literally count them now. This way, when we want to send data points to Studio, we can only send the points it hasn't received yet.The test added in the PR fails in the
main
branch becauselogs[test_metric]
iswhich is expected. But
test_calls
isWith this PR, both values
logs[test_metric]
andtest_calls
are the same and don't contain duplicated elements.[x] β I have followed the Contributing to DVCLive guide.
[x] π If this PR requires documentation updates, I have created a separate PR (or issue, at least) in dvc.org and linked it here.