iterative / dvclive

πŸ“ˆ Log and track ML metrics, parameters, models with Git and/or DVC
https://dvc.org/doc/dvclive
Apache License 2.0
161 stars 33 forks source link

Add internal counters to send only the latest datapoints to studio #788

Closed AlexandreKempf closed 5 months ago

AlexandreKempf commented 5 months ago

Following a bug detected during this feature development.

Using the step value to send the latest data to Studio can lead to weird behavior because the step is poorly defined in some loggers (eg. pytorch lightning logger). Because the step definition is poorly defined in lightning we used a hack to ensure the log_metrics calls by the lightning trainer were correct. But calling log_metrics from outside the lightning trainer (a separate thread for instance) leads to data not being sent to Studio or duplicates data.

This PR introduces a counter for each metric that increments when Studio receives the data points. Instead of using the step property as a proxy for which data has been sent to studio, we literally count them now. This way, when we want to send data points to Studio, we can only send the points it hasn't received yet.

The test added in the PR fails in the main branch because logs[test_metric] is

[
    {'step': '0', 'test': '0.5'}, 
    {'step': '1', 'test': '0.5'}, 
    {'step': '2', 'test': '0.5'}, 
    {'step': '3', 'test': '0.5'}
]

which is expected. But test_calls is

[
    {'step': '0', 'test': '0.5'}, 
    {'step': '1', 'test': '0.5'},
    {'step': '1', 'test': '0.5'}, 
    {'step': '1', 'test': '0.5'},
    {'step': '2', 'test': '0.5'}, 
    {'step': '3', 'test': '0.5'},
    {'step': '3', 'test': '0.5'}
]

With this PR, both values logs[test_metric] and test_calls are the same and don't contain duplicated elements.

codecov-commenter commented 5 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Comparison is base (9eb04c2) 95.53% compared to head (bcf6c04) 95.55%.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #788 +/- ## ========================================== + Coverage 95.53% 95.55% +0.02% ========================================== Files 55 55 Lines 3559 3580 +21 Branches 319 319 ========================================== + Hits 3400 3421 +21 Misses 111 111 Partials 48 48 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

AlexandreKempf commented 5 months ago

LGTM! Do you think there are any additional tests that would be helpful to add?

Sure! I took a bad habit the last few years not to write tests. It needs to become a habit once again. Sorry for that, I'll add them :)

AlexandreKempf commented 5 months ago

@shcheklein I updated the description to explain the problem a bit better. I created a new test in the lightning framework since it was problematic. The new test uses a thread and a sleep function, but I tried to apply what you suggested in the CPU monitoring PR about time.sleep in tests. I also used @pytest.mark.timeout(3) to ensure the test doesn't hang forever, but fails instead if it reaches a 3s duration.

Let me know what you think :)

shcheklein commented 5 months ago

@daavoo it would be great if you can take a look :)

shcheklein commented 5 months ago

thanks for the update @AlexandreKempf ! can we add a test / change this test a bit to have more datapoints with a different cadence of updates. Correct me folks, if I'm wrong but we keep counter per metric path, right? some metric can be updated a few times before the next step (and even before sync), some once. And we need in all cases make sure that on sync we send "delta" properly.

dberenbaum commented 5 months ago

@AlexandreKempf Are you ready to merge this one?