iterative / dvclive

📈 Log and track ML metrics, parameters, models with Git and/or DVC
https://dvc.org/doc/dvclive
Apache License 2.0

fix(studio): wait for studio metrics publish to complete on end #827

Closed shcheklein closed 1 month ago

shcheklein commented 1 month ago

Without this fix, images are not published to Studio in example-get-started-experiments: the evaluate stage exits before the publish has time to complete.



dberenbaum commented 1 month ago

Looking at the discussion from when we added threads, I think this was intentional and is why the thread has daemon=True. Some questions:

  1. Does dropping daemon=True accomplish the same thing?
  2. What happens if any call to Studio hangs? I think that was why I added daemon=True.
  3. Could we make a final data post to Studio in the main thread? That would ensure the final data is posted without needing every post to return.
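
To make question 1 concrete, here is a minimal sketch of how the daemon flag changes what happens at interpreter exit. It does not use dvclive code: `publish`, `SCRIPT`, and `run` are hypothetical names, and the 0.5 second sleep stands in for a slow post to Studio.

```python
import subprocess
import sys
import textwrap

# Hypothetical stand-in for a process whose background thread publishes
# data slightly after the main thread has finished its work.
SCRIPT = textwrap.dedent("""
    import threading, time

    def publish():
        time.sleep(0.5)      # pretend the post to Studio takes a while
        print("published")

    threading.Thread(target=publish, daemon={daemon}).start()
    # The main thread ends here, like a stage that exits quickly.
""")


def run(daemon: bool) -> str:
    """Run the script in a fresh interpreter and return what it printed."""
    result = subprocess.run(
        [sys.executable, "-c", SCRIPT.format(daemon=daemon)],
        capture_output=True,
        text=True,
        timeout=30,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    print(repr(run(daemon=True)))   # daemon thread is abandoned at exit
    print(repr(run(daemon=False)))  # interpreter waits for the thread
```

With `daemon=True` the interpreter exits without waiting, so nothing is printed. With `daemon=False` it waits for `publish` to return, which also means a call that never returns would keep the process alive.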

shcheklein commented 1 month ago

Looking at the discussion from when we added threads, I think this was intentional and is why the thread has daemon=True. Some questions:

Right, but at the end of the process it becomes unpredictable / we are losing data ... not sure this is expected tbh ...

Does dropping daemon=True accomplish the same thing?

We would then probably need a way to signal that thread to complete?

Does dropping daemon=True accomplish the same thing?

In this case it would be waiting only at the end ... we can probably add a timeout? But as far as I understand, we already have timeouts on the post-to-Studio call. So it should not wait indefinitely ...
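
One way to combine both ideas (signal the thread to finish, then wait only at the end, with a bound) could look like the sketch below. It is an illustration under assumptions, not dvclive's implementation: `Publisher`, `publish`, `close`, and the `send` callback are hypothetical names, and `send` stands in for the post to Studio.

```python
import queue
import threading

_STOP = object()  # sentinel telling the worker there is nothing more to send


class Publisher:
    """Hypothetical background publisher; `send` stands in for the Studio post."""

    def __init__(self, send):
        self._send = send
        self._queue = queue.Queue()
        # daemon=True so a hung `send` can never block interpreter exit.
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def publish(self, item):
        """Queue an item; returns immediately without waiting for the post."""
        self._queue.put(item)

    def _run(self):
        while True:
            item = self._queue.get()
            if item is _STOP:
                return
            self._send(item)

    def close(self, timeout=5.0):
        """Signal the worker to finish and wait at most `timeout` seconds.

        Returns True if everything queued before close() was sent,
        False if the worker was still busy when the timeout expired.
        """
        self._queue.put(_STOP)
        self._thread.join(timeout)
        return not self._thread.is_alive()
```

Calling `close()` at the end gives queued posts a bounded chance to complete, while the daemon flag still protects against a call that never returns.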

That would ensure the final data is posted without needing every post to return.

Could you clarify please? Not sure how this idea is easier / can help tbh ...

shcheklein commented 1 month ago

I assume that timeout is specified here:

response = requests.post(
    url,
    json=body,
    headers={
        "Content-type": "application/json",
        "Authorization": f"token {token}",
    },
    # requests reads a tuple as (connect timeout, read timeout), in seconds
    timeout=(30, 5),
)