iterative / vscode-dvc

Machine learning experiment tracking and data versioning with DVC extension for VS Code
https://marketplace.visualstudio.com/items?itemName=Iterative.dvc
Apache License 2.0

Sharing experiments #3077

Closed dberenbaum closed 1 year ago

dberenbaum commented 1 year ago

Related to https://github.com/iterative/vscode-dvc/issues/2855, the extension can make it easier to share experiments. Let's discuss what's needed here?

My initial thoughts on what's needed:

  1. Show a comparison of all or a subset of experiments like what you see in the table and plots views, except that it's not stuck on your local machine.
  2. Merge or otherwise move forward with an experiment that you think is a keeper.

For 1, I think it makes sense to use Studio since it already has all this functionality. The extension can upload the params, metrics, and plots to Studio like dvclive is doing for live metrics (except for the "live" part). After selecting any number of experiments, there could be an option to post to Studio. The only user friction should be having a Studio token.

For 2, I think there are a lot of ways to do it in DVC already, so it's probably not as critical, but maybe VS Code can make it smoother. With one click, the extension could create a branch with the same name as the experiment, push that to GitHub, and show the URL to create a PR (like the git cli message Create a pull request on GitHub by visiting...). Regardless of the decided UX, it might be better to choose one and not overwhelm the user with options/choices here.
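The one-click flow described for point 2 can be pictured as a short sequence of commands. The sketch below only plans those commands and does not run them. It assumes `dvc exp branch` is used to create the branch, that the Git remote is called `origin`, and that the repository lives on GitHub with the usual `pull/new/<branch>` URL. The helper name `plan_promote_experiment` and its parameters are hypothetical.

```python
def plan_promote_experiment(exp_name, github_repo, remote="origin"):
    """Plan the commands for turning an experiment into a branch and a PR.

    `github_repo` is an "owner/name" slug. Everything here is an assumption
    made for illustration, not the extension's actual behaviour.
    """
    branch = exp_name  # the branch gets the same name as the experiment
    commands = [
        # Create a Git branch from the experiment.
        ["dvc", "exp", "branch", exp_name, branch],
        # Publish the branch so a PR can be opened from it.
        ["git", "push", "--set-upstream", remote, branch],
    ]
    # URL the user would visit to open the PR (GitHub-specific assumption).
    pr_url = f"https://github.com/{github_repo}/pull/new/{branch}"
    return commands, pr_url
```

A caller could feed each entry of `commands` to `subprocess.run` and then show `pr_url` in a notification.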


daavoo commented 1 year ago

With the current endpoint for live metrics, an existing experiment could be shared with 3 REST API calls:

json={
    "type": "start",
    "repo_url": "STUDIO_REPO_URL",
    "baseline_sha": "BASELINE_SHA",
    "name": "EXP_NAME",
    "client": "vscode",  # I think `client` is just ignored by studio
},
headers={
    "Authorization": "token STUDIO_TOKEN",
    "Content-type": "application/json",
}

The data event is where metrics, params, and plots are included (only linear plots are accepted by the API).

The API was designed for sending incremental updates of the plots on each step, but it would still work if the full data is sent and step is set to the latest:

json={
    "type": "data",
    "repo_url": "STUDIO_REPO_URL",
    "baseline_sha": "BASELINE_SHA",
    "name": "EXP_NAME",
    "step": 2,  
    "metrics": {"metrics.json": {"data": {"step": 2, "foo": 3}}},
    "params": {"params.yaml": {"fooparam": 1}},
    "plots": {"plots/foo.tsv": {"data": 
        [{"step": 0, "foo": 1.0}, {"step": 1, "foo": 2.0}, {"step": 2, "foo": 3.0}]}
    },
    "client": "vscode",
},
headers={
    "Authorization": "token STUDIO_TOKEN",
    "Content-type": "application/json",
}

The third call marks the experiment as done:

json={
    "type": "done",
    "repo_url": "STUDIO_REPO_URL",
    "baseline_sha": "BASELINE_SHA",
    "name": "EXP_NAME",
    "client": "vscode",
},
headers={
    "Authorization": "token STUDIO_TOKEN",
    "Content-type": "application/json",
}

daavoo commented 1 year ago

Schema is defined in https://github.com/iterative/dvc-studio-client/blob/main/src/dvc_studio_client/schema.py
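As a rough illustration, the three calls described above could be assembled as follows. This is a minimal sketch using only the standard library, not the code in dvc-studio-client. The endpoint URL is the `/api/live` address mentioned later in this thread and should be treated as an assumption, and the function names (`build_events`, `build_request`, `share_experiment`) are hypothetical.

```python
import json
import urllib.request

# Assumed endpoint (taken from a later comment in this thread).
STUDIO_LIVE_URL = "https://studio.iterative.ai/api/live"


def build_events(repo_url, baseline_sha, name, metrics, params, plots, step):
    """Return the start, data and done payloads for one finished experiment."""
    common = {
        "repo_url": repo_url,
        "baseline_sha": baseline_sha,
        "name": name,
        "client": "vscode",
    }
    start = {"type": "start", **common}
    data = {
        "type": "data",
        **common,
        "step": step,  # set to the latest step when sending the full data
        "metrics": metrics,
        "params": params,
        "plots": plots,
    }
    done = {"type": "done", **common}
    return [start, data, done]


def build_request(event, token, url=STUDIO_LIVE_URL):
    """Wrap one payload in a POST request with the headers shown above."""
    return urllib.request.Request(
        url,
        data=json.dumps(event).encode("utf-8"),
        headers={
            "Authorization": f"token {token}",
            "Content-type": "application/json",
        },
        method="POST",
    )


def share_experiment(events, token, opener=urllib.request.urlopen):
    """Send the events in order; `opener` can be swapped out in tests."""
    for event in events:
        with opener(build_request(event, token)) as response:
            response.read()
```

Keeping payload construction separate from sending makes the payloads easy to check against the schema without touching the network.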

mattseddon commented 1 year ago

@daavoo how/where does a user get the STUDIO_TOKEN?

daavoo commented 1 year ago

> @daavoo how/where does a user get the STUDIO_TOKEN?

From their profile in Studio UI: https://dvc.org/doc/studio/user-guide/projects-and-experiments/live-metrics-and-plots#set-up-an-access-token

daavoo commented 1 year ago

@mattseddon To clarify, STUDIO_REPO_URL is not the URL that you see in Studio UI and the format described in the current docs is outdated per https://github.com/iterative/studio/issues/4801

In the Python client, we try to set STUDIO_REPO_URL automatically from: git ls-remote --get-url
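A minimal sketch of that lookup, assuming the URL is simply whatever `git ls-remote --get-url` prints for the repository. The actual logic in dvc-studio-client may differ. The function name `get_studio_repo_url` and the injectable `run` parameter are hypothetical and exist only to keep the example testable.

```python
import subprocess


def get_studio_repo_url(cwd=".", run=subprocess.run):
    """Return the remote URL reported by `git ls-remote --get-url`."""
    result = run(
        ["git", "ls-remote", "--get-url"],
        cwd=cwd,
        capture_output=True,
        text=True,
        check=True,  # raise if git exits with an error
    )
    return result.stdout.strip()
```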

mattseddon commented 1 year ago

Sharing experiments from the extension to Studio

I can see from the docs that all that is needed to start live metrics to Studio is for the user to invoke exp run like this:

STUDIO_TOKEN=**** dvc exp run
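From a wrapper such as the extension, the same thing amounts to setting the variable in the child process environment. The sketch below assumes nothing beyond the `STUDIO_TOKEN` variable and the `dvc exp run` command shown above. The helper names `studio_env` and `run_experiment` are hypothetical.

```python
import os
import subprocess


def studio_env(token, base_env=None):
    """Copy the environment and add STUDIO_TOKEN without mutating the input."""
    env = dict(os.environ if base_env is None else base_env)
    env["STUDIO_TOKEN"] = token
    return env


def run_experiment(token, cwd="."):
    """Run `dvc exp run` with the token available to DVC."""
    return subprocess.run(
        ["dvc", "exp", "run"], cwd=cwd, env=studio_env(token), check=True
    )
```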

@daavoo @dberenbaum what are the current plans for dvc-studio-client + DVC? I have some ideas/questions.

Authentication:

Is there any plan to have the DVC config support the STUDIO_TOKEN environment variable? This way users can simply save their token as an entry in a Git ignored .dvc/config.local and they won't have to bother with it again.

If the use of a token is supported in this way we could then add a CLI command which either:

Sharing experiments

Is there any plan to add functionality into exp push which will also push a completed experiment to Studio? Again if the DVC config supports a Studio token entry maybe this can be done by default and/or flag(s) can be added to make it happen.

The extension would be able to leverage the above functionality to effectively auth with Studio and push experiments without doing any chaining of commands/running custom code.

WDYT?

Note: If DVC starts supporting a STUDIO_TOKEN config value we would need to add some flag(s) to exp run so that not all jobs are sent to Studio by default.

The obvious alternative to the above is for me to recreate the parts of dvc-studio-client mentioned by @daavoo here. Ideally, I don't think we should be supporting multi-language implementations of the same code. I would still have to build the auth flow and I think it should be replaced pretty quickly. IMO this feels like it would be a wasted effort. It would probably be better for someone to point me in the right direction(s) in the DVC codebase so that I can contribute there.

@dberenbaum it could be a good idea for us to have a call to discuss this before the next cross-team meeting. WDYT? I can be flexible to fit in with your TZ.

daavoo commented 1 year ago

> Is there any plan to have the DVC config support the STUDIO_TOKEN environment variable? This way users can simply save their token as an entry in a Git ignored .dvc/config.local and they won't have to bother with it again.

I don't have a strong opinion but my feeling is that there are already a lot of existing tools/ways to handle environment variables and users might already have a preferred one to handle the usage of frequent variables

mattseddon commented 1 year ago

Ok, to get started I will build the capability within the extension and use a new VS Code config entry (dvc.studioToken) to store the required token. I'll post regular updates here to let everyone know where I'm up to. If anyone feels this is the wrong way to go then please LMK.

dberenbaum commented 1 year ago

I need to follow up here with my thoughts/plans so far. I'll try to write something thorough by tomorrow.

mattseddon commented 1 year ago

I've thrown together a quick prototype for a very interim auth solution at https://github.com/iterative/vscode-dvc/pull/3235.

dberenbaum commented 1 year ago

@mattseddon That looks really good as a starting point, although I think we do want to save the token in DVC as you suggested. I put a full proposal into https://github.com/iterative/studio/issues/5050. I'd suggest we discuss general product-facing questions there but maybe keep this or another issue open to discuss details that are only interesting to VS Code. WDYT?

mattseddon commented 1 year ago

Demo of basic auth flow (it is rough):

https://user-images.githubusercontent.com/37993418/218348629-bab42bbe-df12-4420-92c3-71df3debe5e2.mov

I think this will be (more or less) good enough for a one-time action once I've ironed it out but we can iterate over time.

As discussed previously the token will move back into DVC somewhere. It would be good to expose an endpoint in Studio that validates the token without having to send any data other than the token itself and a command in DVC that checks whether or not Studio is correctly "connected". This would mean the extension would know exactly when and when not to show any details regarding "Connect to Studio". We could also avoid issues created by users getting "stuck" not having a valid token and not being able to update it.

dberenbaum commented 1 year ago

@shcheklein Could Studio have a redirect so that one link would take you to either the token (if you are logged in) or the sign in page (if not)?

@mattseddon Can the connect screen provide a place to enter the token instead of having to take you back to the settings? Otherwise, LGTM as a first step.

mattseddon commented 1 year ago

updated demo:

https://user-images.githubusercontent.com/37993418/218623493-05e9b6ba-a80b-4962-8c10-238d08a9ba52.mov

dberenbaum commented 1 year ago

Sorry @mattseddon, I missed the first time that you enter the token in the command palette. What's the difference in the updated demo? Regardless, I think it looks like a good enough start for now and we can refine later.

mattseddon commented 1 year ago

> Sorry @mattseddon, I missed the first time that you enter the token in the command palette. What's the difference in the updated demo?

We are now saving the token in VS Code's SecretStorage and the add/remove commands are exposed outside of the "welcome screen".

> Regardless, I think it looks like a good enough start for now and we can refine later.

I am now going to knock out "Share to Studio" as quickly as possible.

mattseddon commented 1 year ago

With the token in place sharing live metrics from the extension to Studio is seamless:

https://user-images.githubusercontent.com/37993418/218898094-290c3d6c-eaeb-4e68-82e0-143fa0f403f9.mov

Do we want to add this as an option when the user has a token? "Run and Share", something like that? TBH I am not sure what value this adds to the local experience outside of allowing users to "work in the open". If all team members sent all experiments to Studio then everyone in that team would know exactly what experiments are being run and by who. Seems outside of the normal data science workflow but towards a best practice and better collaboration.

For the first iteration of this process, I am going to recreate parts of dvc-studio-client inside the extension. I do think that we should provide the option in exp push to push directly to Studio. Is this something that we are interested in? Giving users the ability to retro-actively share experiment results from the CLI? If it is then maybe diverting my effort to contributing that functionality inside DVC would be the best use of my time. WDYT?

mattseddon commented 1 year ago

Also found/ran into https://github.com/iterative/studio/issues/5009.

I think I could easily get bogged down here. For the time being/the first prototype, I will not send plot information.

Note: Sharing plot data outside of the happy path is definitely more tricky. E.g., if a user changes a template/plot type locally for an experiment and then shares it with Studio, what happens? Could we limit the types of plots sent to Studio to a few different basic plot types, or do we have to send the contents of the dvc.yaml/templates to Studio with each experiment... 😢?

mattseddon commented 1 year ago

(Surprisingly) I have a working prototype for sharing from the experiments table:

https://user-images.githubusercontent.com/37993418/218975832-75bad666-5d98-4442-94df-026c2051093e.mov

Some caveats:

I have done this mainly because I think we need to push the "share" functionality back into the CLI and set up a new endpoint in Studio which can accept a finished experiment (e.g., https://studio.iterative.ai/api/share as opposed to https://studio.iterative.ai/api/live). The main reason for the new endpoint is that the biggest lag between the command being run and the experiment showing up in the Studio UI at the moment will be the 3 requests. If we can drop this from 3 trips to 1 we should see a significant performance boost.

Any questions or concerns?

dberenbaum commented 1 year ago

> If all team members sent all experiments to Studio then everyone in that team would know exactly what experiments are being run and by who. Seems outside of the normal data science workflow but towards a best practice and better collaboration.

This is actually pretty similar to the mlflow and W&B workflow. One thing we have pitched in the past with VS Code is that you don't need to do this to avoid cluttering the central server with every experiment, so not sure how much we should push this.

> Giving users the ability to retro-actively share experiment results from the CLI?

@mattseddon This looks really cool and is the right UI IMO. We discussed doing this, but the plan with https://github.com/iterative/studio/issues/5050 is to focus on sharing the exp ref instead of using dvc-studio-client for a few reasons:

daavoo commented 1 year ago

> If we can drop this from 3 trips to 1 we should see a significant performance boost.

For the record, sending just the done event already works (the experiment row gets rendered).

I am working on updating the done event to also accept metrics https://github.com/iterative/dvc/issues/9026 (it already supports params), so after that, you should be able to achieve what you currently do with only 1 request (done).

Personally, I would like to push for https://github.com/iterative/studio/issues/5050 , but it requires work on Studio backend and I don't know the priorities.

In the meantime, the above should work.
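A sketch of what the single-request variant might look like, assuming the done event accepts `params` and `metrics` in the same shape as the data event shown earlier in the thread. That shape is an assumption here, so the schema in dvc-studio-client is the place to confirm it. The function name `build_done_event` is hypothetical.

```python
def build_done_event(repo_url, baseline_sha, name, params=None, metrics=None):
    """Build one done payload that carries the final params and metrics."""
    event = {
        "type": "done",
        "repo_url": repo_url,
        "baseline_sha": baseline_sha,
        "name": name,
        "client": "vscode",
    }
    # Only include the optional sections that were actually provided.
    if params is not None:
        event["params"] = params
    if metrics is not None:
        event["metrics"] = metrics
    return event
```

The payload would be posted with the same headers as the start, data and done calls above.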

mattseddon commented 1 year ago

:pray: Thanks @daavoo & @dberenbaum.

> @mattseddon This looks really cool and is the right UI IMO. We discussed doing this, but the plan with https://github.com/iterative/studio/issues/5050 is to focus on sharing the exp ref instead of using dvc-studio-client for a few reasons

Going back and reading what I wrote I was not explicit enough in stating "IMO sharing should be done via exp push (maybe with a --studio flag)". That can easily serve as a drop-in replacement for the code that I've written. I'll get things finalised and shipped ASAP. I'll also add to the context menu in the experiments tree (less important).

mattseddon commented 1 year ago

The first iteration of the happy path is done. Should ship tomorrow after addressing feedback (https://github.com/iterative/vscode-dvc/pull/3289).

Demo

https://user-images.githubusercontent.com/37993418/219319541-824126ba-f781-4e0a-b497-6f1a19286030.mov

I'll post a product demo in #vs-code tomorrow.

dberenbaum commented 1 year ago

@mattseddon Looks good!

What is the behavior wrt live sharing? I guess we probably need some toggle to enable/disable it?

mattseddon commented 1 year ago

> This is actually pretty similar to the mlflow and W&B workflow. One thing we have pitched in the past with VS Code is that you don't need to do this to avoid cluttering the central server with every experiment, so not sure how much we should push this.

> What is the behavior wrt live sharing? I guess we probably need some toggle to enable/disable it?

I will implement Run Experiment and Share Live versions of the Run Experiment command that are less prominent but available.

shcheklein commented 1 year ago

If a token is available, can we enable it by default? (And have a clear way to disable it, of course.)

dberenbaum commented 1 year ago

Yeah, I would prefer a setting that can be enabled/disabled over an additional action, and a checkbox on the welcome screen if possible.

mattseddon commented 1 year ago

> One thing we have pitched in the past with VS Code is that you don't need to do this to avoid cluttering the central server with every experiment, so not sure how much we should push this.

Doesn't having an off/on setting which is switched on by default go directly against the above statement? Do we even need this when VS Code is meant to be for the local experience? Can we leave this functionality out and only expose "Share to Studio" as the initial link between the two products?

shcheklein commented 1 year ago

I think we need a clear way to enable / disable sharing the experiments as people run them (live sharing). As we discussed:

But it should be visible, clear. I don't think that action in the command palette is enough for this.

When we first collect the token we should probably show this toggle (and enable by default?), we should also introduce a section on the Settings page that we already have with the token and with this toggle.

In the DVCLive snippet we should show a way to enable sharing via code.

dberenbaum commented 1 year ago

@shcheklein What's the user scenario you have in mind? I can imagine it could be useful if I have a long-running experiment and I or others need to check on it after I have closed my laptop, but I think that would be more of a niche scenario compared to something like training in CI where I have no other way to check on it easily. I want to make sure I understand what the goal is and whether it's driven by a particular user scenario or by a desire to show the feature.

dberenbaum commented 1 year ago

Despite what I wrote above, I agree it makes sense as a toggle more than an action, since it does not need to be specific to each experiment. It's probably more of a general workflow preference.

shcheklein commented 1 year ago

Yes, this is primarily to expose the feature. But it is also practical: I might run an experiment on a remote machine via SSH, or in Codespaces, and still want to share it so that other people can track the progress. Or, let's say, to compare it with something else that I have only in Studio, etc.

Since it's low-hanging fruit, I don't see any major concerns with enabling this, and we can get more insights from more usage in the end.

mattseddon commented 1 year ago

There are a couple of updates at #3387 & #3379.

Next steps (next week):

  1. Once Share New Experiments Live is enabled start the queue with the required environment variables to share live results from queued experiments directly to Studio (need both STUDIO_TOKEN and STUDIO_REPO_URL).
  2. Split into two options (Share New Workspace Experiments Live & Share New Queued Experiments Live). This is more for visibility than anything else.
  3. Expose Open Studio Settings in the command palette.
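Step 1 above could be sketched as follows: when the live-sharing toggle is on, both variables are added to the environment used to start the queue, and when it is off they are removed so that inherited values do not trigger sharing. The function name `queue_start_env` and its parameters are hypothetical. Removing inherited variables when the toggle is off is a design choice of this sketch, not something stated in the thread.

```python
import os

# Command used to start processing queued experiments.
QUEUE_START_COMMAND = ["dvc", "queue", "start"]


def queue_start_env(share_live, token, repo_url, base_env=None):
    """Return the environment for the queue, honouring the sharing toggle."""
    env = dict(os.environ if base_env is None else base_env)
    if share_live and token and repo_url:
        # Both values are needed for queued experiments to report to Studio.
        env["STUDIO_TOKEN"] = token
        env["STUDIO_REPO_URL"] = repo_url
    else:
        # Sharing is off (or not configured): drop any inherited values.
        env.pop("STUDIO_TOKEN", None)
        env.pop("STUDIO_REPO_URL", None)
    return env
```

The result would be passed as `env=` to `subprocess.run(QUEUE_START_COMMAND, ...)`.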

dberenbaum commented 1 year ago

> 2. Split into two options (Share New Workspace Experiments Live & Share New Queued Experiments Live). This is more for visibility than anything else.

Sorry, I'm not following what you mean by "visibility" here or what this part is for. Otherwise, all makes sense to me, thanks!

mattseddon commented 1 year ago

> Sorry, I'm not following what you mean by "visibility" here or what this part is for. Otherwise, all makes sense to me, thanks!

The current dvc.studio.shareExperimentsLive option will become dvc.studio.shareWorkspaceExperimentsLive & dvc.studio.shareQueuedExperimentsLive and there will be two checkboxes on the settings page instead of one. Users will be able to send none, one or both types. Does that make sense?

dberenbaum commented 1 year ago

I guess I was wondering more why we want to have two separate checkboxes?

mattseddon commented 1 year ago

If you don't think it is necessary to give that level of control and/or that it won't provide value then I won't do the work 🙏🏻.

dberenbaum commented 1 year ago

Up to @shcheklein. I just didn't see the motivation to have that granularity of control over live sharing.

shcheklein commented 1 year ago

Yep, I also don't see the need for this for now. We can keep it simpler.

omesser commented 1 year ago

I join the opinion that it's best to make this a simple user-facing feature of "live sharing experiments" (for everything). Users will probably have the control they need by toggling this on and off while running queue/workspace experiments. If this is used and more granularity is requested, we can always "complicate" this in the future 😄

daavoo commented 1 year ago

For the record @mattseddon, with the latest Studio release you should now be able to send only the done event.

mattseddon commented 1 year ago

#3422 will close this as all of the discussion/scoping is on the Studio side right now.