iterative / dvc

🦉 Data Versioning and ML Experiments
https://dvc.org
Apache License 2.0

exp: set workers per stage #9363

Open dberenbaum opened 1 year ago

dberenbaum commented 1 year ago

Related: #755. This is a narrower issue than #755; it comes from conversations with users about how they can run more experiments at once.

dvc exp run -j runs experiments in parallel across multiple workers, but experiments may be too memory-intensive to run many experiments at once, even on large machines. One way to mitigate this issue is by identifying which stages are memory-intensive and setting quotas on the number of workers for those stages. For example, I may have many stages that read data in small batches but a single stage that must read in all the data at once to aggregate/combine it. If I can tell DVC that I want to use a maximum of 10 jobs for all other stages but only 2 jobs for my aggregation stage, then I can still take advantage of experiment parallelization without overloading the machine's memory.
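To make this concrete, here is a minimal sketch of the desired behavior in plain Python. None of it is existing DVC functionality; the worker count, the stage split, and the sleeps are hypothetical stand-ins:

```python
# Illustration only, not DVC API: 10 experiment workers run their pipelines,
# but at most 2 may be inside the memory-intensive aggregation stage at once.
import threading
import time

N_WORKERS = 10                                   # like `dvc exp run -j 10`
aggregate_slots = threading.BoundedSemaphore(2)  # hypothetical quota of 2

def run_experiment(exp_id: int) -> None:
    # Cheap stages: read data in small batches, no quota needed.
    time.sleep(0.05)                             # stand-in for the real work
    # Memory-intensive stage: blocks while 2 other workers are inside.
    with aggregate_slots:
        time.sleep(0.2)                          # stand-in for aggregation
    print(f"experiment {exp_id} done")

threads = [threading.Thread(target=run_experiment, args=(i,)) for i in range(N_WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The cheap stages run at full width while workers queue up at the semaphore, which is exactly the trade-off discussed below.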

pmrowla commented 1 year ago

I don't think this is a different issue from #755. It's a slightly different high-level use case, but in terms of DVC functionality it's asking for the same thing. The exp run workers don't know anything about the pipeline; they just run the top-level dvc repro calls separately.

What we are talking about here (and in #755) is having a completely separate worker pool for executing individual stages in parallel, and dvc repro using those workers when it runs the stages in a pipeline.

(I think this issue is actually more complex than #755, since it also needs the ability to restrict workers to specific stages, and designate different levels of concurrency for individual stages, as opposed to a single level of concurrency for any/all stages)
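For contrast, a rough sketch of that separate-pool design, again purely hypothetical rather than DVC's actual architecture: ready stages from any experiment get submitted to one shared executor, with an optional tighter cap layered on top for specific stage types.

```python
# Hypothetical sketch, not DVC code: one stage-level worker pool shared by
# every running experiment, plus optional per-stage concurrency caps.
from concurrent.futures import Future, ThreadPoolExecutor
import threading

class StagePool:
    def __init__(self, max_workers: int, stage_caps: dict):
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._caps = {name: threading.BoundedSemaphore(n)
                      for name, n in stage_caps.items()}

    def submit(self, stage_name: str, fn, *args) -> Future:
        def guarded():
            cap = self._caps.get(stage_name)
            if cap is None:
                return fn(*args)
            with cap:                  # tighter limit for this stage type
                return fn(*args)
        return self._pool.submit(guarded)

# e.g. pool = StagePool(max_workers=10, stage_caps={"aggregate": 2})
```

One caveat even in this toy version: a guarded task still occupies a pool thread while it waits on its cap, so a real scheduler would want to defer submission instead.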

shcheklein commented 1 year ago

@pmrowla I think the scope can be simpler, though. Suppose we "somehow" pass this information to DVC, and it actively throttles / waits before running a stage if multiple stages of that type are already running. Then I don't think we need a separate pool, or the ability to run different stages in parallel within a single pipeline (e.g. parallelizing foreach, etc.).

Whether it makes sense to implement it that way is a separate question. The point here is that, as far as I understand, the high-level scenario doesn't require any additional parallelism; it requires the ability to add quotas to the existing system (at the dvc.yaml level, not per worker), and there are different ways that could be implemented.

WDYT?

pmrowla commented 1 year ago

In this case, your other workers are just going to block when they get to a stage with quotas (since we can only run a pipeline sequentially right now). Using Dave's example scenario, the 8 other workers will spend the majority of their time blocked waiting for the memory-intensive jobs to finish (since we can only do 2 of them at a time), in which case this is effectively the same as just doing exp run -j 2 with the existing behavior.

I guess I'm (maybe incorrectly) assuming that the memory-intensive stage takes most of the execution time in the user's pipeline. In the event that the rest of the pipeline is actually slower than the memory-intensive stage, I suppose there would still be some benefit to this?
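A back-of-envelope illustration of both points, with entirely made-up timings:

```python
# Made-up numbers purely to illustrate the trade-off described above.
heavy = 10.0   # minutes in the memory-intensive stage (quota of 2 assumed)
rest = 2.0     # minutes in all other stages combined

# With quotas and plenty of workers, throughput is capped by the heavy
# stage: 2 experiments per `heavy` minutes.
quota_throughput = 2 / heavy           # 0.20 experiments/min

# Plain `exp run -j 2` runs 2 whole pipelines at a time, end to end.
j2_throughput = 2 / (heavy + rest)     # ~0.17 experiments/min

# The numbers converge when `rest` is small (the first point above) and
# diverge in favor of quotas as `rest` grows (the second point), at least
# until the other stages become the bottleneck themselves.
print(quota_throughput, j2_throughput)
```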

shcheklein commented 1 year ago

your other workers are just going to block when they get to stage with quotas

Yes, that's true, but it fits certain scenarios quite well. E.g. some stages are relatively quick but very resource-intensive. It's fine not to execute the next stage until some of the first ones are done and workers free up, even if that means some workers sit idle for a while.

pmrowla commented 1 year ago

In that case we could probably do this with a relatively naive solution, like keeping a count of the number of processes running each stage somewhere in the main DVC repo (queue/temp runs would have to know to check the count in the main repo, not within the temp workspace's .dvc directory). But it would still have to be smart enough to account for situations where a worker or pipeline stage dies ungracefully (meaning one and only one of the blocking workers then has to decrement the count for the crashed worker).
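A hedged sketch of what that naive counter could look like: hypothetical path and function names, a POSIX-only liveness probe, and no real file locking (which a correct version would need around every read-modify-write):

```python
# Hypothetical sketch, not DVC code: per-stage slots stored as a JSON list
# of worker PIDs in the main repo, with entries from crashed workers pruned.
import json
import os
import time
from pathlib import Path

SLOTS_DIR = Path(".dvc/tmp/stage_slots")       # hypothetical location

def _alive(pid: int) -> bool:
    try:
        os.kill(pid, 0)                        # signal 0: liveness probe only
    except ProcessLookupError:
        return False                           # worker died ungracefully
    except PermissionError:
        return True                            # alive, owned by another user
    return True

def acquire_slot(stage: str, quota: int, poll: float = 1.0) -> None:
    SLOTS_DIR.mkdir(parents=True, exist_ok=True)
    path = SLOTS_DIR / f"{stage}.json"
    while True:
        pids = json.loads(path.read_text()) if path.exists() else []
        pids = [p for p in pids if _alive(p)]  # prune crashed workers
        if len(pids) < quota:
            pids.append(os.getpid())
            path.write_text(json.dumps(pids))
            return
        time.sleep(poll)                       # block until a slot frees up

def release_slot(stage: str) -> None:
    path = SLOTS_DIR / f"{stage}.json"
    pids = [p for p in json.loads(path.read_text()) if p != os.getpid()]
    path.write_text(json.dumps(pids))
```

Storing PIDs instead of a bare count makes the pruning idempotent, which sidesteps the "one and only one worker decrements" problem, though the read-modify-write still needs an OS-level lock to be safe.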

This also only works when there is only one person running DVC experiments on the machine (in one DVC repo). If you have a scenario where multiple users are running jobs on a particular machine (from their own separate clones of a DVC repo), this won't actually work unless the counter was system-wide.

dberenbaum commented 1 year ago

@pmrowla This is related to a user request where the low-memory stage is a remote job that actually takes the bulk of the time but obviously is not memory-intensive on the local machine.

JulianWgs commented 1 year ago

Hi all, nice to see this discussion! This feature request came from my side.

This also only works when there is only one person running DVC experiments on the machine (in one DVC repo). If you have a scenario where multiple users are running jobs on a particular machine (from their own separate clones of a DVC repo), this won't actually work unless the counter was system-wide.

I would opt for a naive solution where the user is responsible for ensuring that no other users start memory-intensive tasks (DVC or not) on the same machine. So a per-user, per-repo, or even per-dvc exp run counter would be sufficient.

efiop commented 1 year ago

@JulianWgs We've been trying to reach you by email, but we are hitting a "550 spam detected. transport denied" error and don't have any other means of reaching you 🙁

Moynihan18 commented 1 year ago

@JulianWgs Thanks for getting back to us. I got your email, but my reply got caught by the spam detector. I'll hear from you next week!

Moynihan18 commented 1 year ago

@JulianWgs Hey Julian - still blocked by spam. Didn't hear from you this week - checking back in here. Update?