materialsproject / jobflow

jobflow is a library for writing computational workflows.
https://materialsproject.github.io/jobflow

Standardized option for short jobs #546

Open gpetretto opened 4 months ago

gpetretto commented 4 months ago

In several Flows there is a need to define small, short Jobs. In atomate2, for example, there are Jobs that just build a supercell or generate perturbations. While these require minimal computational effort, identifying them and tuning their execution can be quite annoying. For this reason I would be interested in defining a standard way of marking such jobs, so that managers can automatically optimize their execution. In the case of jobflow-remote there could be an internal local worker that automatically executes those jobs. A similar helper could probably be defined for the fireworks manager as well.

The key point is that the Flow developer should be able to mark these jobs directly, instead of leaving it to the user. I am not sure what the best way of doing that would be. I was thinking of a new Job (or JobConfig) attribute, e.g. fast, short, or small, that is False by default, so it could be easily set and easily retrieved by the manager. For example:

from jobflow import job

@job(small=True)
def sum(a, b):
    return a + b
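To illustrate how a manager could consume such a flag, here is a minimal, self-contained sketch. The `small` attribute, the `FakeJob` class, and `pick_worker` are all hypothetical names for illustration, not part of the real jobflow API:

```python
# Hypothetical sketch: route jobs based on a proposed "small" flag
# stored in the job's metadata. All names here are illustrative,
# not real jobflow classes or options.
from dataclasses import dataclass, field


@dataclass
class FakeJob:
    """Stand-in for a jobflow Job carrying a metadata dict."""
    name: str
    metadata: dict = field(default_factory=dict)


def pick_worker(job: FakeJob) -> str:
    # A manager could send flagged jobs to a lightweight local worker
    # and everything else to the configured remote/HPC worker.
    if job.metadata.get("small", False):
        return "local"
    return "hpc"


supercell = FakeJob("make_supercell", {"small": True})
relax = FakeJob("relax_structure")
print(pick_worker(supercell))  # local
print(pick_worker(relax))      # hpc
```

The point of a boolean flag is that the manager needs no extra user configuration: a single default "local" worker can absorb every job that sets it.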

Any comments or ideas about this feature?

ml-evs commented 4 months ago

Funnily enough I had also been implementing this, but in jobflow-remote directly (and was discussing it earlier today with @victrqt, who was running into similar issues). I've started adding the option of a profile or exec_profile as a free-text value, e.g.,

@job(profile="analysis")

or

@job(profile="postprocessing")

which can then be used in the jobflow-remote config to specify a default worker and exec config for jobs that match the profile. If this could be standardized at the jobflow level it would be super helpful, as other managers could also make use of it.
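As a rough sketch of what that config lookup might look like on the manager side (the mapping keys and defaults below are assumptions, not actual jobflow-remote settings):

```python
# Hypothetical sketch: resolve a free-text profile string against
# user-supplied config to pick a worker and exec config. The config
# structure and names are illustrative assumptions only.
PROFILE_CONFIG = {
    "analysis": {"worker": "local", "exec_config": "minimal_env"},
    "postprocessing": {"worker": "local", "exec_config": "minimal_env"},
    "gpu": {"worker": "gpu_cluster", "exec_config": "cuda_env"},
}

# Fallback used for jobs with no profile, or an unrecognized one.
DEFAULT_CONFIG = {"worker": "hpc", "exec_config": "default_env"}


def resolve_profile(profile):
    """Return the worker/exec config matching the job's profile."""
    return PROFILE_CONFIG.get(profile, DEFAULT_CONFIG)


print(resolve_profile("analysis")["worker"])  # local
print(resolve_profile(None)["worker"])        # hpc
```

A free-text profile is more flexible than a single boolean, at the cost of requiring the user (or the documentation) to define the mapping for each profile a flow uses.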

The same issue comes up with, e.g., jobs that require (or at least can make use of) GPUs. I don't know whether it is necessary to have a set of "known" profiles or whether this can be handled by convention (either way, the user probably has to choose the appropriate resources for a 'small' job).

gpetretto commented 4 months ago

Nice! I like the idea, as it allows more flexibility. On the other hand, it requires a bit more configuration work from the user. However, instructions for a standard worker that covers these cases could be provided in the documentation.

An additional point, which concerns mainly jobflow-remote, is that it may be necessary to know whether a job can be executed with just its inputs, or whether it needs access to files from previous jobs. For example, this function in atomate2: https://github.com/materialsproject/atomate2/blob/7f4d5a60d427295dee3a0f6a9b87deb5f47d7f8a/src/atomate2/common/jobs/defect.py#L187 is clearly something that could be executed quickly, but I think it needs to run on the machine where the previous jobs were executed. I am not sure if there is any easy way to define or identify this kind of job.
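One could imagine combining the "small" marker with a second flag for file dependencies, so a manager only relocates a job when it is both cheap and self-contained. A minimal sketch (both flag names are hypothetical, not existing jobflow attributes):

```python
# Hypothetical sketch: a job is safe to move to a local worker only
# when it is flagged as small AND does not depend on files produced
# on the original worker. Both flag names are assumptions.
def can_run_locally(metadata: dict) -> bool:
    """Decide whether a manager may relocate this job to a local worker."""
    is_small = metadata.get("small", False)
    needs_files = metadata.get("needs_files", False)
    return is_small and not needs_files


print(can_run_locally({"small": True}))                        # True
print(can_run_locally({"small": True, "needs_files": True}))   # False
print(can_run_locally({}))                                     # False
```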

In any case, I believe this needs to be implemented directly in jobflow to be effective.

ml-evs commented 4 months ago

Files are definitely a big blocker for me too; I'm not sure how to approach this with the current API (I have played around a bit with additional stores, but it doesn't quite make sense to me). Being able to launch a job from the context of an older job (as resolved by the manager) would be very helpful, as would resolving dependencies on data present in additional stores.

Andrew-S-Rosen commented 4 months ago

Absolutely amazing idea, and I love the design proposed by @ml-evs. The way I have been getting around this is very hacky with FireWorks...

Andrew-S-Rosen commented 2 months ago

One caveat here: if you send some jobs to the local compute resource, this would require all runtime dependencies to also be present there (which may not necessarily be the case).