materialsproject / jobflow

jobflow is a library for writing computational workflows.
https://materialsproject.github.io/jobflow

BUG: Error in replaced Flow does not get handled properly #487

Open FabiPi3 opened 9 months ago

FabiPi3 commented 9 months ago

Description

Consider the following minimal example:

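from jobflow import job, Flow, run_locally, Response

@job
def second(a):
    # Always fails, standing in for a job that errors on some input value
    raise ValueError("Dummy")

@job
def first():
    second_job = second(1)
    # Replace this job with a Flow whose output is the (failing) second job's output
    return Response(0, replace=Flow(second_job,
                                    output=second_job.output))

first_job = first()
# "last" consumes first's output, i.e. the output of the replacing Flow
last_job = second(first_job.output)
last_job.name = "last"
run_locally([first_job, last_job])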

Here I use the replace functionality to replace a job with a new Flow. However, something goes wrong inside the replacing Flow (for example, because of a bad input value) and an error is raised. Running this gives me the following output:

2023-11-16 08:49:04,172 INFO Started executing jobs locally
2023-11-16 08:49:04,281 INFO Starting job - first (662206c4-19bd-43ab-84e6-98b2a090dee8)
2023-11-16 08:49:04,553 INFO Finished job - first (662206c4-19bd-43ab-84e6-98b2a090dee8)
2023-11-16 08:49:04,553 INFO Starting job - second (43048dfc-589e-4d47-9df1-2d3649454e2c)
2023-11-16 08:49:04,555 INFO second failed with exception:
Traceback (most recent call last):
  File "/Users/fp/code/jobflow/src/jobflow/managers/local.py", line 99, in _run_job
    response = job.run(store=store)
               ^^^^^^^^^^^^^^^^^^^^
  File "/Users/fp/code/jobflow/src/jobflow/core/job.py", line 583, in run
    response = function(*self.function_args, **self.function_kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/fp/code/excitingworkflow/excitingworkflow/tstsy.py", line 8, in second
    raise ValueError("Dummy")
ValueError: Dummy

2023-11-16 08:49:04,555 INFO Starting job - store_inputs (662206c4-19bd-43ab-84e6-98b2a090dee8, 2)
2023-11-16 08:49:04,603 INFO Finished job - store_inputs (662206c4-19bd-43ab-84e6-98b2a090dee8, 2)
2023-11-16 08:49:04,603 INFO Starting job - last (c7f9dfc8-dedc-4da4-8d30-af897c4e9367)
2023-11-16 08:49:04,700 INFO last failed with exception:
Traceback (most recent call last):
  File "/Users/fp/code/jobflow/src/jobflow/managers/local.py", line 99, in _run_job
    response = job.run(store=store)
               ^^^^^^^^^^^^^^^^^^^^
  File "/Users/fp/code/jobflow/src/jobflow/core/job.py", line 572, in run
    self.resolve_args(store=store)
  File "/Users/fp/code/jobflow/src/jobflow/core/job.py", line 678, in resolve_args
    resolved_args = find_and_resolve_references(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/fp/code/jobflow/src/jobflow/core/reference.py", line 451, in find_and_resolve_references
    resolved_references = resolve_references(
                          ^^^^^^^^^^^^^^^^^^^
  File "/Users/fp/code/jobflow/src/jobflow/core/reference.py", line 341, in resolve_references
    cache[uuid][index] = store.get_output(
                         ^^^^^^^^^^^^^^^^^
  File "/Users/fp/code/jobflow/src/jobflow/core/store.py", line 522, in get_output
    return find_and_resolve_references(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/fp/code/jobflow/src/jobflow/core/reference.py", line 432, in find_and_resolve_references
    return arg.resolve(store, cache=cache, on_missing=on_missing)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/fp/code/jobflow/src/jobflow/core/reference.py", line 161, in resolve
    raise ValueError(
ValueError: Could not resolve reference - 43048dfc-589e-4d47-9df1-2d3649454e2c not in store or index=None, cache={'43048dfc-589e-4d47-9df1-2d3649454e2c': {}}

2023-11-16 08:49:04,700 INFO Finished executing jobs locally
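The second traceback hints at why the failure only surfaces in "last": the store.get_output frame resolves references a second time, which suggests that the output stored for "first" (written by the store_inputs step) is itself a reference to second's output. The toy model below illustrates that chain under this assumption. ToyReference and MissingReferenceError are hypothetical stand-ins, not jobflow classes.

```python
# Hypothetical toy model (not jobflow's real classes): after the replace,
# the output stored for "first" is itself a reference to "second"'s output.

class MissingReferenceError(ValueError):
    """Raised when a referenced job has no output in the store."""


class ToyReference:
    """Hypothetical stand-in for an output reference pointing at a job id."""

    def __init__(self, uuid: str):
        self.uuid = uuid

    def resolve(self, store: dict):
        if self.uuid not in store:
            raise MissingReferenceError(
                f"Could not resolve reference - {self.uuid} not in store"
            )
        value = store[self.uuid]
        # Follow chained references, like the get_output -> resolve frames
        if isinstance(value, ToyReference):
            return value.resolve(store)
        return value


store = {}
# "first" finished; its stored output points at "second"'s output
store["first"] = ToyReference("second")
# "second" raised ValueError("Dummy"), so nothing was stored for it

message = None
try:
    # This is what "last" effectively does when resolving its input
    ToyReference("first").resolve(store)
except MissingReferenceError as err:
    message = str(err)

print(message)  # Could not resolve reference - second not in store
```

In this model the original ValueError is lost by the time "last" runs; only the missing entry for "second" is visible.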

Expected behavior

Since the last job depends on the output of the second job, it should not be started. jobflow should stop after the ValueError, as it does for 'normal' jobs, flows, and replaced jobs.
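A minimal sketch of the expected behavior, assuming a simple in-memory scheduler rather than jobflow's actual run_locally: a job is started only if every upstream job completed, so a failure inside the replacing Flow propagates as "skipped" to everything downstream. run_graph and the graph layout are hypothetical; the store_inputs entry only mirrors the pass-through step visible in the log.

```python
# Hypothetical sketch (not jobflow's run_locally): once a job fails,
# every job that depends on it, directly or indirectly, is skipped.
from typing import Callable


def run_graph(jobs: dict[str, tuple[Callable[..., object], list[str]]]) -> dict[str, str]:
    """Run jobs in insertion order (assumed topological); return a status per job."""
    outputs: dict[str, object] = {}
    status: dict[str, str] = {}
    for name, (func, deps) in jobs.items():
        # Do not start a job if any upstream job failed or was skipped
        if any(status.get(dep) != "completed" for dep in deps):
            status[name] = "skipped"
            continue
        try:
            outputs[name] = func(*(outputs[dep] for dep in deps))
            status[name] = "completed"
        except Exception:
            status[name] = "failed"
    return status


def second(a):
    raise ValueError("Dummy")


# Graph after the replace: "second" runs inside the replacing Flow and a
# pass-through step re-exposes its output as first's output for "last"
jobs = {
    "first": (lambda: 0, []),
    "second": (lambda: second(1), []),
    "store_inputs": (lambda x: x, ["second"]),
    "last": (second, ["store_inputs"]),
}
result = run_graph(jobs)
print(result)
# {'first': 'completed', 'second': 'failed', 'store_inputs': 'skipped', 'last': 'skipped'}
```

Here "last" is skipped instead of being started and failing later with an unresolvable reference.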

lihaojie87 commented 4 months ago

I have the same problem. Do you have a solution?