materialsproject / jobflow

jobflow is a library for writing computational workflows.
https://materialsproject.github.io/jobflow

Bug: `Flow.draw_graph()` displays empty graph after job has run #332

Open xperrylinn opened 1 year ago

xperrylinn commented 1 year ago

Describe the bug

Calling flow.draw_graph(figsize=(8, 8)).show() draws the graph with no edges. It appears to be related to how I'm using the API, because when I run the code in the tutorial notebook the graph is drawn with no problem. Below is a screen capture of an example. I'm using jobflow==0.1.11.

Screenshot 2023-06-03 at 8 48 13 AM

This is the code that I've written to produce this:

from pymatgen.io.openmm.sets import OpenMMSet
from maggma.stores import MemoryStore
from jobflow import run_locally
from jobflow import JobStore
from jobflow import Flow
from atomate2_openmm.jobs.energy_minimization_maker import EnergyMinimizationMaker
from atomate2_openmm.jobs.nvt_maker import NVTMaker

import os

# Define file path to input set of files
input_file_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), "../tests/test_data/alchemy_input_set")

# Create the initial OpenMMSet
input_set = OpenMMSet.from_directory(
    directory=input_file_path,
    topology_file="topology_pdb",
    system_file="system_xml",
    integrator_file="integrator_xml",
    state_file="state_xml",
    contents_file="contents_json",
)

# Create jobs
energy_minimization_job = EnergyMinimizationMaker().make(input_set=input_set)
nvt_job = NVTMaker(
    steps=100,
    state_reporter_interval=10,
    dcd_reporter_interval=10,
    temperature=700,
).make(input_set=energy_minimization_job.output["doc_store"].calculation_output.output_set)

# Setup a Flow
flow = Flow(jobs=[energy_minimization_job, nvt_job],)

# Create JobStore
doc_store = MemoryStore()
trajectory_store = MemoryStore()
job_store = JobStore(docs_store=doc_store, additional_stores={"trajectory_store": trajectory_store})

# Run the Production Flow
responses = run_locally(flow=flow, store=job_store, ensure_success=True)

nvt_traj_blob_uuid = next(doc_store.query(criteria={"uuid": flow.jobs[-1].uuid}))["output"]["trajectories"]["blob_uuid"]
dcd_report = next(trajectory_store.query(criteria={"blob_uuid": nvt_traj_blob_uuid}))
# Compare with ==; the form `assert x, "msg"` only checks that x is truthy
assert dcd_report["@class"] == "DCDReports"

flow.draw_graph(figsize=(8, 8)).show()

I tried searching through the previous issues to see if others have encountered this, but I didn't find anything. Any tips on what might be happening here?

To Reproduce

Create a conda env with the following spec:

name: atomate2-openmm
channels:
  - conda-forge
dependencies:
  - python=3.9
  - pip
  - openmm
  - openff-toolkit # pymatgen-io-openmm
  - MDAnalysis # pymatgen-io-openmm
  - atomate2
  - jobflow
  - pip:
      - pytest
      - pytest-cov
      - pandas
      - pymatgen-io-openmm @ git+https://github.com/orionarcher/pymatgen-io-openmm.git
      - atomate2-openmm @ git+ssh://git@github.com/xperrylinn/atomate2-openmm.git@feature/additional_stores


Expected behavior

Edges between nodes in the graph drawing.

Screenshots

Included above in description.

davidwaroquiers commented 1 year ago

Dear @xperrylinn ,

This is due to how jobflow is designed, and it can indeed be confusing. The graph of the flow is constructed from the OutputReferences that you set in the jobs/flows (input_set=energy_minimization_job.output["doc_store"].calculation_output.output_set in your script). When you run locally, these output references are resolved automatically, so they no longer exist afterwards. When you then try to draw the graph, it does not see any OutputReference and therefore draws no edges. If you draw the graph before running the flow, you will see the edge.
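To make the mechanism concrete, here is a minimal toy model in plain Python. It is not jobflow's code: ToyReference, ToyJob and edges are hypothetical names invented for this sketch, and it assumes that references are overwritten in place when a job runs, as described above. The deepcopy snapshot at the end is only one possible way to keep an un-run copy around for inspection.

```python
import copy
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class ToyReference:
    """Placeholder for another job's output (stand-in for an output reference)."""
    uuid: str


@dataclass
class ToyJob:
    """Hypothetical, heavily simplified job: a function plus keyword inputs."""
    uuid: str
    function: Callable[..., Any]
    kwargs: dict = field(default_factory=dict)

    @property
    def output(self) -> ToyReference:
        return ToyReference(self.uuid)

    def run(self, results: dict) -> None:
        # Resolve references in place: each placeholder is overwritten by a value.
        for key, value in self.kwargs.items():
            if isinstance(value, ToyReference):
                self.kwargs[key] = results[value.uuid]
        results[self.uuid] = self.function(**self.kwargs)


def edges(jobs):
    """Derive edges only from the references still present in the job inputs."""
    return [
        (value.uuid, job.uuid)
        for job in jobs
        for value in job.kwargs.values()
        if isinstance(value, ToyReference)
    ]


first = ToyJob("minimize", lambda: 1.0)
second = ToyJob("nvt", lambda energy: energy * 2, {"energy": first.output})
jobs = [first, second]

edges_before = edges(jobs)      # the reference is still in second.kwargs
snapshot = copy.deepcopy(jobs)  # assumed workaround: keep an un-run copy

results = {}
for job in jobs:
    job.run(results)

edges_after = edges(jobs)              # reference replaced by a value: no edges
edges_from_snapshot = edges(snapshot)  # the un-run copy still has its edge
```

In this toy model the edge list is populated before the run and empty afterwards, while the copy taken before the run keeps its edge. In your script, the equivalent of inspecting edges_before is calling flow.draw_graph(figsize=(8, 8)).show() before run_locally.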

This is indeed somewhat problematic (even though the flows work), as one may want to look at the flows after they have run, not just before. We are discussing options to remove this drawback.

Best,

xperrylinn commented 1 year ago

Understood! Thank you, @davidwaroquiers.

Screenshot 2023-06-05 at 8 12 46 PM