pyiron / pyiron_workflow

Graph-and-node based workflows
BSD 3-Clause "New" or "Revised" License

JNmpi notebook patch #203

Closed liamhuber closed 9 months ago

liamhuber commented 9 months ago

Per the last commit on #33

> Note: some examples in the pyiron_like_workflows.ipynb do not work (pyiron_workflow fails to connect some of the input and output ports, I was not able to figure out what is going wrong)

@JNmpi, up to the matgl stuff, the notebooks should run again. The key problem was that data channels can't be used arbitrarily as input to other objects (e.g. dictionaries; cf. output and input injection below). Here I made close to the minimal changes needed to get the notebooks running again, but I'll keep playing around with this to see how we can reap the benefits of the dataclass attack without hitting this problem.

Summary:

Output injection

@Workflow.wrap_as.macro_node()
def my_macro(wf, some_array):
    wf.my_node = wf.create.my_module.SomeNode(
        three_numbers=some_array[:3]
    )

The resulting macro has three nodes: a UserInput node created automatically by parsing some_array in the signature, my_node, which was explicitly created, and a dynamically created array-slicing node. This was pretty easy to implement by overriding output channels to simply create new nodes: at the time of definition, such nodes are always terminal leaves (although we can later incorporate them into the graph just fine), and the input being passed to them is always well-defined because they're being created from an output channel. So this will work for an arbitrary graph as long as OutputDataChannel has been overloaded to understand the requested operation, and it happens at declaration time.
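As a standalone sketch of the mechanism (simplified stand-ins here, not the real pyiron_workflow classes), the key is that __getitem__ fires on an object we control, so it can return a freshly created node that is already wired to its upstream source:

class GetItemNode:
    # Hypothetical injected node: slices its upstream value at run time.
    def __init__(self, source, item):
        self.source = source  # the upstream output channel -- our graph edge
        self.item = item

    def run(self):
        return self.source.value[self.item]

class OutputChannel:
    # Simplified stand-in for OutputDataChannel.
    def __init__(self, value=None):
        self.value = value

    def __getitem__(self, item):
        # Don't slice now; inject a node that slices when the graph runs.
        return GetItemNode(self, item)

some_array = OutputChannel(value=[1, 2, 3, 4, 5])
sliced = some_array[:3]  # a GetItemNode, not a list
print(sliced.run())      # [1, 2, 3]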

Input injection

@Workflow.wrap_as.macro_node()
def my_macro(wf, some_array):
    wf.my_node = wf.create.my_module.SomeNode(
        my_parameters = {"array_input": some_array}
    )

At first glance, this might look analogous to output injection, but it's actually completely different. Remember that some_array is actually a UserInput node instance, so we're creating a dictionary with a node as one of its items. We're allowed to do that, but how should SomeNode interpret this input? We could overload SomeNode so that it knows how to interpret both "array_input": array and "array_input": Node data, but this is problematic because we still wouldn't have a graph edge between the some_array and my_node instances -- in this attack all the processing is done at run time inside the SomeNode node function.

So what we really want is to inject a new node between some_array and my_node -- i.e. to replace {"array_input": some_array} with SingleItemDict(key="array_input", value=some_array) where

@Workflow.wrap_as.single_value_node()
def SingleItemDict(key: str, value):
    return {key: value}

This would accomplish everything we want: we'd get a solid execution chain from some_array to my_node, and the data being passed around is in proper node channels the whole way.
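With that in hand, the macro above could be sketched as (reusing the hypothetical SomeNode from the earlier examples):

@Workflow.wrap_as.macro_node()
def my_macro(wf, some_array):
    # Build the dictionary in its own node so the edge from some_array
    # to my_node is an explicit part of the graph.
    wf.my_dict = SingleItemDict(key="array_input", value=some_array)
    wf.my_node = wf.create.my_module.SomeNode(my_parameters=wf.my_dict)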

The catch is: how is {"array_input": some_array} supposed to know it should turn itself into a node? With some_array[:3] it was easy, because we're calling __getitem__ on OutputDataChannel, so we can just define it how we want. But here, we would need to go in and modify dict.__new__ so that it returns a Node instead of a dict! Not only would making all these changes be arbitrarily time-consuming (since you need to do it for every object the user might pass data to, including objects they define, which means they need to remember to modify __new__!), but it might also have unwanted side effects -- maybe somewhere totally different the user actually wants to make a dictionary of channels! By overriding dict.__new__ we'd be making that impossible.
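As an aside, in CPython this route is closed anyway: built-in types are immutable, so dict.__new__ can't be patched, and literal {...} syntax doesn't route through dict() in the first place. A quick check:

# Built-in/extension types are immutable in CPython, so there is no hook
# for intercepting dictionary construction by patching dict.__new__.
try:
    dict.__new__ = staticmethod(lambda cls: object.__new__(cls))
except TypeError as e:
    print(e)  # e.g. "cannot set '__new__' attribute of immutable type 'dict'"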

I like the direction the dataclasses are going, but unfortunately I think they lend themselves to this "input injection" anti-pattern. We definitely need some sort of QoL tooling here to avoid copy-pasting reused signatures all over the place, and to condense/group channels together to keep appearances tidier, but I'm skeptical that we'll ever be able to just throw a channel into some arbitrary object's __new__ or __init__ and get it to do what we're hoping.


codacy-production[bot] commented 9 months ago

Coverage summary from Codacy

See diff coverage on Codacy

| Coverage variation | Diff coverage |
| --- | --- |
| Report missing for d6c57a9ada47d3e3177d9c03ad9cd8bfffd3773b[^1] | ∅ (not applicable) |

Coverage variation details:

| | Coverable lines | Covered lines | Coverage |
| --- | --- | --- | --- |
| Common ancestor commit (d6c57a9ada47d3e3177d9c03ad9cd8bfffd3773b) | Report Missing | Report Missing | Report Missing |
| Head commit (4a52aa30c6928b1a88e76af5fbeaac5a79eaf253) | 3570 | 2226 | 62.35% |

**Coverage variation** is the difference between the coverage for the head and common ancestor commits of the pull request branch.

Diff coverage details:

| | Coverable lines | Covered lines | Diff coverage |
| --- | --- | --- | --- |
| Pull request (#203) | 0 | 0 | **∅ (not applicable)** |

**Diff coverage** is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified.

[^1]: Codacy didn't receive coverage data for the commit, or there was an error processing the received data. Check your integration for errors and validate that your coverage setup is correct.

liamhuber commented 9 months ago

Notebooks error in pyiron_like_workflows:

ModuleNotFoundError                       Traceback (most recent call last)
Cell In[4], line 2
      1 wf = Workflow('Lammps')
----> 2 wf.register('atomistic', 'pyiron_workflow.node_library.atomistic')
      3 wf.register('atomistic_codes', 'pyiron_workflow.node_library.atomistic_codes')

File ~/work/pyiron_workflow/pyiron_workflow/pyiron_workflow/composite.py:511, in Composite.register(cls, package_identifier, domain)
    508 @classmethod
    509 @wraps(Creator.register)
    510 def register(cls, package_identifier: str, domain: Optional[str] = None) -> None:
--> 511     cls.create.register(package_identifier=package_identifier, domain=domain)

File ~/work/pyiron_workflow/pyiron_workflow/pyiron_workflow/interfaces.py:213, in Creator.register(self, package_identifier, domain)
    211         self._register_recursively_from_module(module, None, None)
    212 except ModuleNotFoundError as e:
--> 213     raise ModuleNotFoundError(
    214         f"In the current implementation, we expect package identifiers to be "
    215         f"modules, but {package_identifier} couldn't be imported. If this "
    216         f"looks like a module, perhaps it's simply not in your path?"
    217     ) from e

ModuleNotFoundError: In the current implementation, we expect package identifiers to be modules, but atomistic couldn't be imported. If this looks like a module, perhaps it's simply not in your path?

Notebooks error in phonopy_wf:

---------------------------------------------------------------------------
Exception encountered at "In [3]":
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
File ~/work/pyiron_workflow/pyiron_workflow/pyiron_workflow/interfaces.py:134, in Creator.__getattr__(self, item)
    133 try:
--> 134     return self._package_access[item]
    135 except KeyError as e:

KeyError: 'atomistic'

The above exception was the direct cause of the following exception:

AttributeError                            Traceback (most recent call last)
Cell In[3], line 2
      1 wf = Workflow('test')
----> 2 wf.structure = wf.create.atomistic.structure.build.bulk('Al')

File ~/work/pyiron_workflow/pyiron_workflow/pyiron_workflow/interfaces.py:136, in Creator.__getattr__(self, item)
    134     return self._package_access[item]
    135 except KeyError as e:
--> 136     raise AttributeError(
    137         f"{self.__class__.__name__} could not find attribute {item} -- did you "
    138         f"forget to register node package to this key?"
    139     ) from e

AttributeError: Creator could not find attribute atomistic -- did you forget to register node package to this key?

liamhuber commented 9 months ago

Ok, those were easily resolved: register now takes the identifier first and then (optionally) the domain shortcut, instead of the other way around. We get a little farther in phonopy because, by chance, the domain shortcut is also actually the name of an importable module!

I seem to have a bug in the registration process though, such that only the first branch of deeply nested stuff is available.

liamhuber commented 9 months ago

> I seem to have a bug in the registration process though, such that only the first branch of deeply nested stuff is available.

Indeed, this is a bug in how the creator's pathing interacts with the module walk. Everything is available; it's just that siblings get nested inside their elder sibling: e.g., for atomistic, instead of Workflow.create.atomistic.structure.build.bulk(), I need Workflow.create.atomistic.calculator.engine.property.structure.build.bulk().
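For the record, a toy reproduction of this class of bug (illustrative only, not the actual creator code): if the recursion rebinds the same variable the sibling loop is attaching to, every later sibling lands under its elder sibling instead of under the true parent:

class Branch:
    def __init__(self):
        self.children = {}

def walk_buggy(branch, tree):
    for name, sub in tree.items():
        branch.children[name] = Branch()
        if isinstance(sub, dict):
            branch = branch.children[name]  # bug: rebinds the loop's target...
            walk_buggy(branch, sub)         # ...so later siblings nest in here

def walk_fixed(branch, tree):
    for name, sub in tree.items():
        child = Branch()
        branch.children[name] = child
        if isinstance(sub, dict):
            walk_fixed(child, sub)  # recurse into the child; `branch` stays put

root = Branch()
walk_buggy(root, {"calculator": {}, "engine": {}, "property": {}})
# root only sees "calculator"; "engine" nests under it and "property" under
# that -- the same calculator.engine.property chaining observed above.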

I'll patch it on main, since it's a real generic bug, and then merge in the fix.

liamhuber commented 9 months ago

@JNmpi, I've been thinking about this issue, and even though we can't do everything universally, it may still be worthwhile to offer wrappers for a few more common data types like dict and list. For instance, we could cook up a function that takes one of these objects and dynamically builds a node for it, such that we can write things like

wf.phonopy = wf.create.atomistic.property.phonons.create_phonopy(
        structure=wf.relaxed_structure.outputs.structure,
        parameters=wf.create.standard.dict({"distance": displacement}),
        engine=wf.engine,
        max_workers=max_workers,
    )

Where displacement is a single-value node, and wf.create.standard.dict first dynamically defines a new node type that has a single input channel (labeled "distance") and outputs a dictionary like we want, then instantiates this node, connecting displacement to the new node's distance input.
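A standalone sketch of how such a factory could behave (illustrative names, not the library's API): one input per key, with the dictionary only reassembled at run time, after upstream values resolve:

def make_dict_node(template):
    # Hypothetical factory: values in `template` may be plain data or
    # upstream node-like objects exposing run().
    class DictNode:
        def __init__(self, **inputs):
            self.inputs = inputs

        def run(self):
            # Resolve any upstream nodes, then rebuild the dictionary.
            return {k: (v.run() if hasattr(v, "run") else v)
                    for k, v in self.inputs.items()}

    return DictNode(**template)

# Mirrors wf.create.standard.dict({"distance": displacement}): if
# `displacement` were an upstream node, its output would be pulled at
# run time and land under the "distance" key.
print(make_dict_node({"distance": 0.01}).run())  # {'distance': 0.01}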

I have kind of mixed feelings about this, because I worry that it will lull people into just writing parameters={"distance": displacement} and getting frustrated when it doesn't work. But it may be worth it.

Definitely we can do something similar for data classes, like

wf.phonopy = wf.create.atomistic.property.phonons.create_phonopy(
        structure=wf.relaxed_structure.outputs.structure,
        parameters=wf.create.atomistic.property.phonons.InputPhonopyGenerateSupercellsNode(
            distance=displacement
        ),
        engine=wf.engine,
        max_workers=max_workers,
    )

Where InputPhonopyGenerateSupercellsNode returns a single value of type InputPhonopyGenerateSupercells. As long as users have defined a data class InputPhonopyGenerateSupercells, making the node is trivially easy:

InputPhonopyGenerateSupercellsNode = Workflow.wrap_as.dataclass_node(InputPhonopyGenerateSupercells)
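A standalone sketch of what such a wrapper might do under the hood (illustrative, not pyiron_workflow's implementation, and using a toy one-field version of the dataclass): read the dataclass fields to get one input per field, use the dataclass defaults as channel defaults, and emit an instance as the single output value:

import dataclasses

def dataclass_node_sketch(cls):
    # Hypothetical wrapper: the dataclass fields define the node's inputs.
    fields = dataclasses.fields(cls)

    class DataclassNode:
        def __init__(self, **inputs):
            self.inputs = {}
            for f in fields:
                if f.name in inputs:
                    self.inputs[f.name] = inputs[f.name]
                elif f.default is not dataclasses.MISSING:
                    self.inputs[f.name] = f.default  # dataclass default as channel default

        def run(self):
            # The single output: an instance of the dataclass itself.
            return cls(**self.inputs)

    return DataclassNode

@dataclasses.dataclass
class InputPhonopyGenerateSupercells:
    distance: float = 0.01

node = dataclass_node_sketch(InputPhonopyGenerateSupercells)(distance=0.02)
print(node.run())  # InputPhonopyGenerateSupercells(distance=0.02)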

Again, I have some concerns here. Mostly I just really like the purity of having two core node classes (Function and Macro) and get a bit nervous about slippery slopes toward too many characters hanging around, but I'm willing to try it.

liamhuber commented 9 months ago

ImportError: cannot import name 'FileDataTemplate' from 'pyiron_base' (/usr/share/miniconda3/envs/my-env/lib/python3.11/site-packages/pyiron_base/__init__.py)

Notebook errors are some upstream import thing. I'll check on the versioning tomorrow.

liamhuber commented 9 months ago

Phonopy notebook failure:

Exception encountered at "In [36]":
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
Cell In[36], line 1
----> 1 from pyiron_workflow.node_library.dev_tools import Output

Indeed, you can't import this. I didn't see it because locally I had to stop at the matgl stuff. I went through and modified the rest of the notebook, so it should run fine now.

liamhuber commented 9 months ago

> ImportError: cannot import name 'FileDataTemplate' from 'pyiron_base' (/usr/share/miniconda3/envs/my-env/lib/python3.11/site-packages/pyiron_base/__init__.py)
>
> Notebook errors are some upstream import thing. I'll check on the versioning tomorrow.

This is fixed on pyiron_contrib:main; we just need a new release. Since that package will want to update its base and atomistics dependencies anyhow, I'll wait briefly.

codacy-production[bot] commented 9 months ago

Coverage summary from Codacy

See diff coverage on Codacy

| Coverage variation | Diff coverage |
| --- | --- |
| Report missing for d6c57a9ada47d3e3177d9c03ad9cd8bfffd3773b[^1] | :white_check_mark: 73.26% |

Coverage variation details:

| | Coverable lines | Covered lines | Coverage |
| --- | --- | --- | --- |
| Common ancestor commit (d6c57a9ada47d3e3177d9c03ad9cd8bfffd3773b) | Report Missing | Report Missing | Report Missing |
| Head commit (fce88318eead1e20586ea027c66b38968bfdd3f5) | 3577 | 2226 | 62.23% |

**Coverage variation** is the difference between the coverage for the head and common ancestor commits of the pull request branch.

Diff coverage details:

| | Coverable lines | Covered lines | Diff coverage |
| --- | --- | --- | --- |
| Pull request (#203) | 187 | 137 | **73.26%** |

**Diff coverage** is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified.

[^1]: Codacy didn't receive coverage data for the commit, or there was an error processing the received data. Check your integration for errors and validate that your coverage setup is correct.

liamhuber commented 9 months ago

Both notebooks now hit the import error. Once the updated versions land and black formatting is applied, this should be done.

liamhuber commented 9 months ago

The "rerun failed jobs" button for the CI report and "rerun job" for individual jobs are both doing nothing, so I am going to close and reopen...

liamhuber commented 9 months ago

In notebooks/phonopy_wf.ipynb

---------------------------------------------------------------------------
Exception encountered at "In [88]":
---------------------------------------------------------------------------
ReadinessError                            Traceback (most recent call last)
Cell In[88], line 1
----> 1 df = energy_at_volume(element='Fe').iter(strain=np.linspace(-0.2, 0.2, 11))
      2 df.plot(x='strain', ylabel='Energy (eV)', title='Energy-Volume Curve');
...
ReadinessError: Collect received a run command but is not ready. The node should be neither running nor failed, and all input values should conform to type hints.
Collect readiness: False
STATE:
running: False
failed: False
INPUTS:
out_dump ready: True
out_log ready: True
calc_mode ready: False
bla ready: True

The rest ran OK.

liamhuber commented 9 months ago

The lammps engine node and its children had some minor issues to be worked through, but it was all soluble.