| Coverage variation | Diff coverage |
|---|---|
| Report missing for d6c57a9ada47d3e3177d9c03ad9cd8bfffd3773b[^1] | ∅ |

You may notice some variations in coverage metrics with the latest Coverage engine update. For more details, visit the documentation.

[^1]: Codacy didn't receive coverage data for the commit, or there was an error processing the received data. Check your integration for errors and validate that your coverage setup is correct.
Notebooks error in `pyiron_like_workflows`:

```
ModuleNotFoundError                       Traceback (most recent call last)
Cell In[4], line 2
      1 wf = Workflow('Lammps')
----> 2 wf.register('atomistic', 'pyiron_workflow.node_library.atomistic')
      3 wf.register('atomistic_codes', 'pyiron_workflow.node_library.atomistic_codes')

File ~/work/pyiron_workflow/pyiron_workflow/pyiron_workflow/composite.py:511, in Composite.register(cls, package_identifier, domain)
    508 @classmethod
    509 @wraps(Creator.register)
    510 def register(cls, package_identifier: str, domain: Optional[str] = None) -> None:
--> 511     cls.create.register(package_identifier=package_identifier, domain=domain)

File ~/work/pyiron_workflow/pyiron_workflow/pyiron_workflow/interfaces.py:213, in Creator.register(self, package_identifier, domain)
    211     self._register_recursively_from_module(module, None, None)
    212 except ModuleNotFoundError as e:
--> 213     raise ModuleNotFoundError(
    214         f"In the current implementation, we expect package identifiers to be "
    215         f"modules, but {package_identifier} couldn't be imported. If this "
    216         f"looks like a module, perhaps it's simply not in your path?"
    217     ) from e

ModuleNotFoundError: In the current implementation, we expect package identifiers to be modules, but atomistic couldn't be imported. If this looks like a module, perhaps it's simply not in your path?
```
Notebooks error in `phonopy_wf`:

```
---------------------------------------------------------------------------
Exception encountered at "In [3]":
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
File ~/work/pyiron_workflow/pyiron_workflow/pyiron_workflow/interfaces.py:134, in Creator.__getattr__(self, item)
    133 try:
--> 134     return self._package_access[item]
    135 except KeyError as e:

KeyError: 'atomistic'

The above exception was the direct cause of the following exception:

AttributeError                            Traceback (most recent call last)
Cell In[3], line 2
      1 wf = Workflow('test')
----> 2 wf.structure = wf.create.atomistic.structure.build.bulk('Al')

File ~/work/pyiron_workflow/pyiron_workflow/pyiron_workflow/interfaces.py:136, in Creator.__getattr__(self, item)
    134     return self._package_access[item]
    135 except KeyError as e:
--> 136     raise AttributeError(
    137         f"{self.__class__.__name__} could not find attribute {item} -- did you "
    138         f"forget to register node package to this key?"
    139     ) from e

AttributeError: Creator could not find attribute atomistic -- did you forget to register node package to this key?
```
Ok, those were easily resolved by the fact that `register` now takes the identifier first and then (optionally) the domain shortcut, instead of the other way around. We get a little farther in phonopy because, by chance, the domain shortcut is also actually the name of an importable module!
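So the failing call from the first traceback would become something like this (module path and shortcut taken from that traceback; the order swap is the fix described above):

```python
from pyiron_workflow import Workflow

wf = Workflow('Lammps')
# New argument order: the importable module path (identifier) comes first,
# the optional domain shortcut second
wf.register('pyiron_workflow.node_library.atomistic', 'atomistic')
```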
I seem to have a bug in the registration process though, such that only the first branch of deeply nested stuff is available.
> I seem to have a bug in the registration process though, such that only the first branch of deeply nested stuff is available.
Indeed, this is a bug in how the creator's pathing is interacting with the module walk. Things are available, it's just that siblings get nested inside their elder sibling: e.g., for `atomistic`, instead of `Workflow.create.atomistic.structure.build.bulk()`, I need `Workflow.create.atomistic.calculator.engine.property.structure.build.bulk()`.
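The flavor of the bug, in a toy sketch (this is not the actual registration code, just an illustration of shared-path mutation during a recursive walk):

```python
def walk_buggy(tree: dict, path: list, registry: dict):
    for name, child in tree.items():
        path.append(name)  # bug: the shared path is never reset,
        if isinstance(child, dict):  # so each sibling inherits its elder sibling's path
            walk_buggy(child, path, registry)
        else:
            registry[".".join(path)] = child

def walk_fixed(tree: dict, path: list, registry: dict):
    for name, child in tree.items():
        child_path = path + [name]  # branch a fresh path per child
        if isinstance(child, dict):
            walk_fixed(child, child_path, registry)
        else:
            registry[".".join(child_path)] = child

tree = {"calculator": {"engine": "E"}, "structure": {"build": "B"}}
buggy, fixed = {}, {}
walk_buggy(tree, [], buggy)  # {'calculator.engine': 'E', 'calculator.engine.structure.build': 'B'}
walk_fixed(tree, [], fixed)  # {'calculator.engine': 'E', 'structure.build': 'B'}
```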
I'll patch it on `main`, since it's a genuinely generic bug, and then merge in the fix.
@JNmpi, I've been thinking about this issue, and even though we can't do everything universally, it may still be worthwhile offering wrappers for a few more common data types like `dict` and `list`. For instance, we could cook up a function that takes one of these objects and dynamically builds a node for it, such that we can write things like
```python
wf.phonopy = wf.create.atomistic.property.phonons.create_phonopy(
    structure=wf.relaxed_structure.outputs.structure,
    parameters=wf.create.standard.dict({"distance": displacement}),
    engine=wf.engine,
    max_workers=max_workers,
)
```
Where `displacement` is a single value node, and `wf.create.standard.dict` actually first dynamically defines a new node type that has a single input channel (labeled "distance") and outputs a dictionary like we want, then instantiates this node, connecting `displacement` to the new node's `distance` input.
I have kind of mixed feelings about this, because I worry that it will lull people into just writing `parameters={"distance": displacement}` and getting frustrated that it doesn't work. But it may be worth it.
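For concreteness, here is a toy sketch of how such a dict-wrapping factory could turn dict keys into named inputs (pure Python, illustrative only; not the actual pyiron_workflow machinery, and the factory name is an assumption):

```python
import inspect

def dict_node_function(template: dict):
    """Toy factory: build a function with one named argument per dict key,
    returning the assembled dictionary. A node framework that parses
    signatures would then see one input channel per key."""
    keys = list(template.keys())

    def assemble(**kwargs):
        return {k: kwargs[k] for k in keys}

    # Advertise an explicit signature so introspection sees real
    # parameters instead of **kwargs
    assemble.__signature__ = inspect.Signature(
        [inspect.Parameter(k, inspect.Parameter.POSITIONAL_OR_KEYWORD) for k in keys]
    )
    return assemble

# e.g. a function with a single "distance" input:
make_params = dict_node_function({"distance": None})
assert make_params(distance=0.01) == {"distance": 0.01}
```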
We can definitely do something similar for data classes, like
```python
wf.phonopy = wf.create.atomistic.property.phonons.create_phonopy(
    structure=wf.relaxed_structure.outputs.structure,
    parameters=wf.create.atomistic.property.phonons.InputPhonopyGenerateSupercellsNode(
        distance=displacement
    ),
    engine=wf.engine,
    max_workers=max_workers,
)
```
Where `InputPhonopyGenerateSupercellsNode` returns a single value of type `InputPhonopyGenerateSupercells`, and as long as users have generated a data class `InputPhonopyGenerateSupercells`, making the node is trivially easy, like:
```python
InputPhonopyGenerateSupercellsNode = Workflow.wrap_as.dataclass_node(InputPhonopyGenerateSupercells)
```
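End to end, that might look like the following sketch, assuming the `wrap_as.dataclass_node` helper from the line above; the dataclass fields here are illustrative assumptions, not the real class definition:

```python
from dataclasses import dataclass

from pyiron_workflow import Workflow  # assuming wrap_as.dataclass_node exists as above

@dataclass
class InputPhonopyGenerateSupercells:
    distance: float = 0.01  # illustrative field; the real class has more

InputPhonopyGenerateSupercellsNode = Workflow.wrap_as.dataclass_node(
    InputPhonopyGenerateSupercells
)
```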
Again, I have some concerns here. Mostly I just really like the purity of two core node classes (`Function` and `Macro`) and get a bit nervous of slippery slopes to too many characters hanging around, but I'm willing to try it.
```
ImportError: cannot import name 'FileDataTemplate' from 'pyiron_base' (/usr/share/miniconda3/envs/my-env/lib/python3.11/site-packages/pyiron_base/__init__.py)
```
Notebook errors are some upstream import thing. I'll check on the versioning tomorrow.
Phonopy notebook failure:

```
Exception encountered at "In [36]":
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
Cell In[36], line 1
----> 1 from pyiron_workflow.node_library.dev_tools import Output
```
Indeed, you can't import this. I didn't see it because locally I had to stop at the `matgl` stuff. I went through and modified the rest of the notebook, so it should run fine now.
> ImportError: cannot import name 'FileDataTemplate' from 'pyiron_base' (/usr/share/miniconda3/envs/my-env/lib/python3.11/site-packages/pyiron_base/__init__.py)
>
> Notebook errors are some upstream import thing. I'll check on the versioning tomorrow.
This is fixed on `pyiron_contrib:main`; we just need a new release. Since that package will want to update its base and atomistics dependencies anyhow, I'll wait briefly.
| Coverage variation | Diff coverage |
|---|---|
| Report missing for d6c57a9ada47d3e3177d9c03ad9cd8bfffd3773b[^2] | :white_check_mark: 73.26% |

You may notice some variations in coverage metrics with the latest Coverage engine update. For more details, visit the documentation.

[^2]: Codacy didn't receive coverage data for the commit, or there was an error processing the received data. Check your integration for errors and validate that your coverage setup is correct.
Both notebooks now hit the import error. Waiting on the updated versions and a black format pass; then it should be done.
The "rerun failed jobs" button for the CI report and "rerun job" for individual jobs are both doing nothing, so I am going to close and reopen...
In `notebooks/phonopy_wf.ipynb`:

```
---------------------------------------------------------------------------
Exception encountered at "In [88]":
---------------------------------------------------------------------------
ReadinessError                            Traceback (most recent call last)
Cell In[88], line 1
----> 1 df = energy_at_volume(element='Fe').iter(strain=np.linspace(-0.2, 0.2, 11))
      2 df.plot(x='strain', ylabel='Energy (eV)', title='Energy-Volume Curve');
...
ReadinessError: Collect received a run command but is not ready. The node should be neither running nor failed, and all input values should conform to type hints.
Collect readiness: False
STATE:
running: False
failed: False
INPUTS:
out_dump ready: True
out_log ready: True
calc_mode ready: False
bla ready: True
```
The rest ran OK.
The lammps engine node and its children had some minor issues to work through, but it was all solvable.
Per the last commit on #33
@JNmpi, up to the `matgl` stuff, the notebooks should run again. The key problem was that data channels can't be used arbitrarily as input to other objects (e.g. dictionaries; cf. output and input injection below). Here I made close to the minimal changes needed to get the notebooks running again, but I'll keep playing around with this to see how we can reap the benefits of the dataclass attack without hitting this problem.

Summary:

- `**kwargs` is not a valid signature for a function node; I replaced this with a specific input that needs to be commensurate with the appropriate dataclass (see the sketch after this list)
- The `ec.json` file gets downloaded as part of the notebook execution
- I wasn't able to get `matgl` to install on my local environment, so we'll see if the CI fails at the same place
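For the first point, a hedged before/after sketch (the function names here are illustrative, not from the notebook):

```python
# Before: **kwargs gives the framework no way to parse input channels --
# which channels would this node even have?
def make_parameters(**kwargs) -> dict:
    return dict(kwargs)

# After: an explicit signature, commensurate with the corresponding
# dataclass, so each argument becomes a well-defined input channel
def make_phonopy_parameters(distance: float = 0.01) -> dict:
    return {"distance": distance}
```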
**Output injection**

The result macro has three nodes: a `UserInput` node created automatically by parsing the `some_array` in the signature, `my_node` which was explicitly created, and a dynamically created array slicing node. This was pretty easy to implement by overriding output channels to simply create new nodes; at the time of definition, such nodes are always terminal leaves (although we can later incorporate them into the graph just fine), and what input is being passed to them is always well-defined because they're being created from an output channel. So this will work for an arbitrary graph as long as `OutputDataChannel` has been overloaded to understand the requested operation, and it happens at declaration time.
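A minimal toy of that mechanism (simplified stand-ins, not the real pyiron_workflow channel classes):

```python
class OutputChannel:
    """Toy stand-in for OutputDataChannel: indexing it spawns a node."""

    def __init__(self, value=None):
        self.value = value

    def __getitem__(self, item):
        # Create a terminal slicing node; its input is well-defined
        # because it's constructed directly from this output channel
        return SlicerNode(source=self, item=item)

class SlicerNode:
    def __init__(self, source, item):
        self.source = source  # the graph edge back to the parent output
        self.item = item
        self.outputs = OutputChannel()

    def run(self):
        self.outputs.value = self.source.value[self.item]
        return self.outputs.value

some_output = OutputChannel(value=[0, 1, 2, 3, 4])
sliced = some_output[:3]  # returns a new leaf node, not a slice of data
assert sliced.run() == [0, 1, 2]
```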
**Input injection**

At first glance, this might look analogous to output injection, but it's actually completely different. Remember that `some_array` is actually a `UserInput` node instance, so we're creating a dictionary with a node as one of its items. We're allowed to do that, but how should `SomeNode` interpret this input? We could overload `SomeNode` so that it knows how to interpret both `"array_input": array` and `"array_input": Node` data, but this is problematic because we still wouldn't have a graph edge between the `some_array` and `my_node` instances -- in this attack all the processing is done at run-time inside the `SomeNode` node function.

So what we really want is to inject a new node between `some_array` and `my_node` -- i.e. to replace `{"array_input": some_array}` with `SingleItemDict(key="array_input", value=some_array)`, where `SingleItemDict` is itself a node. This would accomplish everything we want: we'd get a solid execution chain from `some_array` to `my_node`, and the data passed around is proper node channels the whole way.
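A sketch of what the injected node's body could be (hypothetical, matching the `SingleItemDict` call above):

```python
# Hypothetical node body for SingleItemDict: the value arrives over a
# real channel connection, and the dict is assembled inside the graph
def single_item_dict(key: str, value) -> dict:
    return {key: value}

# Wrapped as a function node, {"array_input": some_array} becomes
# single_item_dict(key="array_input", value=<some_array's output>),
# giving an explicit edge some_array -> SingleItemDict -> my_node
```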
{"array_input": some_array}
supposed to know it should turn itself into a node? Withsome_output[:3]
it was easy, because we're calling__getitem__
onOutputDataChannel
, so we can just define it how we want. But here, we would need to go in and modifydict.__new__
so that it returns aNode
instead of adict
! Not only would making all these changes be arbitrarily time-consuming (since you need to do it for every object the user might pass data to, including objects they define, which means they need to remember to modify__new__
!) but also it might have unwanted side-effects -- maybe somewhere totally different they actually want to make a dictionary of channels! By overrideingdict.__new__
we'd be making that impossible.I like the direction the dataclasses are going, but unfortunately I think they lend themselves towards this "input injection" anti-pattern. We definitely need some sort of QoL tools here to avoid copy-pasting reused signatures all over the place, and to condense/group channels together to keep appearances tidier, but I'm skeptical that we'll ever be able to just throw a channel into some arbitrary object's
__new__
or__init__
and get it to do what we're hoping.