siesta-project / aiida_siesta_plugin

Source code for the AiiDA-Siesta package (plugin and workflows). See wiki

Convergence workflows #50

Closed pfebrer closed 4 years ago

pfebrer commented 4 years ago

I talked to Emanuele and he gave me some advice on how to develop an AiiDA workflow. He also gave me permission to try to develop convergence workflows :)

Right now it is basic stuff, but it works.

There is a base ConvergenceWorkflow and then more specific workflows that inherit from it (e.g. MeshCutoffConvergence and KpointComponentConvergence). Finally, there is a KpointsConvergenceWorkflow that iterates over k-point components to converge them.

Let me know what you would say could be improved.

Some main things that I have thought about are:

EDIT: As Emanuele said when we discussed this, given the few parameters that actually need to be converged, it was probably not worth it to generalize that much just for that.

Therefore, the fix has been to generalize even more and create Iterator workflows :). Each convergence workchain is then the combination of BaseConvergencePlugin and an Iterator. (I wanted to define them dynamically, but then saw that AiiDA doesn't like that.)

I saw that SiestaBaseWorkChain already involves some kind of iteration, but it seemed to me that the iteration is only meant to restart the same simulation, not to run multiple simulations. Please correct me if I'm wrong.

It may seem like overkill to generalize as much as I did, but it seems quite powerful to me, I don't know. Now someone with little knowledge of AiiDA will be able to easily launch multiple simulations iterating over a parameter. Note that with the AttributeIterator one can run the same simulation over different pseudos, pseudo_families, codes (i.e. computers), structures, slurm options, etc. without any further code. This wide range of applications with just a single workflow seems worth it to me.

You will see that there are classes that iterate over specific parameters (e.g. the MeshCutoffIterator). An exactly equivalent class can be obtained by passing the appropriate settings, and that's why there are some dictionaries (_PARAMS, _BASIS_PARAMS, _ATTRIBUTES) that define how to get them. I don't know which of the two ways is better, but I thought there's probably already something like those dictionaries, if not in aiida_siesta, in SIESTA itself. Is that the case? With the help of these dictionaries, one can use the high-level APIs iterate and converge:

from aiida_siesta.workflows.converge import converge
from aiida.orm import Str, Code, StructureData, Dict

siesta = Code.get_from_string('siesta@mare')
converge('meshcutoff', code=siesta, structure=StructureData(),
    parameters=Dict(), pseudo_family=Str('Whatever'),
    options=Dict(dict={
        'resources': {'num_machines': 1, 'num_mpiprocs_per_machine': 1},
        'max_wallclock_seconds': 3600
    }))

Giving users the possibility to run batches of simulations instead of running them one by one would still be cool, I think. Also, since the values for the parameters are provided by a generator, I would say it would not be very complicated to adapt this to implement "zipped" iterators or the cartesian product of two (or more) variables that need to be iterated, which I believe would be very useful for high-throughput studies and for code testing.
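
A minimal sketch of that idea in plain Python (the parameter names and values are just illustrative):

import itertools

# Values for each parameter to iterate over (illustrative names and values).
params = {'meshcutoff': [100, 200, 300], 'kpoints_0': [4, 8, 16]}

# "Zipped" iteration: one simulation per position, 3 in total.
zipped = list(zip(*params.values()))  # [(100, 4), (200, 8), (300, 16)]

# Cartesian product: one simulation per combination, 9 in total.
product = list(itertools.product(*params.values()))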

PS: Sorry for the long text :)

bosonie commented 4 years ago

I didn't see the recent changes, I'll try to read them later. Ignore the requests above if they are not pertinent anymore.

pfebrer commented 4 years ago

I've added an example, let me know if it works fine for you :)

Also, I've run the tools to clean up style issues, but there are some things that prospector marks as wrong that are unfixable.

By the way, this afternoon I will implement the kpoints convergence as you said :+1:

pfebrer commented 4 years ago

What would be the best way to expose the outputs of all iterations? Does it make sense to provide a post-processing function that receives a node ID and collects the results?

I.e.:

from aiida.orm import load_node

def collect_results(node, **other_options):
    for simulation in load_node(node).called:
        ...
bosonie commented 4 years ago

It depends on what you want to do. I'm not sure I understand: is this inside a WorkChain? I will assume yes.

First, it doesn't make sense to expose all the outputs of all the iterations called in a WorkChain. The provenance allows you to explore everything anyway, so why would you want to return them also as output nodes of a high-level WorkChain? Therefore I assume you want to return, as outputs of a high-level WorkChain, some selected outputs of some called processes, is that right?

In that case, you can't use your approach in a @calcfunction, meaning you can't use it to return new output nodes. A @calcfunction accepts only data nodes (inputs and outputs of a WorkChain or CalcJob), not process nodes (the WorkChain and CalcJob themselves). You could use a normal function to select some already created input/output nodes of a process node (exactly like in your example) and then pass them to a @calcfunction in order to return a new output that collects what you want. This output will be an output of the high-level WorkChain. To understand better: https://aiida-core.readthedocs.io/en/stable/reference/index.html
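
A minimal sketch of that pattern (the output_parameters link label is taken from the tracebacks later in this thread; every other name is illustrative):

from aiida.engine import calcfunction
from aiida.orm import Dict, load_node

@calcfunction
def collect(**output_params):
    # Only data nodes enter and leave a @calcfunction, so provenance is kept.
    return Dict(dict={label: d.get_dict() for label, d in output_params.items()})

def gather_results(node_pk):
    # A normal function is free to walk process nodes and select the
    # already existing data nodes we care about...
    selected = {
        'sim_{}'.format(i): process.outputs.output_parameters
        for i, process in enumerate(load_node(node_pk).called)
    }
    # ...and the @calcfunction then turns them into a new, tracked output node.
    return collect(**selected)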

pfebrer commented 4 years ago

I don't want to create new nodes for the outputs of my IteratorWorkChain, I was just wondering if you could simply create a link to the already existing outputs. Like this:

WorkChain node:
    CALLS: ...
    OUTPUTS: ... (links to the already existing output nodes)

Anyway, I understand that it is not necessary. Then is it OK to provide functions that explore the calls to make processing easier for the user (outside the WorkChain)? They could still be bound to each specific WorkChain. For example (if static methods are allowed in workchains):

class MyWorkChain(WorkChain):
    # ... methods that help running the workchain ...

    @staticmethod
    def get_workchain_log(node, *args, **kwargs):
        '''
        This function would implement meaningful ways to process all the nodes
        generated by a workchain run.
        '''

# And then the user goes like
process = submit(MyWorkChain, **inputs)

# When the process is done
MyWorkChain.get_workchain_log(process)

get_workchain_log is very general, but the methods could be very specific to the particular workchain. I believe it would be very handy for users, instead of always having to process the generated tree themselves, which can be time-consuming if each workchain produces a different tree.

What do you think?

pfebrer commented 4 years ago

Oh, I understand now. What you are saying is that you could do something like this, but get_workchain_log should contain a @calcfunction, right?

bosonie commented 4 years ago

Ah, now I understand. So basically you simply want to create a function that helps you post-process your results, but not inside the WorkChain. Not sure why you want to do it, but fine; I still didn't find time to read your code properly.

This is a completely different task. Attaching this function to the WorkChain as a static method is fine, but you would never keep track of what you do in the database. If you want to create a node in the database you can use a @calcfunction in your get_workchain_log, but then we go back to the original question: if your post-process is always useful, why don't you include it as the last step of MyWorkChain? The WorkChains have been created exactly for this purpose: to avoid you manually submitting the call to MyWorkChain.get_workchain_log(process), so that it runs automatically and the results of MyWorkChain.get_workchain_log(process) are returned in an output node.
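
A minimal sketch of that suggestion (hypothetical names throughout; the outline is reduced to the final step to keep it short):

from aiida.engine import WorkChain, calcfunction
from aiida.orm import Dict

@calcfunction
def summarize(params):
    # Creates a new, provenance-tracked node from an existing data node.
    return Dict(dict={'n_keys': len(params.get_dict())})

class MyWorkChain(WorkChain):

    @classmethod
    def define(cls, spec):
        super().define(spec)
        spec.input('parameters', valid_type=Dict)
        spec.outline(cls.postprocess)
        spec.output('summary', valid_type=Dict)

    def postprocess(self):
        # Last outline step: the post-process runs automatically and its
        # result is returned as an output node of the workchain.
        self.out('summary', summarize(self.inputs.parameters))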

pfebrer commented 4 years ago

Well, I was thinking of not having a single way to post-process it, but multiple. But then it is probably fine to include an input that indicates how to post-process it.

Still, I think it's a nice feature that these same functions that run during the workchain are also available afterwards. You don't always know beforehand what you want to get from your simulations, do you? Also, I think that the post-processing step usually needs to be more flexible than what AiiDA allows with its rigid datatypes, and most of the time it's an "experimenting" process, which doesn't go well with having to keep track of everything you do.

bosonie commented 4 years ago

No, we don't want any experimenting process and we know exactly what we want. We want to distribute in the aiida_siesta package a tool that, with a single submission, is able to tell the user what mesh cutoff, k-point mesh, etc. is appropriate for the system under investigation. This is what I had in mind. The workflows we offer are "turn-key" solutions for specific common problems. Then, of course, AiiDA offers a platform to experiment as much as you want in your personal environment, but this is different from what we can include in the package.

pfebrer commented 4 years ago

Ok, thanks :+1:

But then it is probably fine to include an input that indicates how to post-process it.

Then is this fine? (I'm not saying it for the convergence workchains, which have a clear output, I'm asking in general).

bosonie commented 4 years ago

Do you mean an input that selects one of the available post-processes? Yes, that's perfectly fine; always better to have a default if possible, but even without it, it's OK. I'm not sure I can review the code even today, but I promise tomorrow! Very sorry about it!!! And thanks for the amazing amount of work! Hope you are having fun!

pfebrer commented 4 years ago

By the way, I understand that the main aim of aiida_siesta, and AiiDA in general, is to tell the user WHAT the answer to their complicated question is. However, I don't see harm in also helping them understand WHY that is the answer (instead of saying: you can explore it, but do it yourself). Given that provenance is kept so strictly, I see a great opportunity to do that, and I don't see one thing interfering with the other.

Sorry for all the questions, I'm just curious about it :)

pfebrer commented 4 years ago

Do you mean an input that selects one of the available post-processes? Yes, that's perfectly fine; always better to have a default if possible, but even without it, it's OK.

Great, thanks!

I'm not sure I can review the code even today, but I promise tomorrow! Very sorry about it!!! And thanks for the amazing amount of work! Hope you are having fun!

No worries!

bosonie commented 4 years ago

The aim of AiiDA is to provide a platform for high-throughput calculations; the aim of aiida_siesta is to interface SIESTA with AiiDA, therefore enabling high-throughput with SIESTA. The task is fully achieved without us providing any WorkChain (except maybe the SiestaBaseWorkChain). The WorkChains we include have two purposes: 1) to be educational (they are examples one can look at to understand how to build a workchain); 2) to provide useful tools that save SIESTA users time, basically saving you from boring tasks that a machine can do automatically.

Moreover, there is the documentation, which should explain everything to you. Did I answer your question? In what other way do you think we could be "helping the users understand WHY that is the answer"?

pfebrer commented 4 years ago

The task is fully achieved without us providing any WorkChain (except maybe the SiestaBaseWorkChain)

Oh, I didn't know this. With this, are you saying that the workchains should always be developed by the users, or that AiiDA should develop workchains that SIESTA users will be able to use thanks to aiida_siesta? If it's the first one, I don't understand why you can't provide workchains to users (probably tens of users will be developing the same code instead of building on top of what is already done, which would let highly complex workflows be built from reusable, optimized, and tested pieces of code, making them look simple).

Moreover, there is the documentation, which should explain everything to you. Did I answer your question? In what other way do you think we could be "helping the users understand WHY that is the answer"?

Well, I was thinking about how I could use AiiDA to run lots of simulations without needing to worry about how to keep track of things. But maybe I don't want a specific answer; I just want to run a bunch of simulations and then explore them. I would like to be able to collect all the data in an easy way to analyze it, maybe getting it directly as a pandas dataframe or an xarray DataArray. I don't think this is something so specific that each user should need to think about it from scratch, is it? Or is this not meaningful for the aim of AiiDA?
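
For illustration, a hypothetical helper along those lines (it assumes each called process exposes an output_parameters dictionary, the link label that appears in the tracebacks below):

import pandas as pd
from aiida.common.exceptions import NotExistent
from aiida.orm import load_node

def called_to_dataframe(node_pk):
    # One row per called simulation, columns taken from its output parameters.
    rows = []
    for process in load_node(node_pk).called:
        try:
            rows.append(process.outputs.output_parameters.get_dict())
        except NotExistent:
            continue  # no output_parameters, e.g. the process failed
    return pd.DataFrame(rows)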

pfebrer commented 4 years ago

I'm starting to understand everything :) There's an aiida_optimize package, so I should use that package in conjunction with SiestaBaseWorkChain to optimize a parameter in SIESTA. Is that right?

And if there's something that I think is missing, I should develop a plugin called aiida_whatever to make it general; there's no point in putting it inside aiida_siesta.

bosonie commented 4 years ago

We are moving to almost philosophical questions, and a bit off topic; I will write you a private email. Let's keep this thread just for comments on the code.

pfebrer commented 4 years ago

Hey Emanuele, I've changed everything trying to incorporate your feedback. Now there is a main SiestaIterator that decides how to act according to the parameter that needs to be iterated over. Also, the input values are always normalized to node pks, as we discussed, so we can iterate over everything :).

I also incorporated the possibility to iterate over multiple parameters by zipping them or taking their cartesian product. This is controlled by the "iterate_mode" input.

I added A LOT of documentation in the iterate file; I hope it is self-explanatory about how things work. I also added an example for iteration and one for mesh cutoff convergence.

Quick snippet of how to use it:

from aiida.engine import run
from aiida_siesta.workflows.iterate import SiestaIterator

inputs = ...  # define your inputs

res = run(SiestaIterator, **inputs,
    iterate_over={
          'structure': [struct1, struct2, struct3],
          'kpoints_density': [1, 0.5, 0.3]
    }
)

This will run three simulations (struct1, 1), (struct2, 0.5), (struct3, 0.3).

But you can make the iterator run all combinations easily:

from aiida.engine import run
from aiida_siesta.workflows.iterate import SiestaIterator
from aiida.orm import Str, Int

inputs = ...  # define your inputs

res = run(SiestaIterator, **inputs,
    iterate_over={
          'structure': [struct1, struct2, struct3],
          'kpoints_density': [1, 0.5, 0.3]
    }, iterate_mode=Str('product'), batch_size=Int(3)
)

This will run 9 simulations (struct1, 1), (struct1, 0.5), (struct1, 0.3), (struct2, 1), (struct2, 0.5) ... Also, it will run them 3 at a time due to the batch_size input.

The convergence works exactly the same, but using aiida_siesta.workflows.converge.SiestaConverger.
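
For instance, a hedged sketch, assuming the converger accepts the same iterate_over input as the iterator (parameter names follow the examples above):

from aiida.engine import run
from aiida_siesta.workflows.converge import SiestaConverger

inputs = ...  # define your inputs

# Stops launching simulations once the monitored quantity is considered converged.
res = run(SiestaConverger, **inputs,
    iterate_over={
        'meshcutoff': [100, 200, 300, 400]
    }
)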

Looking forward to your feedback!

bosonie commented 4 years ago

Thanks @pfebrer96, I'm a bit busy at the moment, but I hope to be able to review by the end of next week! Thanks!!!

pfebrer commented 4 years ago

Great, thanks!

pfebrer commented 4 years ago

I've changed the example from converging the mesh cutoff to an example of how to converge multiple parameters sequentially (with the new SiestaSequentialConverger), because I think it is more useful.
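
A hedged sketch of what that example might look like, assuming the SequentialConverger takes a list of iterate_over dictionaries and a converger_inputs namespace, as later snippets in this thread suggest:

from aiida.engine import run
from aiida_siesta.workflows.converge import SiestaSequentialConverger

inputs = ...  # define the fixed inputs for each convergence

# Converge the mesh cutoff first, then the first k-point component,
# each convergence reusing the previously converged values.
res = run(SiestaSequentialConverger,
    iterate_over=[
        {'meshcutoff': [100, 200, 300, 400]},
        {'kpoints_0': [1, 2, 3, 4, 5, 6]},
    ],
    converger_inputs=inputs
)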

bosonie commented 4 years ago

I don't understand the need for _process_parsing_func, but the rest is clearer. If the quantity we want to iterate over is an input of the workchain, we will never need a _process_parsing_func, right?

Anyway, all good. Just please bring your branch up to date with develop and run yapf -i aiida_siesta/workflows; I will review the pre-commit errors and then we merge. Thanks!!

pfebrer commented 4 years ago

I don't understand the need for _process_parsing_func, but the rest is clearer.

Yeah, I'm not sure either, because what you would do there is the same as you would do in the input serializer of _process_class, so maybe it never needs to be used. However, maybe it's useful if you want to allow some extra parsing in the iterator that you don't want to allow in a regular run of the workchain. I don't know; I kept it as a tunable parameter just in case, because I didn't see any downside.

I will review the pre-commit errors and then we merge

I still need to fix the input serializer function :)

bosonie commented 4 years ago

Sorry to put pressure on you, but I would like to make a release on Tuesday or Wednesday that includes this. The AiiDA school is on and there is going to be a presentation claiming that this Converger is in the package! Sorry and thanks!

pfebrer commented 4 years ago

It is ready now, BUT:

When I rebased onto the develop branch, I got an error running the SiestaBaseWorkChain. I believe it has nothing to do with my implementation. Here is the traceback:

07/06/2020 01:36:13 AM <23437> aiida.orm.nodes.process.workflow.workchain.WorkChainNode: [REPORT] [19167|SiestaBaseWorkChain|on_except]: Traceback (most recent call last):
  File "/home/pfebrer/Simulations/Aiida/aiida_siesta_plugin/aiida_siesta/workflows/utils/protocols.py", line 100, in _protocols_checks
    Group.get(label=famname)
  File "/home/pfebrer/.venvs/aiida/lib/python3.6/site-packages/aiida/orm/groups.py", line 356, in get
    raise exceptions.NotExistent("No group found matching criteria '{}'".format(kwargs))
aiida.common.exceptions.NotExistent: No group found matching criteria '{'label': 'nc-sr-04_pbe_standard_psml'}'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/pfebrer/.venvs/aiida/lib/python3.6/site-packages/plumpy/process_states.py", line 220, in execute
    result = self.run_fn(*self.args, **self.kwargs)
  File "/home/pfebrer/.venvs/aiida/lib/python3.6/site-packages/aiida/engine/processes/workchains/workchain.py", line 177, in _do_step
    finished, stepper_result = self._stepper.step()
  File "/home/pfebrer/.venvs/aiida/lib/python3.6/site-packages/plumpy/workchains.py", line 283, in step
    finished, result = self._child_stepper.step()
  File "/home/pfebrer/.venvs/aiida/lib/python3.6/site-packages/plumpy/workchains.py", line 515, in step
    finished, result = self._child_stepper.step()
  File "/home/pfebrer/.venvs/aiida/lib/python3.6/site-packages/plumpy/workchains.py", line 283, in step
    finished, result = self._child_stepper.step()
  File "/home/pfebrer/.venvs/aiida/lib/python3.6/site-packages/plumpy/workchains.py", line 234, in step
    return True, self._fn(self._workchain)
  File "/home/pfebrer/.venvs/aiida/lib/python3.6/site-packages/aiida/engine/processes/workchains/restart.py", line 205, in inspect_process
    for handler in sorted(self.get_process_handlers(), key=lambda handler: handler.priority, reverse=True):
  File "/home/pfebrer/.venvs/aiida/lib/python3.6/site-packages/aiida/engine/processes/workchains/restart.py", line 314, in get_process_handlers
    return [method[1] for method in getmembers(cls) if cls.is_process_handler(method[1])]
  File "/usr/lib/python3.6/inspect.py", line 342, in getmembers
    value = getattr(object, key)
  File "/home/pfebrer/.venvs/aiida/lib/python3.6/site-packages/aiida/common/lang.py", line 88, in __get__
    return self.getter(owner)
  File "/home/pfebrer/Simulations/Aiida/aiida_siesta_plugin/aiida_siesta/workflows/base.py", line 242, in inputs_generator
    return BaseWorkChainInputsGenerator(cls)
  File "/home/pfebrer/Simulations/Aiida/aiida_siesta_plugin/aiida_siesta/workflows/utils/generator_metaclass.py", line 26, in __init__
    super().__init__()
  File "/home/pfebrer/Simulations/Aiida/aiida_siesta_plugin/aiida_siesta/workflows/utils/protocols.py", line 60, in __init__
    self._protocols_checks()
  File "/home/pfebrer/Simulations/Aiida/aiida_siesta_plugin/aiida_siesta/workflows/utils/protocols.py", line 104, in _protocols_checks
    'but no family with this name is loaded in the database'.format(k, famname)
  File "/home/pfebrer/Simulations/Aiida/aiida_siesta_plugin/aiida_siesta/workflows/utils/protocols.py", line 68, in raise_invalid
    raise RuntimeError('invalid protocol registry `{}`: '.format(self.__class__.__name__) + message)
RuntimeError: invalid protocol registry `BaseWorkChainInputsGenerator`: protocol `standard` requires `pseudo_family` with name nc-sr-04_pbe_standard_psml but no family with this name is loaded in the database

07/06/2020 01:36:13 AM <23437> aiida.orm.nodes.process.workflow.workchain.WorkChainNode: [REPORT] [19167|SiestaBaseWorkChain|on_terminated]: remote folders will not be cleaned
07/06/2020 01:36:14 AM <23437> aiida.orm.nodes.process.workflow.workchain.WorkChainNode: [REPORT] [19165|SiestaBaseWorkChain|on_except]: Traceback (most recent call last):
  File "/home/pfebrer/Simulations/Aiida/aiida_siesta_plugin/aiida_siesta/workflows/utils/protocols.py", line 100, in _protocols_checks
    Group.get(label=famname)
  File "/home/pfebrer/.venvs/aiida/lib/python3.6/site-packages/aiida/orm/groups.py", line 356, in get
    raise exceptions.NotExistent("No group found matching criteria '{}'".format(kwargs))
aiida.common.exceptions.NotExistent: No group found matching criteria '{'label': 'nc-sr-04_pbe_standard_psml'}'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/pfebrer/.venvs/aiida/lib/python3.6/site-packages/plumpy/process_states.py", line 220, in execute
    result = self.run_fn(*self.args, **self.kwargs)
  File "/home/pfebrer/.venvs/aiida/lib/python3.6/site-packages/aiida/engine/processes/workchains/workchain.py", line 177, in _do_step
    finished, stepper_result = self._stepper.step()
  File "/home/pfebrer/.venvs/aiida/lib/python3.6/site-packages/plumpy/workchains.py", line 283, in step
    finished, result = self._child_stepper.step()
  File "/home/pfebrer/.venvs/aiida/lib/python3.6/site-packages/plumpy/workchains.py", line 515, in step
    finished, result = self._child_stepper.step()
  File "/home/pfebrer/.venvs/aiida/lib/python3.6/site-packages/plumpy/workchains.py", line 283, in step
    finished, result = self._child_stepper.step()
  File "/home/pfebrer/.venvs/aiida/lib/python3.6/site-packages/plumpy/workchains.py", line 234, in step
    return True, self._fn(self._workchain)
  File "/home/pfebrer/.venvs/aiida/lib/python3.6/site-packages/aiida/engine/processes/workchains/restart.py", line 205, in inspect_process
    for handler in sorted(self.get_process_handlers(), key=lambda handler: handler.priority, reverse=True):
  File "/home/pfebrer/.venvs/aiida/lib/python3.6/site-packages/aiida/engine/processes/workchains/restart.py", line 314, in get_process_handlers
    return [method[1] for method in getmembers(cls) if cls.is_process_handler(method[1])]
  File "/usr/lib/python3.6/inspect.py", line 342, in getmembers
    value = getattr(object, key)
  File "/home/pfebrer/.venvs/aiida/lib/python3.6/site-packages/aiida/common/lang.py", line 88, in __get__
    return self.getter(owner)
  File "/home/pfebrer/Simulations/Aiida/aiida_siesta_plugin/aiida_siesta/workflows/base.py", line 242, in inputs_generator
    return BaseWorkChainInputsGenerator(cls)
  File "/home/pfebrer/Simulations/Aiida/aiida_siesta_plugin/aiida_siesta/workflows/utils/generator_metaclass.py", line 26, in __init__
    super().__init__()
  File "/home/pfebrer/Simulations/Aiida/aiida_siesta_plugin/aiida_siesta/workflows/utils/protocols.py", line 60, in __init__
    self._protocols_checks()
  File "/home/pfebrer/Simulations/Aiida/aiida_siesta_plugin/aiida_siesta/workflows/utils/protocols.py", line 104, in _protocols_checks
    'but no family with this name is loaded in the database'.format(k, famname)
  File "/home/pfebrer/Simulations/Aiida/aiida_siesta_plugin/aiida_siesta/workflows/utils/protocols.py", line 68, in raise_invalid
    raise RuntimeError('invalid protocol registry `{}`: '.format(self.__class__.__name__) + message)
RuntimeError: invalid protocol registry `BaseWorkChainInputsGenerator`: protocol `standard` requires `pseudo_family` with name nc-sr-04_pbe_standard_psml but no family with this name is loaded in the database

07/06/2020 01:36:14 AM <23437> aiida.orm.nodes.process.workflow.workchain.WorkChainNode: [REPORT] [19165|SiestaBaseWorkChain|on_terminated]: remote folders will not be cleaned
07/06/2020 01:36:15 AM <23437> aiida.orm.nodes.process.workflow.workchain.WorkChainNode: [REPORT] [19163|SiestaConverger|on_except]: Traceback (most recent call last):
  File "/home/pfebrer/.venvs/aiida/lib/python3.6/site-packages/plumpy/process_states.py", line 220, in execute
    result = self.run_fn(*self.args, **self.kwargs)
  File "/home/pfebrer/.venvs/aiida/lib/python3.6/site-packages/aiida/engine/processes/workchains/workchain.py", line 177, in _do_step
    finished, stepper_result = self._stepper.step()
  File "/home/pfebrer/.venvs/aiida/lib/python3.6/site-packages/plumpy/workchains.py", line 283, in step
    finished, result = self._child_stepper.step()
  File "/home/pfebrer/.venvs/aiida/lib/python3.6/site-packages/plumpy/workchains.py", line 515, in step
    finished, result = self._child_stepper.step()
  File "/home/pfebrer/.venvs/aiida/lib/python3.6/site-packages/plumpy/workchains.py", line 283, in step
    finished, result = self._child_stepper.step()
  File "/home/pfebrer/.venvs/aiida/lib/python3.6/site-packages/plumpy/workchains.py", line 234, in step
    return True, self._fn(self._workchain)
  File "/home/pfebrer/Simulations/Aiida/aiida_siesta_plugin/aiida_siesta/workflows/iterate.py", line 342, in analyze_batch
    self.analyze_process(self.ctx[process_id])
  File "/home/pfebrer/Simulations/Aiida/aiida_siesta_plugin/aiida_siesta/workflows/converge.py", line 123, in analyze_process
    simulation_outputs = results.output_parameters.get_dict()
  File "/home/pfebrer/.venvs/aiida/lib/python3.6/site-packages/aiida/orm/utils/managers.py", line 82, in __getattr__
    return self._get_node_by_link_label(label=name)
  File "/home/pfebrer/.venvs/aiida/lib/python3.6/site-packages/aiida/orm/utils/managers.py", line 63, in _get_node_by_link_label
    return self._node.get_outgoing(link_type=self._link_type).get_node_by_label(label)
  File "/home/pfebrer/.venvs/aiida/lib/python3.6/site-packages/aiida/orm/utils/links.py", line 302, in get_node_by_label
    raise exceptions.NotExistent('no neighbor with the label {} found'.format(label))
aiida.common.exceptions.NotExistent: no neighbor with the label output_parameters found

07/06/2020 01:36:15 AM <23437> aiida.orm.nodes.process.workflow.workchain.WorkChainNode: [REPORT] [19162|SiestaSequentialConverger|on_except]: Traceback (most recent call last):
  File "/home/pfebrer/.venvs/aiida/lib/python3.6/site-packages/plumpy/process_states.py", line 220, in execute
    result = self.run_fn(*self.args, **self.kwargs)
  File "/home/pfebrer/.venvs/aiida/lib/python3.6/site-packages/aiida/engine/processes/workchains/workchain.py", line 177, in _do_step
    finished, stepper_result = self._stepper.step()
  File "/home/pfebrer/.venvs/aiida/lib/python3.6/site-packages/plumpy/workchains.py", line 283, in step
    finished, result = self._child_stepper.step()
  File "/home/pfebrer/.venvs/aiida/lib/python3.6/site-packages/plumpy/workchains.py", line 515, in step
    finished, result = self._child_stepper.step()
  File "/home/pfebrer/.venvs/aiida/lib/python3.6/site-packages/plumpy/workchains.py", line 283, in step
    finished, result = self._child_stepper.step()
  File "/home/pfebrer/.venvs/aiida/lib/python3.6/site-packages/plumpy/workchains.py", line 234, in step
    return True, self._fn(self._workchain)
  File "/home/pfebrer/Simulations/Aiida/aiida_siesta_plugin/aiida_siesta/workflows/iterate.py", line 342, in analyze_batch
    self.analyze_process(self.ctx[process_id])
  File "/home/pfebrer/Simulations/Aiida/aiida_siesta_plugin/aiida_siesta/workflows/converge.py", line 250, in analyze_process
    if process_node.outputs.converged:
  File "/home/pfebrer/.venvs/aiida/lib/python3.6/site-packages/aiida/orm/utils/managers.py", line 82, in __getattr__
    return self._get_node_by_link_label(label=name)
  File "/home/pfebrer/.venvs/aiida/lib/python3.6/site-packages/aiida/orm/utils/managers.py", line 63, in _get_node_by_link_label
    return self._node.get_outgoing(link_type=self._link_type).get_node_by_label(label)
  File "/home/pfebrer/.venvs/aiida/lib/python3.6/site-packages/aiida/orm/utils/links.py", line 302, in get_node_by_label
    raise exceptions.NotExistent('no neighbor with the label {} found'.format(label))
aiida.common.exceptions.NotExistent: no neighbor with the label converged found

bosonie commented 4 years ago

I'm so glad you found this bug. Now it should be fixed! Thanks!!!

pfebrer commented 4 years ago

Yes, all good now :+1: In principle it is ready now. Although some things could be improved, if you want to make a release tomorrow it is probably better to leave it like this and improve it later.

bosonie commented 4 years ago

There is always space for improvements, and we will surely keep learning and changing, but it is definitely a great starting point!

Looking at the pre-commit error, only one more comment comes to mind. I think that BaseConvergencePlugin could become an InputsConverger and be a subclass of InputIterator. Probably it didn't make much sense in your original implementation, but now it does! The converger needs InputIterator entirely and rewrites only a few methods.

Then BaseIteratorWorkChain, InputIterator, InputsConverger and SequentialConverger are metaclasses that can be moved to the utils folder (and eventually into aiida_core one day), and in the workflows folder we leave just one file called iterate_converge.py where we put the SiestaIterator class with the various SIESTA_ITERATION_PARAMS dicts and the external functions set_up_kpoint_grid ..., and the SiestaConverger and SiestaSequentialConverger. To define the SiestaConverger we could think about the same logic as the SiestaSequentialConverger, meaning only needing _process_class = SiestaIterator (to this aim, modifications in InputsConverger are needed), but for the moment we could just copy what is in SiestaIterator into SiestaConverger.

I leave you one night to think about this suggestion. If you think it is a brilliant idea, we can do it tomorrow. Otherwise we leave it for further discussion.

pfebrer commented 4 years ago

I liked the idea of different convergence algorithms being plugins, so that you could take an iterator (e.g. the SiestaIterator) and "plug in" any convergence algorithm.

I see your point, but I think the BaseConvergencePlugin does not depend on InputIterator; it just depends on BaseIteratorWorkChain, unless I'm missing something. The only problem I see is these errors that pre-commit finds, because obviously it does not know where the class is going to be used.

I already thought about defining the class as:

class BaseConverger(BaseIteratorWorkChain):
    ...

But then I found it weird that the siesta converger was defined like:

class SiestaConverger(BaseConverger, SiestaIterator):
    pass

because in my mind it was weird that it inherits from two iterators. And it's not only that I find it weird: I think there would be actual method resolution order problems with this, because BaseIteratorWorkChain's methods would take priority over SiestaIterator's.

I don't like the approach of needing to define the SIESTA things each time: for now we would need to define them just twice, but what if new iterator plugins appear?

I think it would be quite clean if the BaseIteratorWorkChain had a _plugins class attribute: a list of stacked plugins that you want to use. You could even have multiple of them working at the same time if you make sure that they don't interfere with each other.

SiestaConverger would then be:

class SiestaConverger(SiestaIterator):
    _plugins = (BaseConvergencePlugin,)

I don't know, this approach seems quite nice to me, since each plugin would return its own outputs based on the same calculations. I have to think more about it to understand whether it would have any use in practice, but I don't see any harm in defining it this way for now, even if there is just one "plugin". What do you think?

Then, to solve the pre-commit issues, we could just make the plugins inherit from WorkChain, although it doesn't make sense to me. Actually, a new class would be useful here to register all the plugins (and be able to request them from the iterator in the future!):

from aiida.engine import WorkChain

AVAILABLE_ITERATOR_PLUGINS = []

class IteratorPlugin(WorkChain):
    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        AVAILABLE_ITERATOR_PLUGINS.append(cls)

Then BaseIteratorWorkChain, InputIterator, InputsConverger and SequentialConverger are metaclasses that can be moved to the utils folder (and eventually into aiida_core one day), and in the workflows folder we leave just one file called iterate_converge.py where we put the SiestaIterator class with the various SIESTA_ITERATION_PARAMS dicts and the external functions set_up_kpoint_grid ..., and the SiestaConverger and SiestaSequentialConverger.

I agree with this. Just one last thing: do you think that a SequentialIterator would make sense? I thought about this but I was not sure; that's why I created the SequentialConverger directly. If it made sense, I could definitely implement it.

bosonie commented 4 years ago

There is no doubt, though, that the Converger is an iterator.

I guess that the solution lies in the ability to completely decouple the methods touched by Convergers and the ones touched by <Something>Iterator. In this case:

class SiestaConverger(BaseConverger, SiestaIterator):
    pass

shouldn't have any problem. This was my initial idea, but for something very specific. Then you started to generalize a lot and it became more complicated.

Also this proposal:

class SiestaConverger(SiestaIterator):
    _plugins = (BaseConvergencePlugin,)

finds my approval, but it will keep the pre-commit error. These pre-commit checks are a bit annoying sometimes, but they make you think about whether there are more "pythonic" ways to deal with problems, which is not bad.

Anyway, the conclusion is that there is no easy solution. Therefore, for the moment I'll give you a break and keep it like it is. I will find a way to hide the pre-commit error, but I won't forget about it. We will come back to this maybe next week in another PR.

Thanks a million for everything!!!

pfebrer commented 4 years ago

In this case:

class SiestaConverger(BaseConverger, SiestaIterator):
    pass

shouldn't have any problem.

You are right! I thought the method resolution order here would be BaseConverger -> BaseIterator -> SiestaIterator, but it is in fact BaseConverger -> SiestaIterator -> BaseIterator. So this is the most straightforward way of avoiding the pre-commit problems for now; if you want, you can change it. In the long term, however, I think the plugin system will be better, because that way BaseConvergencePlugin does not need to call super() at the end of its methods. This would be handled gracefully by the BaseIteratorWorkChain, which would also create different dictionaries in ctx so that multiple plugins don't interfere with each other :).
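
A minimal sketch (plain Python, with illustrative method bodies) of why the linearization works out that way:

class BaseIterator:
    def analyze_process(self):
        return 'BaseIterator'

class SiestaIterator(BaseIterator):
    def analyze_process(self):
        return 'SiestaIterator -> ' + super().analyze_process()

class BaseConverger(BaseIterator):
    def analyze_process(self):
        return 'BaseConverger -> ' + super().analyze_process()

class SiestaConverger(BaseConverger, SiestaIterator):
    pass

# C3 linearization puts SiestaIterator between BaseConverger and BaseIterator:
print([c.__name__ for c in SiestaConverger.__mro__])
# ['SiestaConverger', 'BaseConverger', 'SiestaIterator', 'BaseIterator', 'object']
print(SiestaConverger().analyze_process())
# BaseConverger -> SiestaIterator -> BaseIterator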

By the way, in the future I believe it would be nice if the inputs of the process were in a separate namespace. That way, we would allow arbitrarily nested Iterators. (Not for now, because I already saw that you show it in the presentation! :) ) The SequentialConverger is already a nested iterator itself, for example, but if we were to do this, one could do:

class MyIterator(InputIterator):
    _process_class = SiestaSequentialConverger

and then use it to converge multiple structures. Although it is a bit tricky, because it should be:

run(MyIterator,
    iterate_over = {
         # This is the tricky part. We want to iterate over a nested parameter, but it's doable.
         "converger_inputs.structure": [struct1, struct2]
    }, 
    # This is the new namespace for process inputs, where we define
    # the fixed inputs for SiestaSequentialConverger
    process_inputs = {
          "iterate_over": [{
                "meshcutoff" : [100, 200, 300, 400]
          }, {
                "kpoints_0": [1, 2, 3, 4, 5, 6]
          }]
    }
)

Just a note for the future :)