siesta-project / aiida_siesta_plugin

Source code for the AiiDA-Siesta package (plugin and workflows). See wiki
Other
6 stars 11 forks source link

Iterator reorganization #60

Closed bosonie closed 3 years ago

bosonie commented 3 years ago

Reorganization and some improvements of the iterator and convergence workchains.

This PR, for the moment, consists in the creation of new files. However future_iterate.py and future_converge.py will substitute iterate.py and converge.py when we agree on the reorganization.

The most relevant changes are:

  1. The separation between BaseIterator and InputsIterator have been removed. Now there are two abstract classes: ProcessInputsIterator and GeneralIterator. The reasons for this change are several. First of all, we agreed on the fact that there is no other use case for BaseIterator a part from InputsIterator. In fact you always somehow iterate over inputs of a Process. Secondly, in the spirit of suggesting to include these classes in aiida_core, I thought it was good to separate the actual iterator mechanism from the implementation that allows to decide which parameters (to iterate over) are allowed. In other words, there is no strict need to separate ProcessInputsIterator and GeneralIterator. The GeneralIterator alone would be enough for any use case. However the separation helps the documentation of the classes and will facilitate the discussion. I felt, in fact, that the approach of using _parameters_lookup keyword to decide the allowed parameters should be discussed separately compared to the actual iteration mechanism.

  2. The method process_input_parsing_func (once input_key_and_parse_func) is now a classmethod. This will provide the users with a tool to check in advance which parameters to iterate over are allowed and how. Moreover it is really useful for the sequential converger as it overcomes the previous need to copy _parameter_lookup of _process_class into an attribute of the sequential converger. In general I think the process_input_parsing_func as the central quantity to manage the supported iteration parameters.

  3. I (hopefully) improved a bit the management of situations where a user wants to define a namespace. This is a very common use case. One thing we have be careful about though. In my implementation, the keys of iterate_over that are recognized as process inputs (don't need a parsing function) are specified with no namespace. To clarify, the correct submission is this, where in iterate_over we don't put siesta.parameter as key:

    Class MyIter(GeneralIterator):
                      _expose_inputs_kwarg = {"namestace": "siesta"}
    inputs = { "siesta" : {"parameters" : ..., "basis" : ..., ...}}
    submit(MyIter, **inputs, iterate_over = {"parameters": [...,...,]})
  4. Implemented some checks on _process_class that essentially define the class as abstract. In the new implementation I don't have any abstract method, but only abstract attributes. The way to manage abstract attributes is not clear to me and maybe not even in the community. See here: https://stackoverflow.com/questions/2736255/abstract-attributes-in-python. I just followed what is implemented in aiida_core for the BaseRestartWorkChain.

  5. Changed some names (hope they are more clear), made private all the methods that are not classmethods or in the outline, improved the documentation.

This is my proposal for the implementation of our iterator and converger, @pfebrer96 I look forward for your comments. We can revert to your previous implementation anything that you don't find improved. I also ask your help to clearly document the use of _parameters_lookup (see docstring of GeneralIterator). I didn't do much to transform the Converger in an actual plugin. On that I think the AiiDA community can give us a great help. I'm happy I spent time to dig deep in your implementation, this is the most important point. Then, as I said, we can discuss what to bring back to your implementation. In general, if we decide to make a pull request to aiida_core. We will surely receive some useful comments and help from the community.

pfebrer commented 3 years ago

In other words, there is no strict need to separate ProcessInputsIterator and GeneralIterator. The GeneralIterator alone would be enough for any use case. However the separation helps the documentation of the classes and will facilitate the discussion. I felt, in fact, that the approach of using _parameters_lookup keyword to decide the allowed parameters should be discussed separately compared to the actual iteration mechanism.

I feel like there is no need to create two different classes for this. The ProcessInputsIterator is just a GeneralIterator with an empty _params_lookup. Even by just seeing the names, one would tell that GeneralIterator is the parent class and ProcessInputsIterator is a child class that makes the functionality more specific. So I would say that if you want to make two separate classes, ProcessInputsIterator should inherit from GeneralIterator and force that _params_lookup is always empty.

But as I said, I think this adds complexity and uncertainty for a developer that wants to use it, as there could be just one class called BaseIterator, whose default is anyway to be a ProcessInputsIterator. I don't see the advantage of having two classes, could you try to convince me? :)

Note:

class MyIterator(ProcessInputsIterator):
      pass

restricts any class that inherits from MyIterator to only iterate over inputs. (unnecessarily, in my opinion)

bosonie commented 3 years ago

I completely agree with you, in aiida_siesta there won't be a separation. However, as I already said, I decided to do like that to help the understanding for people in aiida_core and separate the discussion on how to manage supported iteration parameters. Your idea of the _params_lookup is very cool, but there are millions more ways to reach the same scope. I just hope this structure will help the understanding...the _params_lookup adds a layer of complexity that must be discussed separately. I couldn't myself exactly understand what _params_lookup requires or can have as optional. For this I left to you the description in the docstring.

pfebrer commented 3 years ago

But even if you want to make the separation, I think GeneralIterator is the generalization, and ProcessInputsIterator is a specific use of it. So the inheritance should go the other way around.

In my opinion, GeneralIterator should not impose that _params_lookup is not empty. It is general and therefore it should be possible to use for any case. It should be ProcessInputsIterator that inherits from GeneralIterator and forces _params_lookup to be empty.

I can write a good explanation of how _params_lookup (or a different approach that you prefer) works, but there's one thing that is simple enough: by default, you can only iterate over inputs of the process.

If one creates workchains inheriting from GeneralIterator (and remove the restriction on _params_lookup), the available parameters can always be changed in higher level classes:

class MyIterator(GeneralIterator):
    _process_class = ...
    pass

class My2ndIterator(MyIterator):
    _params_lookup = (
        ("Cool params", {...}),
    )

class My3rdIterator(My2ndIterator):
    _params_lookup = ()

this is not possible if you make the first one inherit from ProcessInputsIterator because you think no one will never need other parameters and you have a GeneralIterator that imposes restrictions.

bosonie commented 3 years ago

Ok ok, I will make the change and leave just the general iterator. I will leave to you the duty to write properly the documentation. All the rest is good?? After the modifications I would like to merge.

pfebrer commented 3 years ago

For the little that I've seen of aiida, it seems like GeneralIterator should be called BaseIterator or BaseIteratorWorkchain, or anything with the word Base for consistency with the rest of the naming. What's your feeling on this?

bosonie commented 3 years ago

Yes, I was actually wondering. I would go for BaseIterator for the moment. I would instead leave BasicConverger as it is. I imagine there can be more sophisticated converges where the convergence check is done differently.

pfebrer commented 3 years ago

Oh and also, I think one last thing to do if we are planning to propose this for aiida core is to make sure that the parameters of the iterator can be modified from within the class in an easy and simple way.

For example, imagine you create an equation of state workflow like the one you have:

class EOS(Iterator):
    ...

In this case, iterate_over should not be required, since the workchain already has some defaults for the scales parameter, which is the only one that you need. And somehow, the workchain should be able to inject scales into the iterator (I'm sure the correct place to do it is in _parse_iterate_over).

Can I give a go at converting the EOS workflow using the iterator? This would be the ultimate proof that the iterator is actually useful for developing (it takes away the boilerplate code to iterate and create batches in a consistent way, and since the features are centralized, an improvement to the BaseIterator would be reflected on all its childs).

bosonie commented 3 years ago

Shouldn't it be already possible without modifying the BaseIterator??? you just have to define in _params_lookup the support for the "scale", no?? Yes you can of course, maybe in another PR. Unfortunately now I have to go out. I'll merge it tonight after copying the future_ into iterate.py and converge.py. Ok??

pfebrer commented 3 years ago

Shouldn't it be already possible without modifying the BaseIterator??? you just have to define in params_lookup the support for the "scale", no??

Yes, you can do that, but currently there's no way of saying: if no scales are provided in iterate_over, inject these default scales.

pfebrer commented 3 years ago

Unfortunately now I have to go out. I'll merge it tonight after copying the future into iterate.py and converge.py. Ok??

I won't be able to write the documentation until tomorrow, can it wait? :)

bosonie commented 3 years ago

Ok. Listen, I'll finish this and merge. You can do the documentation in another branch starting from develop. For the equation of state, rewrite the current equation of state in another PR. Thanks!

pfebrer commented 3 years ago

:+1: