Closed: pfebrer closed this pull request 4 years ago
Can you please also push an example_eos.py so I can better see your API? Thanks! Don't worry about tests and so on; I'll look at it.
I mean, I guess we now have the possibility to select the scale parameters in the input?
I'm out of home now; I will push it this afternoon. But the old example should run fine, it works exactly the same. It's just that, if you want, you can use the `scales` input to customize the scales (which have a default), and you also have the `batch_size` input available.
Cool, I understand, no need then to push another example. I'll play around myself
Ok, so basically the point here is that in our iterator implementation the input `iterate_over` was always required, and there the iteration parameter is selected by the user. However, there are situations where one may prefer to skip the `iterate_over` dictionary and pass the list directly as an input port.

Just one observation: I guess this approach has the danger of causing confusion if we want to iterate over an input port of `_process_class`. I mean, if `iteration_input(name)` is used where `name` is also an exposed input of `_process_class`, this would result in two input ports with the same name. The developer should be forced to exclude it from the exposed inputs of `_process_class`. The rest is good!
Regarding the `scale_to_vol`, there is an easy fix. Just substitute `get_pymatgen` with `get_pymatgen_structure`. Can you do it for me and reinstate the call to the function? Thanks!!!
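A stdlib-only sketch of why this substitution fixes things. The real code uses AiiDA's `StructureData` and pymatgen, so the two stub classes below are made-up stand-ins for illustration only: `get_pymatgen()` can return a pymatgen `Molecule` for non-periodic systems, and `Molecule` has no `scale_lattice`, while `get_pymatgen_structure()` always returns a periodic `Structure`, which does.

```python
class FakePmgStructure:
    """Stand-in for pymatgen.core.Structure (illustration only)."""
    def __init__(self, volume=1.0):
        self.volume = volume

    def scale_lattice(self, new_volume):
        # pymatgen rescales the lattice to the requested volume in place
        self.volume = float(new_volume)


class FakeStructureData:
    """Stand-in for aiida.orm.StructureData (illustration only)."""
    def get_pymatgen_structure(self):
        return FakePmgStructure()


def scale_to_vol(structure_data, vol):
    # the fix: call get_pymatgen_structure() instead of get_pymatgen()
    pmg = structure_data.get_pymatgen_structure()
    pmg.scale_lattice(float(vol))
    return pmg


scaled = scale_to_vol(FakeStructureData(), 42.0)
```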
> I mean, if `iteration_input(name)` is used where `name` is also an exposed input of `_process_class`, this would result in two input ports with the same name. The developer should be forced to exclude it from the exposed inputs of `_process_class`.
So maybe raise an error if you are trying to create an `iteration_input` with a name that already exists? Isn't this already managed by AiiDA? I don't know, I'm asking, but it seems odd that a workchain lets you specify two inputs with the same name.
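A minimal stdlib sketch of the check being discussed (the class and attribute names are hypothetical, not the actual aiida-siesta API): refuse to declare an iteration input whose name clashes with an already-exposed input port of the process class.

```python
class MiniIterator:
    # hypothetical stand-ins for the exposed ports of _process_class
    _exposed_inputs = {"structure", "parameters", "code"}
    _iteration_inputs = {}

    @classmethod
    def iteration_input(cls, name, required=False):
        if name in cls._exposed_inputs or name in cls._iteration_inputs:
            raise ValueError(
                f"'{name}' is already an input port; exclude it from the "
                "exposed inputs of the process class first"
            )
        cls._iteration_inputs[name] = {"required": required}


MiniIterator.iteration_input("scales", required=True)  # fine, no clash

try:
    MiniIterator.iteration_input("structure")  # clashes with an exposed port
    clash_detected = False
except ValueError:
    clash_detected = True
```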
I thought about making all the inputs of `_process_class` iterable, in the sense that if a Python list is passed for an input, it will be interpreted as an attempt to iterate over it, but I'm not sure if that could interfere with some functionality. What do you think? To be clear, I'm saying something like:
```python
run(SiestaIterator, structure=[struct1, struct2])
```
would automatically interpret that you want to iterate over the structure. Maybe it could be optional and off by default, but it could be useful to prevent nesting, and it seems to me that not many AiiDA inputs accept Python lists anyway, so it's safe to do so.
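A hypothetical sketch of this implicit-iteration idea (the helper name is made up): any input whose value is a plain Python list is taken as a request to iterate over that input, while everything else is a regular, fixed input.

```python
def split_iteration_inputs(**inputs):
    # lists become iteration targets; everything else passes through unchanged
    iterate_over = {k: v for k, v in inputs.items() if isinstance(v, list)}
    fixed = {k: v for k, v in inputs.items() if not isinstance(v, list)}
    return iterate_over, fixed


iterate_over, fixed = split_iteration_inputs(
    structure=["struct1", "struct2"],  # a list: iterate over it
    code="siesta-code",                # not a list: regular input
)
```

As suggested above, this detection could sit behind an opt-in flag that is off by default.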
Another approach could be to declare explicitly which inputs you are trying to iterate over, but I don't know. Something like:

```python
run(SiestaIterator, structures=[struct1, struct2], iterate_over=["structures"])
```
Thoughts?
> So maybe raise an error if you are trying to create an `iteration_input` with a name that already exists? Isn't this already managed by AiiDA? I don't know, I'm asking, but it seems odd that a workchain lets you specify two inputs with the same name.
I think that AiiDA just reads sequentially and executes the last specification. There is no error, and this is the wanted behavior! In fact, for instance, you want to be able to redefine a port in a subclass. I would say we should not support the possibility of creating an `iteration_input` with a name that already exists; raising an error is fine. I guess this would already exclude the case when namespaces are defined. In fact, I would allow:
```python
run(SiestaStructureIterator, structure=[struct1, struct2], siesta_inputs=inputs)
```
but not what you suggested:
```python
run(SiestaIterator, structure=[struct1, struct2], **inputs)
```
I don't see any need for this.
I think that our original API is perfectly fine! The use of `iterate_over` should remain the correct way to submit calculations, in my opinion; the use of `iteration_input` is only an additional possibility for peculiar cases.
As you proposed, we now raise an error if someone tries to overwrite an input with an `iteration_input`. Also, `scale_to_vol` is fixed.
During this, I also thought that the `SiestaIterator` itself could benefit from these iterations over scales. After thinking for a bit, I realised that any iterator that accepts a structure as an input can benefit from it. Therefore, I moved the structure modification to `utils.iterate_params` (not the definitive location) to somehow start building a more general concept that lets any iterator benefit from code that can be shared.
You can see then that, when defining `SIESTA_ITERATION_PARAMS`, I added the structure modification parameters (right before the fdf ones), and then the EOS workchain can use them.
Any suggestion on whether you think this approach is useful or on how to actually implement it would be very welcome (this is just a first draft to show that it can be decoupled).
Also, I added the possibility of not creating the `iterate_over` input port, by using `cls._iterate_over_port = False`. I am using it in the EOS workchain, since we actually don't want people to iterate over anything other than `structure_scales` (I guess the equation-of-state fit only makes sense when everything else is fixed).
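An illustrative stdlib sketch of the `_iterate_over_port` switch (the real classes build an AiiDA input spec; here, ports are just a set of names, and the class names are made up): a subclass sets the flag to `False` so the generic `iterate_over` port is not created and only its own dedicated iteration inputs remain.

```python
class SketchBaseIterator:
    _iterate_over_port = True

    @classmethod
    def input_ports(cls):
        # stand-in for the real input-spec definition
        ports = {"batch_size"}
        if cls._iterate_over_port:
            ports.add("iterate_over")
        return ports


class SketchEosWorkChain(SketchBaseIterator):
    _iterate_over_port = False  # users may only iterate over structure_scales

    @classmethod
    def input_ports(cls):
        return super().input_ports() | {"structure_scales"}
```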
I don't understand your decision to make Eos a child of SiestaIterator. It is useless for Eos to know all the `_params_lookup`; they won't be used anyway. The way it was implemented before was much better, and reusable by other plugins simply by changing `_process_class`.
I also understand the need to make a collection of `_parsing_func` that can be used across different iterators, but the way you implemented it seems to me unnecessarily complicated.
Honestly, I would revert to the previous commit and ship the functions that modify the structure with the EOS, exactly in the same way we ship the `_parsing_func` of SiestaIterator in the file containing the class. Then you can create a new PR where you put the `iterate_params` file and where we can discuss a way to make a collection of `_parsing_func`, divide them between siesta-specific and more general ones, and so on. Ok?
Very well on the `_iterate_over_port = False` and `scale_to_vol`, thanks!
> I don't understand your decision to make Eos a child of SiestaIterator. It is useless for Eos to know all the `_params_lookup`; they won't be used anyway. The way it was implemented before was much better, and reusable by other plugins simply by changing `_process_class`.
Well, I was thinking about this for a while, but I thought it was not that easy to reuse the workchain, because it relies on:

- `inputs.structure`
- `output_structure`
- `output_parameters.E_KS` and `E_KS_units` from the output parameters dictionary.

But yeah, I understand that you don't need all the parameters. However, they don't interfere at all, because you can simply not pass them. So in the end it is the same to do:
```python
class GenericEOS(BaseIterator):
    ...

class SiestaEOS(GenericEOS):
    _process_class = SiestaBaseWorkChain
    _params_lookup = (...)  # define the parameter parsing here
    _expose_inputs_kwargs = {'exclude': ('metadata',)}
```
and
```python
class GenericEOS(BaseIterator):  # "BaseIterator?" in the original message
    ...

class SiestaEOS(SiestaIterator):
    pass
```
And you can reuse the EOS workchain in both cases. What's the main difference that you see? The advantage that I saw in my case was that I didn't have to define how to parse `structure_scales` twice (in `SiestaIterator` and the EOS chain), and also I didn't have to specify `_expose_inputs_kwargs = {'exclude': ('metadata',)}` again.
> I also understand the need to make a collection of `_parsing_func` that can be used across different iterators, but the way you implemented it seems to me unnecessarily complicated.
That's why I asked for ideas to make it simpler :)
> Honestly, I would revert to the previous commit and ship the functions that modify the structure with the EOS, exactly in the same way we ship the `_parsing_func` of SiestaIterator in the file containing the class. Then you can create a new PR where you put the `iterate_params` file and where we can discuss a way to make a collection of `_parsing_func`, divide them between siesta-specific and more general ones, and so on. Ok?
Even if I make the EOS not inherit from `SiestaIterator`, I think it is good that someone using the `SiestaIterator` can also use them (I mean, if we have already written the code, what's the harm?), so why can't we still keep it in some external file?
The main goal of this PR for me was not to update the EOS workchain, but, while doing so, to identify weak points in the functionality of `BaseIterator` and the general infrastructure for iterators, and to make it more robust. So in my opinion the discussion fits in this PR; otherwise we will close it without having fully achieved the changes required. But if you insist, I will open a new one.
Thanks for the feedback!
I understand this. I was too optimistic in saying that changing `_process_class` was enough, but the list of changes to make is limited compared to the need of defining `<YourPlugin>Iterator`, the `_params_lookup`, and so on.
In my opinion, in your approach you load a lot of functions that you do not use. My idea is to make a collection of `parsing_func`, but not to have a `_params_lookup` that contains all of them. The `SiestaIterator` cannot be a universal iterator that can iterate over any possible parameter in this universe. We have to limit what `SiestaIterator` can do. Then, case by case, the user can create his own iterator and put in `_params_lookup` what he is interested in. For instance, in `SiestaIterator` I'm interested in iterating over the main fdf parameters, in the EosIterator over the scale factors. If I want both at the same time, I can create a new iterator. Continuing to add functionality to `SiestaIterator` adds confusion and is not sustainable, in my opinion.
> And you can reuse the EOS workchain in both cases. What's the main difference that you see? The advantage that I saw in my case was that I didn't have to define how to parse `structure_scales` twice (in `SiestaIterator` and the EOS chain), and also I didn't have to specify `_expose_inputs_kwargs = {'exclude': ('metadata',)}` again.
Maybe for my EoS I want to expose the metadata and use a namespace.
> The main goal of this PR for me was not to update the EOS workchain, but, while doing so, to identify weak points in the functionality of `BaseIterator` and the general infrastructure for iterators, and to make it more robust. So in my opinion the discussion fits in this PR; otherwise we will close it without having fully achieved the changes required. But if you insist, I will open a new one.
The only improvement I see is to have a place where to collect the `parsing_func`, but I need to think about what is the best way. Again, in this I think the AiiDA core developers could help.
> The only improvement I see is to have a place where to collect the `parsing_func`, but I need to think about what is the best way. Again, in this I think the AiiDA core developers could help.
Maybe a `@calcfunction` factory??
> In my opinion, in your approach you load a lot of functions that you do not use.
Yes, but is that a problem? I mean, it is not really loading anything. The `_params_lookup` variable lives in `SiestaIterator` and is not copied when you inherit from it; there's always only one instance of it. So, in fact, you are using less memory than if you create a specific `_params_lookup` for your new iterator.
> The `SiestaIterator` cannot be a universal iterator that can iterate over any possible parameter in this universe.
:laughing:
Why not? (This is a serious question.) If something can be iterated over in SIESTA, why not support it in the `SiestaIterator`? I think that by not putting it there, you are limiting the possibilities. Someone (e.g. me) could feel annoyed by the fact that the code to iterate over two different parameters has been written, but you cannot use them together because they are not implemented in the same iterator. You would have to figure out how they have been implemented and create a new iterator that understands how to parse both (and do this each time this situation happens).
So, in my opinion,

> For instance, in `SiestaIterator` I'm interested in iterating over the main fdf parameters, in the EosIterator over the scale factors.

we should not assume what future users will use `SiestaIterator` for. For me this sentence should be: "In `SiestaIterator` I'm free to iterate over whatever I want; in the `EosIterator` it only makes sense to iterate over scales, so the rest of the parameters are blocked."
I agree that it is not straightforward to make it maintainable; there needs to be good organization and it has to be thought out ahead. For example, look at what I did for structure modifications: if there is an input that accepts a structure and it's called `struct`, then all the parameters that are modifications of this structure will be `struct_<parameter>`. I believe that this is perfectly maintainable, as the "namespace" used by these parameters is perfectly defined. Even if there were two input structures, `struct` and `reference`, you could still iterate over modifications of both structures at the same time without ambiguity.
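A stdlib sketch of the naming convention described above (the function name is made up): a parameter prefixed with the name of a structure input is read as a modification of that structure, so modifications of two different structures never become ambiguous.

```python
def resolve_structure_parameter(structure_inputs, parameter):
    for struct_name in structure_inputs:
        prefix = struct_name + "_"
        if parameter.startswith(prefix):
            # e.g. "reference_scale" -> modify the "reference" structure
            return struct_name, parameter[len(prefix):]
    return None, parameter  # not a structure modification


target, modification = resolve_structure_parameter(
    ["struct", "reference"], "reference_scale"
)
```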
> Maybe a `@calcfunction` factory??
Yes! But I think it's not enough. I think it's powerful to be able to create "parsing groups" like the "Structures modification" one. Otherwise you need to select the calcfunctions one by one. Also, it is not straightforward, because some general parsing is needed before using the actual `@calcfunction` (see `modify_structure` in the `iterate_params` file).
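A stdlib sketch (hypothetical, not the actual `iterate_params` code) of the parsing-groups idea: parsing functions register themselves under a group name, so an iterator can pull in a whole group at once (e.g. every structure modification) instead of selecting calcfunctions one by one.

```python
PARSING_GROUPS = {}


def parsing_func(group):
    """Decorator factory that files a parsing function under a group name."""
    def register(func):
        PARSING_GROUPS.setdefault(group, {})[func.__name__] = func
        return func
    return register


@parsing_func("structure_modifications")
def scale_volume(structure, scale):
    # in the real code, general parsing would happen here before
    # delegating to the actual @calcfunction
    return f"{structure} scaled by {scale}"


@parsing_func("structure_modifications")
def apply_strain(structure, strain):
    return f"{structure} strained by {strain}"


# an iterator can now grab the whole group in one go
structure_mods = PARSING_GROUPS["structure_modifications"]
```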
> Maybe for my EoS I want to expose the metadata and use a namespace.
Ok, in that case you redefine `_exclude_params_lookup`, but you'd have to do it with your approach as well, wouldn't you? :)
I find it very difficult to maintain something like that. I'm considering, for instance, the fact that soon we will add support for Lua files. In your idea, every possible functionality of Lua should also become a parameter to iterate over. It is, in general, very risky to say that everything is supported when, potentially, there is an infinite spectrum of possibilities.
Anyway, I guess the main point here is that we do not agree. I foresee a long discussion and an even longer development if we go in your direction. For this reason I believe it is better to close this PR, reverting to the previous commit, and open a new one. I would like to have the EoS as an example of the use of an iterator and then rethink the rest.
I moved the structure modifications back to only the EOS workchain, although I honestly think that by doing this we are missing some potential for the `SiestaIterator` :).
I also added a first draft of automatic description of the parameters in an iterator, as we discussed. You can check it out by doing:
```python
from aiida_siesta.workflows.iterate import SiestaIterator
print(SiestaIterator.parameters_help)
```
Cheers!
I adapted the EOS workflow to work with `BaseIterator`.

By doing this, the main improvement exposed to users at the moment is that you can choose the batch size, but there may be others if the `BaseIterator` gains functionality.

During the process, I've discovered things that might be important for the `BaseIterator` to be useful to developers for creating their own workchains. I mean, to do something more than:

[...]

One big change is that you can create iteration inputs using `cls.iteration_input()`. If I create an iteration input called "scales", for example, I allow users to do:

[...]

instead of

[...]

as they are exactly equivalent. This can help prevent very large nesting when you start to nest iterators, but more importantly, since an iteration input is a regular AiiDA input, it can be made `required` (as we need for the EOS chain, for example). There's an extensive explanation in the docstring of `cls.iteration_input`. Let me know what you think!
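A stdlib-only sketch (a made-up helper, not the actual aiida-siesta API) of the equivalence described above: a declared iteration input named "scales" can be passed directly as `scales=[...]` and gets folded into the same `iterate_over` dictionary the user could have written by hand.

```python
def normalize_iteration_inputs(declared_iteration_inputs, **kwargs):
    # start from an explicit iterate_over dict, if the user gave one
    iterate_over = dict(kwargs.pop("iterate_over", {}))
    # fold any declared iteration input passed directly into the same dict
    for name in declared_iteration_inputs:
        if name in kwargs:
            iterate_over[name] = kwargs.pop(name)
    return iterate_over, kwargs


direct = normalize_iteration_inputs(
    ["scales"], scales=[0.94, 1.0, 1.06], code="siesta-code"
)
nested = normalize_iteration_inputs(
    ["scales"], iterate_over={"scales": [0.94, 1.0, 1.06]}, code="siesta-code"
)
```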
NOTE: I commented out the part where `scale_to_vol` is executed, because it is not working (even in the old version). It raises an error like `Molecule` has no attribute `scale_lattice`. Maybe there has been some change in pymatgen. Could you look into it? Thanks!