Closed GeigerJ2 closed 3 months ago
At least in the RTD examples, it's not being used inside the parsing function?
You are right, it is not, but it could. I have added it because I imagined that for certain custom parsers one might need access to methods of the parser instance, such as the logger, or the CalcJobNode
to get information of the job's inputs. It is true that technically we could make it optional. In the ShellParser
I could check the signature of the function and if it defines two arguments, then pass both the parser instance and the dirpath. But this would lead to there being two ways of doing things, making things more complex.
One option would be to improve the documentation explaining what self
actually is (just noticed it is not even mentioned that it is the Parser
instance) and add an actual example of how it can be used.
I am trying to remember why I named it self
. Maybe because in an earlier implementation it was actually attached as a member method of the Parser
instance and so it was required. But in the current implementation it is simply called as a stand-alone function so it is not required.
Alternatively, I could change the order of the arguments, and make the Parser
instance an optional argument. This would allow the following signatures:
def parser(dirpath: pathlib.Path) -> dict[str, Data]
or
def parser(dirpath: pathlib.Path, parser: Parser) -> dict[str, Data]
This would be backwards incompatible, however, but I could release it with v0.8.0.
Thoughts?
Yeah, I thought there might be other use cases where it's required, but the RTD example is kept simple. I agree that providing two ways of doing things makes it more complicated, and we should avoid that. That's good to know, I was also wondering what self
would actually refer to when it is being used. Though, I still don't fully understand how the Parser
would enter if one provides just a custom standalone parsing function, rather than a custom Parser
class? If one writes a custom Parser
class, that should implement a parse
method instead, no? At least for AiiDA plugins, but I'm probably mixing two different things here...
In any case, I think your proposed solution works nicely, as it really makes it clear what is being passed. [^1] If this backwards incompatible change can still be included in the minor release, then that's fine for me (it's pre v1.0 anyway, so that's fine?).
[^1]: Maybe in the example code snippet, we could name the function my_parser
or something like that, to avoid confusion from the function name being the same as its argument (even though it would work, ofc).
Though, I still don't fully understand how the
Parser
would enter if one provides just a custom standalone parsing function, rather than a customParser
class? If one writes a customParser
class, that should implement aparse
method instead, no? At least for AiiDA plugins, but I'm probably mixing two different things here...
Exactly, for each shell job the ShellParser
is called by default. And in its parse
method, it checks whether a parser
input was defined:
https://github.com/sphuber/aiida-shell/blob/master/src/aiida_shell/parsers/shell.py#L37
in which case it unpickles that function and calls, passing itself and the path of the retrieved files to it: https://github.com/sphuber/aiida-shell/blob/ffa7c6549e5dca6db5cab3b114c30e3365f289cb/src/aiida_shell/parsers/shell.py#L139
I think in an older implementation I may have actually added the custom parser function as a member method of the ShellParser
instance before calling it and so it would have received the self
argument automatically, making it required to be specified in the parser functions signature.
If this backwards incompatible change can still be included in the minor release, then that's fine for me (it's pre v1.0 anyway, so that's fine?).
That was my reasoning for saying that it could go into the next minor release, since that is more or less a rule of thumb in semantic versioning. Still, I want to make sure we really want this because I am not sure that everyone that is using aiida-shell
is aware of this practice.
Thanks for the explanation, @sphuber, makes sense!
That was my reasoning for saying that it could go into the next minor release, since that is more or less a rule of thumb in semantic versioning. Still, I want to make sure we really want this because I am not sure that everyone that is using aiida-shell is aware of this practice.
Yeah, I wasn't really aware of that myself, but just assumed it's like that, as you proposed to include it in the minor release. Overall, I don't think it's a big issue when one is aware of the fact that self
needs to be included as an argument, and why that is. Extending the documentation section on adding a parser should be enough for that. I'll open a PR.
Apart from that, we could also target a first major release, no? API seems to be stable enough. As we discussed today, we could also organize 1 or 2 days during which everybody from the team actually uses the tool, and do the major release after that (possibly also under aiidateam
)?
Extending the documentation section on adding a parser should be enough for that. I'll open a PR.
Before you do. I was actually thinking of going ahead and improve the interface as we discussed. I was also considering another change that was breaking and so that I was pushing off. But now I think it is time to bite the bullet and do these final changes and then make a v1.0.0
release with stable.
Caught me just in time, I just started looking into it 😅
But in the current implementation it is simply called as a stand-alone function so it is not required.
Is this true, though? Running the code without the self
argument in the parser
function excepts.
Another thing I'm thinking now, as I was accessing methods/objects of self
in the parser: It's actually nice that it has the relevant CalcJobNode
, Logger(Adapter)
, etc., attached to it. This wouldn't be possible in the same way if one passes a Parser
instance via the parser
argument?
Is this true, though? Running the code without the self argument in the parser function excepts.
That is only because the Parser
is actually calling that function passing it two arguments. If your function does not accept two arguments, it excepts.
Another thing I'm thinking now, as I was accessing methods/objects of self in the parser: It's actually nice that it has the relevant CalcJobNode, Logger(Adapter), etc., attached to it. This wouldn't be possible in the same way if one passes a Parser instance via the parser argument?
This I will keep, and it is possible to access everything from the Parser
instance as if it was inside a member method. There is no difference. The self
is not some magical thing, it is literally just the Parser
instance.
That is only because the
Parser
is actually calling that function passing it two arguments. If your function does not accept two arguments, it excepts.
What I meant is that providing two arguments is not enough, the first one has to be self
:
https://github.com/sphuber/aiida-shell/blob/ffa7c6549e5dca6db5cab3b114c30e3365f289cb/src/aiida_shell/calculations/shell.py#L194-L198
This I will keep, and it is possible to access everything from the
Parser
instance as if it was inside a member method. There is no difference. Theself
is not some magical thing, it is literally just theParser
instance.
OK, then I'm happy. Will review your PR #97 now!
In the RTD How-To guides, it's mentioned that a custom parser needs to have the specific function signature:
I always find the unused
self
argument of a function outside of a class a bit weird. At least in the RTD examples, it's not being used inside the parsing function? Is it possible to make that optional? With @superstar54 we had a quick look at the implementation already, and it seems that should be possible, but maybe there's a good reason that prevents it.