sphuber / aiida-shell

AiiDA plugin that makes running shell commands easy.
MIT License
14 stars 7 forks source link

Make the `self` argument in the custom parser optional #96

Closed GeigerJ2 closed 3 months ago

GeigerJ2 commented 3 months ago

In the RTD How-To guides, it's mentioned that a custom parser needs to have the specific function signature:

    def parser(self, dirpath: pathlib.Path) -> dict[str, Data]:
        """Parse any output file generated by the shell command and return it as any ``Data`` node."""

I always find the unused self argument of a function outside of a class a bit weird. At least in the RTD examples, it's not being used inside the parsing function? Is it possible to make that optional? With @superstar54 we had a quick look at the implementation already, and it seems that should be possible, but maybe there's a good reason that prevents it.

sphuber commented 3 months ago

At least in the RTD examples, it's not being used inside the parsing function?

You are right, it is not, but it could. I have added it because I imagined that for certain custom parsers one might need access to methods of the parser instance, such as the logger, or the CalcJobNode to get information of the job's inputs. It is true that technically we could make it optional. In the ShellParser I could check the signature of the function and if it defines two arguments, then pass both the parser instance and the dirpath. But this would lead to there being two ways of doing things, making things more complex.

One option would be to improve the documentation explaining what self actually is (just noticed it is not even mentioned that it is the Parser instance) and add an actual example of how it can be used.

I am trying to remember why I named it self. Maybe because in an earlier implementation it was actually attached as a member method of the Parser instance and so it was required. But in the current implementation it is simply called as a stand-alone function so it is not required.

Alternatively, I could change the order of the arguments, and make the Parser instance an optional argument. This would allow the following signatures:

def parser(dirpath: pathlib.Path) -> dict[str, Data]

or

def parser(dirpath: pathlib.Path, parser: Parser) -> dict[str, Data]

This would be backwards incompatible, however, but I could release it with v0.8.0.

Thoughts?

GeigerJ2 commented 3 months ago

Yeah, I thought there might be other use cases where it's required, but the RTD example is kept simple. I agree that providing two ways of doing things makes it more complicated, and we should avoid that. That's good to know, I was also wondering what self would actually refer to when it is being used. Though, I still don't fully understand how the Parser would enter if one provides just a custom standalone parsing function, rather than a custom Parser class? If one writes a custom Parser class, that should implement a parse method instead, no? At least for AiiDA plugins, but I'm probably mixing two different things here...

In any case, I think your proposed solution works nicely, as it really makes it clear what is being passed. [^1] If this backwards incompatible change can still be included in the minor release, then that's fine for me (it's pre v1.0 anyway, so that's fine?).

[^1]: Maybe in the example code snippet, we could name the function my_parser or something like that, to avoid confusion from the function name being the same as its argument (even though it would work, ofc).

sphuber commented 3 months ago

Though, I still don't fully understand how the Parser would enter if one provides just a custom standalone parsing function, rather than a custom Parser class? If one writes a custom Parser class, that should implement a parse method instead, no? At least for AiiDA plugins, but I'm probably mixing two different things here...

Exactly, for each shell job the ShellParser is called by default. And in its parse method, it checks whether a parser input was defined:

https://github.com/sphuber/aiida-shell/blob/master/src/aiida_shell/parsers/shell.py#L37

in which case it unpickles that function and calls, passing itself and the path of the retrieved files to it: https://github.com/sphuber/aiida-shell/blob/ffa7c6549e5dca6db5cab3b114c30e3365f289cb/src/aiida_shell/parsers/shell.py#L139

I think in an older implementation I may have actually added the custom parser function as a member method of the ShellParser instance before calling it and so it would have received the self argument automatically, making it required to be specified in the parser functions signature.

If this backwards incompatible change can still be included in the minor release, then that's fine for me (it's pre v1.0 anyway, so that's fine?).

That was my reasoning for saying that it could go into the next minor release, since that is more or less a rule of thumb in semantic versioning. Still, I want to make sure we really want this because I am not sure that everyone that is using aiida-shell is aware of this practice.

GeigerJ2 commented 3 months ago

Thanks for the explanation, @sphuber, makes sense!

That was my reasoning for saying that it could go into the next minor release, since that is more or less a rule of thumb in semantic versioning. Still, I want to make sure we really want this because I am not sure that everyone that is using aiida-shell is aware of this practice.

Yeah, I wasn't really aware of that myself, but just assumed it's like that, as you proposed to include it in the minor release. Overall, I don't think it's a big issue when one is aware of the fact that self needs to be included as an argument, and why that is. Extending the documentation section on adding a parser should be enough for that. I'll open a PR.

Apart from that, we could also target a first major release, no? API seems to be stable enough. As we discussed today, we could also organize 1 or 2 days during which everybody from the team actually uses the tool, and do the major release after that (possibly also under aiidateam)?

sphuber commented 3 months ago

Extending the documentation section on adding a parser should be enough for that. I'll open a PR.

Before you do. I was actually thinking of going ahead and improve the interface as we discussed. I was also considering another change that was breaking and so that I was pushing off. But now I think it is time to bite the bullet and do these final changes and then make a v1.0.0 release with stable.

GeigerJ2 commented 3 months ago

Caught me just in time, I just started looking into it 😅

But in the current implementation it is simply called as a stand-alone function so it is not required.

Is this true, though? Running the code without the self argument in the parser function excepts.

Another thing I'm thinking now, as I was accessing methods/objects of self in the parser: It's actually nice that it has the relevant CalcJobNode, Logger(Adapter), etc., attached to it. This wouldn't be possible in the same way if one passes a Parser instance via the parser argument?

sphuber commented 3 months ago

Is this true, though? Running the code without the self argument in the parser function excepts.

That is only because the Parser is actually calling that function passing it two arguments. If your function does not accept two arguments, it excepts.

Another thing I'm thinking now, as I was accessing methods/objects of self in the parser: It's actually nice that it has the relevant CalcJobNode, Logger(Adapter), etc., attached to it. This wouldn't be possible in the same way if one passes a Parser instance via the parser argument?

This I will keep, and it is possible to access everything from the Parser instance as if it was inside a member method. There is no difference. The self is not some magical thing, it is literally just the Parser instance.

GeigerJ2 commented 3 months ago

That is only because the Parser is actually calling that function passing it two arguments. If your function does not accept two arguments, it excepts.

What I meant is that providing two arguments is not enough, the first one has to be self: https://github.com/sphuber/aiida-shell/blob/ffa7c6549e5dca6db5cab3b114c30e3365f289cb/src/aiida_shell/calculations/shell.py#L194-L198

This I will keep, and it is possible to access everything from the Parser instance as if it was inside a member method. There is no difference. The self is not some magical thing, it is literally just the Parser instance.

OK, then I'm happy. Will review your PR #97 now!