Undesired autovivification in Perls >= 5.22

barefootcoder commented 5 years ago

Since a picture is worth a thousand words:

[caemlyn:~] cat t.pl
#! /usr/bin/env perl

use Kavorka qw(fun);

fun foobar (:$error) {}

my %user = ();
print "before: @{[keys %user]}\n";

foobar(error => $user{error});
print "after: @{[keys %user]}\n";

#print Kavorka->info(\&foobar)->signature->injection, "\n";
[caemlyn:~] perlbrew exec t.pl
perl-5.24.4
==========
before: 
after: error

perl-5.22.4
==========
before: 
after: error

perl-5.20.3
==========
before: 
after: 

perl-5.18.4
==========
before: 
after: 

perl-5.16.3
==========
before: 
after: 

perl-5.14.4
==========
before: 
after:

This appears to be related to #18. It is probably not related to #23, although your quote there:

If you want to explicitly alter %_ within the body of the function — add new keys and values, or change existing values — that's fine. But for Kavorka to implicitly alter it could lead to very hard to trace errors.

does seem eerily apropos. :-D

As you can see from the commented-out line in the test script, I was playing around with trying to figure out where the problem is. I think (though I'm still not 100% sure) that is has to do with the difference between this (pre-5.22) injection code:

Data::Alias::alias(%_ = ($#_==0 && ref($_[0]) eq q(HASH)) ? %{$_[0]} : @_[ 0 .. $#_ ])

which seems to be making %_ an alias to @_, vs this (post-5.22) injection code:

my $key = $_[$i];\$_{$key} = \$_[$i+1];

which seems to be making individual elements of %_ aliases to individual elements of @_. Sort of the difference between an alias of a hash vs a hash of aliases, as one of my coworkers summed it up.

It doesn't matter whether you use method or another keyword instead of fun, it doesn't matter whether the named parameter is required or not, and it doesn't matter whether you pass a hash or a hashref. Nothing other than the version of Perl seems to impact it (well, and it must be a named parameter, obviously).

I could probably come up with a failing unit test if you like. Also, if you have any thoughts on a temporary workaround, that might be helpful. Thx.

tobyink commented 5 years ago

Yeah, I agree with your assessment about the cause of it. Not sure what the solution would be.

I can think of a workaround, but it's a bit of a pain. You'd add a trait Kavorka::Sub which overrides the signature_class attribute to change the class from the default of Kavorka::Signature to a custom subclass. The custom subclass would override _injection_hash_underscore to return an empty string.

barefootcoder commented 5 years ago

Yeah, I agree with your assessment about the cause of it. Not sure what the solution would be.

Well, my first instinct was to ask, can't refaliasing alias %_ to @_ the same way that Data::Alias could, but after pondering it a bit I think I should back up a step further: in the absence of an is alias trait, why make %_ any sort of alias at all?

I can think of a workaround, but it's a bit of a pain. You'd add a trait Kavorka::Sub which overrides the signature_class attribute to change the class from the default of Kavorka::Signature to a custom subclass. The custom subclass would override _injection_hash_underscore to return an empty string.

I have a very vague idea of what you're talking about here, but I need to dig into the code a bit more before I can say for sure that I see what you mean. :-)

tobyink commented 5 years ago

The elements of %_ are aliases to the parameters passed into the function for consistency with @_.

barefootcoder commented 5 years ago

The elements of % are aliases to the parameters passed into the function for consistency with @.

Fair enough. I then thought I'd fall to back to my first instinct and ask if the refaliasing method could do it the same way that Data::Alias did, but after some digging, I agree it doesn't seem possible. I'm not even sure how Data::Alias was doing it—I couldn't really follow all the XS code—but I suspect it's something sneaky to do with pointers pointing at odd places. Then I thought I'd try to formulate an argument as to why refaliasing damned well should be able to replace a module it was out-and-out breaking, but I soon realized that was going to be a pretty feeble argument if Data::Alias is cheating, as I suspect it is. I wonder if perl5porters would consider an argument that taking a reference to a hash value shouldn't autovivify it ... :thinking:

So I guess I'm going diving into the workaround. I'm guessing from my limited understanding of the Kavorka guts that your suggestion will turn off the ability to use %_ altogether. Assuming I can get that working, would you consider enshrining that into core Kavorka as an option? Or maybe an option to keep %_ but not have it be an alias would be easier/better?

I'm planning to go through our codebase and repoint everything at a policy module that wraps Kavorka thinly, so that way I can easily consolidate any options, change them in one place, and have them available everywhere. That will enable some much needed consistency in our use of signatures too. :-) So, long story short, an option to turn off ... something ... would be helpful for us. I'm not sure how crucial it is for everyone else ... obviously it's a lot of conditions that all have to line up just so before this manifests as an actual bug.

tobyink commented 5 years ago

I think an option to disable %_ is probably more useful than an option to turn off the aliasing for it because it would not just solve this issue, but also would be an optimization. (Building %_ doesn't use a lot of CPU, but skipping building it and skipping localizing it should save a little bit of time.)

barefootcoder commented 5 years ago

I think an option to disable %_ is probably more useful than an option to turn off the aliasing for it because ...

Yep, that makes sense. I'm digging into it now.

barefootcoder commented 5 years ago

Okay, I'm deep in it now. :-) Maybe tell me what you think of a couple of different options.

First, the general interface. I sort of liked the idea of being able to do it at each different level:

method foo (:$bar!) but nohashunderscore { ... }    # just this one method

use Kavorka method => { nohashunderscore => 1 };    # all methods

use Kavorka { nohashunderscore => 1 };              # all subs, regardless of type

Doesn't have to specifically be nohashunderscore ... feel free to suggest a better name.

Implementation-wise, it seems clear that this then needs to be a trait (e.g. Kavorka::TraitFor::Sub::nohashunderscore). When it comes to what the trait does, though, there's a couple of different options:

Do exactly as you suggest above: override signature_class to return a different class,
- which could literally be a subclass of Kavorka::Signature, or
- maybe an anonymous class of Kavorka::Signature composed with a role (which could theoretically be extended to other features as well, if that's a consideration).
Since this implementation is happening inside Kavorka as opposed to outside it, we could just build a boolean flag inside Kavorka::Signature which defaults to off, the trait uses an after parse_signature => sub to flip it on, and _injection_hash_underscore just bails right at the top: return if $self->nohashunderscore or what-have-you. Simpler and leaner, but perhaps less extensible.

You have any preferences among any of these choices? I'm going to keep working on the unit test side, so I can pivot on implementation whenever you have time to weigh in.

barefootcoder commented 5 years ago

Update on my progress:

I've hit a snag, where it looks like bailing out of _injection_hash_underscore isn't sufficient if the named parameter is required. Required named parameters are checked by verifying that they exist in %_, looks like, over in Kavorka::Parameter::_injection_extract_value. I'm exploring ways around it now, but if you have any thoughts as to the best approach to work around that I'd love to hear them. :-)

barefootcoder commented 5 years ago

Okay, the more I dive into this, the more parallels I see with this issue and evalEmpire/method-signatures#71. The only difference is:

We were using %args where you're using %_.
We only alias if one or more of the named parameters have is alias.

Now, it occurs to me that, while not aliasing might have better performance due to not having to create an alias unnecessarily, it also might have worse performance due to the fact that you're double-copying all the named parameters. Unless we can think of a way to not build an intermediate hash at all, but, the more I think about it, I don't think there's any sort of practical way at all: since named parameters can be in any order, at the very least you'd have to have an intermediate hash for mapping parameter names to their corresponding positions in @_, and that just seems silly. So I'm not sure what the answer is as to which flavor of inefficiency is the most inefficientest (almost certainly "it depends"), but let's call it a wash and assume there's just no real performance enhancements to be gained whatsoever.

So it comes down to whether you're triggering a bug or not, I think. Prior to 5.22, you can alias or not and it's no big deal either way. There's no bug, unless you're running into upstream bugs (see e.g. RT/83961 or evalEmpire/method-signatures#77). So, fulfilling a potential expectation that %_ contains aliases just like @_ does seems fine.

OTOH, >=5.22 it becomes more problematic. I would argue that the user is going to hit a bug if we use the aliasing when it's not requested, although it's certainly possible that the bug is so inoffensive that no one will ever notice it. So it becomes much harder to defend %_ being composed of aliases. I'm not entirely sure how often a client expectation would actually be confounded if we don't alias, but the fact that it creates a bug would seem to trump that anyway. Am I missing something, or misstating the situation?

I'm just exploring this because, as it becomes harder and harder to make this conditional, I'm starting to wonder where the value is for preserving a bug—a really hard to track-down bug at that—vs just doing it the way we ended up with in Method::Signatures. But, at the end of the day, you're in charge, so you need to make the final call. :-)

tobyink commented 5 years ago

Sorry, been pretty busy lately. I will get back to you with my thoughts when I've had some time to think them!

barefootcoder commented 5 years ago

Sorry, been pretty busy lately. I will get back to you with my thoughts when I've had some time to think them!

Fair enough.

One more thing I considered was making it always alias on <5.22 and only alias when an alias is requested on >=5.22. But I didn't like the inconsistency; if you glance through enough of those old discussions on the topic of aliasing and \@foo and what does String @foo mean, you'll prolly see that one point I say a good rule of thumb is this: how hard is it going to be to explain how it works in the POD? So I didn't relish trying write POD to explain that whether it's an alias or not not only depends on whether you asked for an alias, but also on the version of Perl. :-/

barefootcoder commented 5 years ago

Any word? I finished up the thing I was doing to distract me from this, so now I've got to make some progress on it again. :-)

tobyink commented 5 years ago

Yeah, version dependent behaviour is what we currently have, and are trying to avoid!

barefootcoder commented 5 years ago

I got redistracted for a bit there, but someone just asked me today what the status was on this (apparently it's one of the last blockers for us upgrading to Xenial, where the default Perl is 5.22). So I thought I'd ping you one last time before diving in to complicated workarounds. :-)

Thx for your time and attention thus far.

barefootcoder commented 5 years ago

Just a quick heads-up that I think I've finally cleared the last of my blockers, so I'm starting on this issue this week. It looks to me like I'm going to have to try to implement a workaround similar to what you suggested initially (only more complicated, now that I know how far down the turtles go :-) ), unless you have any further thoughts on how or even whether we can figure a way to do this inside Kavorka.

barefootcoder commented 5 years ago

Well, I'm pleased to report that I finally got out from under that other project (which was not done back in June as I initially thought), and I've released some code that works around this bug. It's theoretically proprietary code, but I got permission to put it into a gist in case you wanted to take a look, or in case it might be helpful for anyone else who stumbles across this issue in a Google search one day. :-)

Still willing to look at doing a direct patch to Kavorka if you're interested and have an idea about the direction to take for implementation. Feel free to close the issue if you like. Thx for the help, and thx for Kavorka!

tobyink / p5-kavorka

Undesired autovivification in Perls >= 5.22 #27