perlancar / perl-Getopt-Long-More

1 stars 1 forks source link

Incompatibilty with "hash storage" mode of <Getopt::Long> #1

Closed tabulon closed 5 years ago

tabulon commented 5 years ago

Hello Perlancar,

Thank you very much for Getopt::Long::More, and the many other contributions to the Perl community (to the point that metacpan has trouble rendering the list of your modules 👍

I hesitated between flagging this as a "bug" or a "feature request".

Normally, it could be considered either way, but since backward compatibility with Getopt::Long is an advertised feature of Getopt::Long::More, I figured it should be considered a bug.

SUMMARY & MOTIVATION

As explained below, Getopt::Long::More does not seem to be compatible with the "hash storage" mode of Getopt::Long.

This may seem like a minor point at first, but it fundamentally breaks the Getopt::Long compatibility of Getopt::Long::More, which means it is not really a drop-in replacement, which in turn means:

So, basically, you have to choose between a set of DRY goodness brought by hash-storage and another set of DRY goodness brought by Getopt::Long::More, which is really a shame... 👎

The issue is described below, and the remedy should be quite simple. If requested, I can submit a patch.

In fact, once this issue is resolved, it could also help eliminate the obligation to pass a handler to optspec(), but that's just a nice to have.

DESCRIPTION

THIS WORKS (obviously)

Here's a shortened snippet from the module's SYNOPSIS (which obviously works) 👍

 use Getopt::Long::More;  

 my %opts;
 GetOptions(
     'foo=s' => \$opts{foo},
     'bar'   => sub { ... },
     'baz'   => optspec( 
         handler => \$opts{baz},
         default => 10,
         summary => 'Blah blah blah',
      ),
);

ALSO EXPECTED TO WORK ( but doesn't )

One would normally expect the following to be accepted (but it doesn't work ) 👎

 use Getopt::Long::More;  

 my %opts;
 GetOptions(
    \%opts,
     'foo=s' => \$opts{foo},
     'bar'   => sub { ... },
     'baz'   => optspec( 
         handler => \$opts{baz},
         default => 10,
         summary => 'Blah blah blah',
      ),
);

The above results in the following error message (ultimately thrown by Getopt::Long ):

 Invalid option linkage for "baz"

, on its own, behaves as expected

Normally, the same style (without the optspec(), of course) is happily accepted by Getopt::Long. It is in fact a documented documented variation 👍

Any mixture is possible. For example, the most frequently used options could be stored in variables while all other options get stored in the hash...


The CULPRIT :-)

The issue seems to stem from the following statement in Getopt::Long::More at the very beginning of the GetOptionsFromArray subroutine (which is the workhorse, so it effects all other variations of GetOptions* routines) :

    if (ref($_[0]) eq 'HASH') {
        # we bail out, user only specifies a list of option specs, e.g. (\%h,
        # 'foo=s', 'bar!')
        return Getopt::Long::GetOptionsFromArray($ary, @_);
    }

The problem here is the naive assumption that the caller can't (or won't) mix and match the "hash storage" mode with that of the classical invocation mode (with the possibility to specify a corresponding "handler" following each spec).

perlancar commented 5 years ago

Hi Tabulo,

Thanks for your detailed comment. I admit I very seldomly use the "hash storage" mechanism of Getopt::Long. The code that you mentioned, something like:

GetOptions(
    \%opts,
     'foo=s' => \$opts{foo},
);

is not exactly described in the Getopt::Long's documentation. But it indeed works, and the "Any mixture is possible ..." sentence does imply that the code should work.

I guess the same behavior could be applied to Getopt::Long::More. A PR is welcome :-)

tabulon commented 5 years ago

Hi Perlancar,

Thank your for your prompt reply.

Yes, had figured you weren't probably using that mechanism ... :-)

Where it really shines, of course, is when you need that seldom subroutine handler in an otherwise boring set of options that just get stored in the hash, like below:

GetOptions(
   \%opts,
   'foo=s',
   'bar=s' => sub { ... },
   'baz=s',
);

Anyhow, thanks a lot again. I will try to put in a PR relatively soon -- while still kinda fresh in my mind, or otherwise it might be a long time before I get back to it.

Cheers!

On 18 Jan 2019, at 8:38, perlancar wrote:

Hi Tabulo,

Thanks for your detailed comment. I admit I very seldomly use the "hash storage" mechanism of Getopt::Long. The code that you mentioned, something like:

GetOptions(
    \%opts,
     'foo=s' => \$opts{foo},
);

is not exactly described in the Getopt::Long's documentation. But it indeed works, and the "Any mixture is possible ..." sentence does imply that the code should work.

I guess the same behavior could be applied to Getopt::Long::More. A PR is welcome :-)

-- You are receiving this because you authored the thread. Reply to this email directly or view it on GitHub: https://github.com/perlancar/perl-Getopt-Long-More/issues/1#issuecomment-455454384

perlancar commented 5 years ago

Yes, had figured you weren't probably using that mechanism ... :-)

Yeah, I guess I had always been using this style:

GetOptions("foo=s" => \$foo, "bar=s" => \$bar, ...);

even though in practice I usually use Getopt::Long like this:

our %Opts = (...);
GetOptions("foo=s" => \$Opts{foo}, "bar=s" => \$Opts{bar}, ...);

and by the time I knew about hash storage, I pretty much had moved on from Getopt::Long.

tabulon commented 5 years ago

Yupp, yupp, figured :-)

BTW, I had read thru your "Getopt" blog series a while back... Quite well written, I thought; and have had similar concerns and observations.

In a nutshell, the fact that that blog series had to be done in the first place tells a lot about the state of affairs...

On 19 Jan 2019, at 1:43, perlancar wrote:

Yes, had figured you weren't probably using that mechanism ... :-)

Yeah, I guess I had always been using this style:

GetOptions("foo=s" => \$foo, "bar=s" => \$bar, ...);

even though in practice I usually use Getopt::Long like this:

our %Opts = (...);
GetOptions("foo=s" => \$Opts{foo}, "bar=s" => \$Opts{bar}, ...);

and by the time I knew about hash storage, I pretty much had moved on from Getopt::Long.

-- You are receiving this because you authored the thread. Reply to this email directly or view it on GitHub: https://github.com/perlancar/perl-Getopt-Long-More/issues/1#issuecomment-455731494

tabulon commented 5 years ago

Meanwhile, I have progressed quite a bit on the PR.

As somewhat suspected, regardless of the hash-storage issue (technically), GLM had already a bug for the case of mixing implicit/explicit handlers ("linkages" as GL calls them).

As an example, the following code would NOT work on current MASTER :

my $res = GetOptions(
   'flag1|1',
   'flag2|f',
   'bool|b!'        => \$opts{bool},
   'int=i'          => optspec( handler => \$opts{int}   ),
   'flag3|k',
   'module|M=s@'    => optspec( handler => $opts{module} ),
);

I have already produced a fix, which I will submit shortly. Once that was resolved, allowing "hash-storage" mode was quite easy (only a couple lines).

So we have got 3 distinct changes :

( I suppose there would need to be some documentation changes to accompany #C3 above, and possibly even #C2. I will try to jot something down.)

BTW, how would you prefer us to proceed?


On 19 Jan 2019, at 2:27, Ayhan Ulusoy wrote:

Yupp, yupp, figured :-)

BTW, I had read thru your "Getopt" blog series a while back... Quite well written, I thought; and have had similar concerns and observations.

In a nutshell, the fact that that blog series had to be done in the first place tells a lot about the state of affairs...

On 19 Jan 2019, at 1:43, perlancar wrote:

Yes, had figured you weren't probably using that mechanism ... :-)

Yeah, I guess I had always been using this style:

GetOptions("foo=s" => \$foo, "bar=s" => \$bar, ...);

even though in practice I usually use Getopt::Long like this:

our %Opts = (...);
GetOptions("foo=s" => \$Opts{foo}, "bar=s" => \$Opts{bar}, ...);

and by the time I knew about hash storage, I pretty much had moved on from Getopt::Long.

-- You are receiving this because you authored the thread. Reply to this email directly or view it on GitHub: https://github.com/perlancar/perl-Getopt-Long-More/issues/1#issuecomment-455731494

perlancar commented 5 years ago

BTW, how would you prefer us to proceed?

  • a) Single PR with multiple commits (to eventually cherry pick from)?
  • b) Multiple PRs ?

I'm fine with either.

tabulon commented 5 years ago

Hi Perlancar,

Attached is the proposed change based on PR #4.

As mentioned earlier, once #3 was fixed by #4 , this became a piece of cake, effecting only a coupe lines of code.

One question mark, though:

I slightlly modified the logic for hash-storage mode detection, so as to somewhat mimick Getopt::Long itself, which does allow objects in there (as long as they are HASH based).

For this, I first reached out for the good old regex recipe ( ref ( $[0] ) =~ /HASH/ ), like in one of the two patches attached (which became alt-B), with the well known pitfalls i.e.:

By itself, the new regex check still leaves out anything that stringifies.. (OK, could be a long shot edge case, but who knows :-).

For something like this, I would normally reach out for Scalar::Util::reftype(), but I am not sure how it would impact the load performance, so I went for the "get want you want and pay for it" approach below, which is now my preferred patch (alt-A) :

    if ( ref($_[0]) ) {
      require Scalar::Util;
      if ( Scalar::Util::reftype ($_[0]) eq 'HASH') {
        push @go_opts_spec, shift;  # hash-storage' mode is now directly supported
      }
    }

Obvously, this one adds a dependancy, on an ubiquitious core module, yes, but this may still be undesirable for a module like Getopt*

Naturally, it's your call. You would have a much better take on this one, anyway.

In any case, let me know what you decide so that we can prepare a proper PR for this instead of a patch, if you wish.

tabulon commented 5 years ago

Oops, it's probably better with the two alternative patches mentioned just above... Sorry!

proposed.fix-issue-1.based-on-pr4.alt-A.with-Scalar-Util.diff.txt

proposed.fix-issue-1.based-on-pr4.alt-B.with-Regex.diff.txt

perlancar commented 5 years ago

Tabulo,

Thanks a lot for the PRs. They are simply amaziing with regard to documentation. Because of you I become aware of the hash storage and implicit linkage issue, which I'll have to handle at least in Complete::Getopt::Long and Getopt::Long::Complete. I've added that to my personal todo list.

As for the use of reftype and Scalar::Util: I usually don't care about objects that stringify and just use ref(). But loading Scalar::Util only adds about 1ms so I guess that's okay.

I'll be releasing a new version of GLM shortly.

tabulon commented 5 years ago

Hi Perlancar,

Thank you, not only for your reactivity on the PRs; but also for having taken up the "Getopts*" challenge for the community, and producing a few excellent modules on the subject, GLM is my favorite, as well as the the blog series write-up.

Could you please hold off a couple days before the release?

I have a few nit-picks on my own code:

Thank you again, & Chers!

On 20 Jan 2019, at 13:37, perlancar wrote:

Tabulo,

Thanks a lot for the PRs. They are simply amaziing with regard to documentation. Because of you I become aware of the hash storage and implicit linkage issue, which I'll have to handle at least in Complete::Getopt::Long and Getopt::Long::Complete. I've added that to my personal todo list.

As for the use of reftype and Scalar::Util: I usually don't care about objects that stringify and just use ref(). But loading Scalar::Util only adds about 1ms so I guess that's okay.

I'll be releasing a new version of GLM shortly.

-- You are receiving this because you authored the thread. Reply to this email directly or view it on GitHub: https://github.com/perlancar/perl-Getopt-Long-More/issues/1#issuecomment-455863035

perlancar commented 5 years ago

I did a release before you sent the last reply, but can of course shoot for another release. So send the commits anyway.

BTW what do you think about and empty optspec() with hash storage mode? Currently this combination dies and for simplicity I'd let it be that way.

On Sun, Jan 20, 2019, 8:00 PM Tabulo <notifications@github.com wrote:

Hi Perlancar,

Thank you, not only for your reactivity on the PRs; but also for having taken up the "Getopts*" challenge for the community, and producing a few excellent modules on the subject, GLM is my favorite, as well as the the blog series write-up.

Could you please hold off a couple days before the release?

I have a few nit-picks on my own code:

  • that drainage loop hasdbecome less legible. (I now have a better rewrite available)
    • also figured perhaps we could still "die" in #5 under certain circumstances :

die ... if $obj->{required} || $obj->exists{default}

Thank you again, & Chers!

On 20 Jan 2019, at 13:37, perlancar wrote:

Tabulo,

Thanks a lot for the PRs. They are simply amaziing with regard to documentation. Because of you I become aware of the hash storage and implicit linkage issue, which I'll have to handle at least in Complete::Getopt::Long and Getopt::Long::Complete. I've added that to my personal todo list.

As for the use of reftype and Scalar::Util: I usually don't care about objects that stringify and just use ref(). But loading Scalar::Util only adds about 1ms so I guess that's okay.

I'll be releasing a new version of GLM shortly.

-- You are receiving this because you authored the thread. Reply to this email directly or view it on GitHub:

https://github.com/perlancar/perl-Getopt-Long-More/issues/1#issuecomment-455863035

— You are receiving this because you modified the open/close state. Reply to this email directly, view it on GitHub https://github.com/perlancar/perl-Getopt-Long-More/issues/1#issuecomment-455864666, or mute the thread https://github.com/notifications/unsubscribe-auth/AAM4jF9w17dWuLx7B8wANX4jke9OnyRwks5vFGhogaJpZM4aGPcW .

tabulon commented 5 years ago

No problem. It's working code, after all...

But now that the released documentation already reads "All properties are optional", it's probably too late to re-introduce "death" at object creation even for the case mentioned in my previous mail ( if $obj->{required} || $obj->exists{default} ).

That's not the end of the world though. It's probably even better:

After all, in this case, if we are going to die or warn, a better time for it might be when we actually face the wall (i.e. when the "handler" is actually needed in the "validation" loop, right after the "inner" call).

What do you think?

BTW, the patch I was preparing already had that as an alternative implementation; I will try to submit that soon for your review.

More importantly, that patch also contained another little "trick" (which directly relates to your question about empty optsec() dying), namely in the form of a single check for exists $e>{handler} before doing anything serious with it, i.e. before attempting a corresponding push on @go_optspec.

afaics, that resolves your question altogether, in quite an elegant way and without adding any real complexity.

Regardless of how it gets implemented, my humble opinion would be in favor of avoiding an unneccesary death in that case. And letting "OptSpec" just evaporate completely out of the way appears to be the way to go.

This allows GoL to remain pretty much the sole decisision-maker on what constitutes a valid list of arguments to GetOpts*(), allowing GLM to remain compatible with GoL even if the latter evolves later on; and all that in a manner that is practically "future-proof" (if there is ever such a thing).


In any case, the said trick also allows to get at the real tangible benefit of omitting the handler, i.e. enabling GoL to do its own DWIM (regardless of "hash-storage" mode).

GoL will attempt its DWIM trickery when it sees that no "linkage" is given (i.e. truly absent) for a given option on its argument list.

BTW, GoL won't necessarily do that when it encounters an undef linkage. Afaik, that will just confuse GoL. Alas, the now released version of GLM does just that, simply resulting in an undef to be passed on to GoL as "linkage" in the case of a missing OptSpec "handler".

With the proposed patch, though, OptSpec objects will simply and completely evaporate out of the way when handler is missing, matching exactly the way GoL expects things... :-)

As a side note, that's why I had revisited that "drainage/evaporation loop" to start with (first for #3 and then again for #5).

BTW, the upcoming patch also results in the same exact behaviour even for '<>':

In that particular case, GoL will not appreciate a missing "linkage", and will complain accordingly; but that that won't be any different than what happens today when you pass '<>' to GoL without a corresponding CODE-ref (even in the absence of GLM).

What do you think?

tabulon commented 5 years ago

BTW, I do realize this whole business of doing away with the reliance on and systematic (re)production of (optspec => linkage) pairs might be quite discomforting ...

However, as obscure as it may appear, that's the way GoL (Getopt::Long) works -- with or without hash-storage. And it is actually documented as such, although you have to really look hard to dig that fact/intention in the GoL docs.

If in doubt, the GoL maintainer can be reached out.

Meanwhile, the quotes given below appear to be the only relevant doc snippets I could find that touch upon the intentions for the actual code behaviour for the question at hand.


So, for reference, below are the two very brief passages (Q1 & Q2) where the subject is touched upon in GoL docs, along with a example code (Q3).

What happens in the case of a "missing linkage" in hash-storage mode has an associated code example (Q3), as well as a descriptive mention (Q2; that was already partially mentioned earlier).

The case of a "missing linkage" in the "normal" mode (without hash-storage), however, only carries a succint description elsewhere in another section (quoted as Q1 below), and without any code example at sight.

Q1)

Default destinations

When no destination is specified for an option, GetOptions will store the resultant value in a global variable named opt_XXX, where XXX is the primary name of this option. When a program executes under use strict (recommended), these variables must be pre-declared with our() or use vars. ...

Q2)

Storing options values in a hash

Sometimes, for example when there are a lot of options, having a separate variable for each of them can be cumbersome. GetOptions() supports, as an alternative mechanism, storing options values in a hash.

To obtain this, a reference to a hash must be passed as the first argument to GetOptions(). For each option that is specified on the command line, the option value will be stored in the hash with the option name as key. Options that are not actually used on the command line will not be put in the hash, on other words, exists($h{option}) (or defined()) can be used to test if an option was used. The drawback is that warnings will be issued if the program runs under use strict and uses $h{option} without testing with exists() or defined() first. ... Any mixture is possible. For example, the most frequently used options could be stored in variables while all other options get stored in the hash:

Q3.code)

my $verbose = 0;                    # frequently referred
my $debug = 0;                      # frequently referred
my %h = ('verbose' => \$verbose, 'debug' => \$debug);
GetOptions (\%h, 'verbose', 'debug', 'filter', 'size=i');
if ( $verbose ) { ... }
if ( exists $h{filter} ) { ... option 'filter' was specified ... }
perlancar commented 5 years ago

But now that the released documentation already reads "All properties are optional", it's probably too late to re-introduce "death" at object creation even for the case mentioned in my previous mail ( if $obj->{required} || $obj->exists{default} ).

I think saying that "all properties are optional" does not preclude that some combination of properties are invalid. Some properties might need to be defined together, while some conflict with one another.

Also, I think in general dying early (when the code needs to) is better. (Though in the case of a command-line options parser, everything we do is usually in "early" stage of a running application anyway.)

To clarify, the patch that you're preparing will cause GLM's:

GetOptions(foo => optspec(), 'bar');
GetOptions(foo => optspec(summary=>"blah"), 'bar');

GetOptions(\%opts, foo => optspec(), 'bar');
GetOptions(\%opts, foo => optspec(summary=>"blah"), 'bar');

to result in passing these to GoL?

GetOptions('foo', 'bar');

GetOptions(\%opts, 'foo', 'bar');

while these:

GetOptions(foo => optspec(handler=>undef), 'bar');

GetOptions(\%opts, foo => optspec(handler=>undef), 'bar');

will become:

GetOptions('foo' => undef, 'bar');

GetOptions(\%opts, 'foo' => undef, 'bar');
tabulon commented 5 years ago

Going backwards: Yes, that is exactly what I am preparing.

Any concerns about that ovarall behaviour?

In fact, with the patch, even this:

GetOptions('<>' => optspec(), 'bar');

will just become:

GetOptions('<>', 'bar');

in which case <Getopt::Long> should loudly complain (die) as it usually would do even on on its own:

Option spec <> requires a reference to a subroutine

That's exactly the death parlance I get from GoL (both with and without GLM) when I try it in a separate script.

But I can't seem to simulate that in a proper dies_ok() test case, .... probably coming from some ignorance on my part. Is that OK if I mark that test as TODO and then you take a stab on it on your end?

Unless if you think it's not a test case for GLM but rather one for GoL itself... If so, please just go ahead and delete it. IMHO, that approach would be completely understandable, and probably even better.

After all GLM shouldn't probably care what GoL decides to to with in case of a missing '<>' handler, just like with any other handler.


I will respond on the other topoic on a separate comment.

tabulon commented 5 years ago

I see why the "dies_ok()" test case doesn't work. It's due to the eval in test_getoptions. Switched it to a SKIP with an associated comment that recommends just deleting the test.

tabulon commented 5 years ago

Never mind never mind. Getting tired, I gues... Just saw the 'dies' parameter.

tabulon commented 5 years ago

... which I couldn't manage to leverage.

Anyhow, here's the commit that woud go into PR : https://github.com/tabulon-ext/perl-Getopt-Long-More/commit/3471aff482f89d0ba98af6bdf0a987036ef7a36c

Waiting for your feedback before I package it up into a PR (without any lengthly commit message this time, I promise :-)

tabulon commented 5 years ago

The bad news also is hidden in a comment on line 278 of "01-basic.t"; for which I will open another issue (hopefully the last one in the "GoL compatibility" series...)

Rest assured that it is not a new bug introduced by this patch (or any other recent change). It's been lurking in there probably since begining of time :-)

Also, it relates only to a really historical usage pattern from the 1990s which even GoL calls "legacy" and deprecates (while continuing to support it, though :-)

perlancar commented 5 years ago

Any concerns about that ovarall behaviour?

Nope. This should be the right thing to do, even with regard to '<>'.

tabulon commented 5 years ago

Ok, we are in complete agreement, then.

Before I package it into a PR, any feedback on the implementation that sits on the feature branch mentioned above ( https://github.com/tabulon-ext/perl-Getopt-Long-More/commit/3471aff482f89d0ba98af6bdf0a987036ef7a36c ) ? It's the last 2 commits that are concerned.

As you will see, the code change in More.pm is minimal; with most added lines going into the test suite.

Note for our future selves ( or any future stumbler )

We have gone off-topic!

Almost all of our comments appearing here (on issue #1) since the last *24 hours do in fact relate mostly to to issue [#5].