Open filipsajdak opened 1 year ago
Thanks!
One way to suppress this could be to require a mutating argument to be qualified with inout, which I've thought of and @jbatez suggested in #198.
I see there's a related issue in #230 which might also be solved with an inout argument (call-site) requirement...
I'll consider #230 and #231 together...
Thanks for picking this up again in #294.
After reconsidering the examples, I think the status quo is a feature, not a bug, in Cpp2. I think the combination of parameter passing + move from definite last use is (elegantly? certainly naturally) exposing real user code bugs that were silent in Cpp1. This is very pleasing.
That said, I agree that an argument qualifier is the right answer. But understanding why the status quo is actually a feature is important because it will:
Why a feature: Diagnosing an unused side effect, like [[nodiscard]]
There are two features interacting here:
(1) Declaring parameter passing intent: This states the direction of data flow (in, inout, etc.).
(2) Move from definite last use: When we know the variable won't be used again, of course it's safe to move from so it seems this should be automatic and default.
Both features let the programmer declare their intent in a way that helps expose program bugs. Specifically:
(1) An inout or out (or Cpp1 non-const &) parameter is declaring that one of the function's outputs is via that argument, just as declaring a non-void return type is declaring that one of the function's outputs is via the return value. Those are the outputs, and ignoring an output is usually bad (but of course not always, see bottom). Just as Cpp2 makes [[nodiscard]] the default for return values, what you're encountering here is that it is naturally doing the same thing for inout arguments too, treating both declared output data flows similarly.
(2) A last use argument is diagnosing that the variable will no longer be used. If the last use is to an inout or out parameter, then not looking at it afterward is just the same as calling a function with a non-void return and never looking at the returned value (which is diagnosed in Cpp2 because of the enforced [[nodiscard]]).
So we are doing the user a favor by diagnosing this, just the same as if the user were ignoring a [[nodiscard]] return value.
And that's why I think that we should consider naming the opt-out for "unused out result" and "unused return value" with the same name, if there's a good name. They are the same case. (Sure, you sometimes want an opt-out, but only in rarer cases where you're relying on other side effects being performed by the function and really don't need the value, in which case the code should say so by writing discard or something.)
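The parallel between the two output paths can be sketched in Cpp1. This is only an illustration: parse_ret, parse_out, and use_both are hypothetical names that do not appear in the thread.

```cpp
#include <string>

// Hypothetical functions, for illustration only.

// Output delivered via the return value. [[nodiscard]] asks the compiler
// to warn when a caller drops the result.
[[nodiscard]] inline int parse_ret(const std::string& s) {
    return static_cast<int>(s.size());
}

// The same output delivered via a non-const reference (roughly what a Cpp2
// inout/out parameter lowers to). Cpp1 has no attribute that warns when the
// caller never reads `result` afterward.
inline void parse_out(const std::string& s, int& result) {
    result = static_cast<int>(s.size());
}

// A caller that consumes both outputs, so neither is silently dropped.
inline int use_both() {
    int a = parse_ret("abc");   // return value is consumed
    int b = 0;
    parse_out("abcd", b);       // output argument is read below
    return a + b;
}
```

In Cpp1 only the first form can be diagnosed when the output is ignored; the point made above is that Cpp2 ends up diagnosing both.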
Example 1: Just return
Let's consider the two versions of the code... First, consider the version of the code you used in #294:
f2: (inout x) -> _ = {
    return x * 2;
}
main: () = {
    x := 21;
    std::cout << f2(x) << std::endl;
}
Compiling this with cppfront and then a Cpp1 compiler calls out f2(x) as invalid. But why? The compilers tell us it's because x is an rvalue, and the argument must be an lvalue. This is great, because it's true. There's something fishy.
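The Cpp1 rule behind that error can be checked in isolation. The sketch below assumes, as a simplification, that an inout parameter lowers to a non-const lvalue reference and an in parameter to a const reference; the code cppfront actually emits differs in detail, and f2_inout, f2_in, and call_with_last_use are illustrative names.

```cpp
#include <type_traits>
#include <utility>

// Assumed Cpp1 shape of `f2: (inout x)`: a non-const lvalue reference.
inline int f2_inout(int& x) { return x * 2; }

// Assumed Cpp1 shape after switching the parameter to `in`.
inline int f2_in(const int& x) { return x * 2; }

// An lvalue argument is accepted by the inout-style signature...
static_assert(std::is_invocable_v<decltype(f2_inout)&, int&>);
// ...but a moved last use (an rvalue) is not.
static_assert(!std::is_invocable_v<decltype(f2_inout)&, int&&>);
// The in-style signature accepts the rvalue.
static_assert(std::is_invocable_v<decltype(f2_in)&, int&&>);

inline int call_with_last_use() {
    int x = 21;
    return f2_in(std::move(x));  // compiles: const& binds to an rvalue
}
```

The second static_assert is the condition the Cpp1 compiler reports at the f2(x) call site once the last use of x has been turned into a move.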
What's fishy? It's f2... it declares its parameter as inout, but never writes to it. As you know, I aim eventually (not now) to emit a diagnostic for failure to have a non-const use of an inout parameter on at least one path somewhere in the function... when I implement that, the error will be flagged even sooner within the callee. Right now, the error is being flagged at the call site, which I expect to be usually still caught early at f2 unit test time because it will be common for even f2's initial toy test cases to do this... pass a last use, which exposes the bug in f2.
What's the solution? In this case, f2 should change its parameter to be in, and then everything compiles and runs.
But what if f2 actually modifies its parameter? That brings us to the other version of your code, above...
Example 2: Also modify parameter
f2: (inout x) -> _ = {
    x *= 2;
    return x;
}
main: () = {
    x := 21;
    std::cout << f2(x) << std::endl;
}
Again we get the error flagged, but this time the problem is at the call site; f2 is okay.
Consider why f2 is okay: Even though f2 is a little odd for redundantly emitting the same output value in two different output return paths (the inout argument and the return value), that's not wrong per se, and might be useful for chaining or whatever. So f2 is fine this time, in the sense that it's doing what it declared it would do... it's writing to its input argument, and it's returning a value.
But now the call site is definitely suspicious because it's making a call that is declared to modify its argument, but then never looks at the argument again. Ignoring an output is usually bad, at least by default.
So I view this as a great feature of Cpp2... by:
we naturally and automatically diagnosed failure to use an output. I like that a lot.
Furthermore, this is just like [[nodiscard]]. In both cases, we want an opt-out. But what's the right name? Given:
inout_func: ( inout x ) = { /*...*/ }
returning_func: () -> _ = { /*...*/ }
Then consider this call site, where we want an explicit opt-out, and ideally the same word of power in both places since they're opting out of conceptually the same thing:
{
    x := 42;
    inout_func( SOMETHING x );
    (SOMETHING returning_func());
}
I want to think about the naming some more, but as a start I'm not sure inout works well for both:
// What if "SOMETHING" were "inout"? Doesn't feel quite right...
{
    x := 42;
    inout_func( inout x ); // inout works pretty well here
    (inout returning_func()); // but not so well here
}
On the other hand, "discard" gives a nice first impression, and is symmetric with [[nodiscard]] and could connote "don't do anything special with, including don't move its guts along" as well as "discard this thing's value, I'm not going to use it from here onward":
// What if "SOMETHING" were "discard"? I think I like it... "discard this value, I'm not going to use it after here"
{
    x := 42;
    inout_func( discard x ); // that word is a big red code review flag (good)
    (discard returning_func()); // and here with a clear meaning
}
It seems right to use the same opt-out word for unused inout/out arguments and unused return values. Getting the name right is important, though. This is something I want to sleep on further, but there's my brain dump for today. Thanks again.
@hsutter Thank you for this summary - I think you synthesize it very well.
I agree that this is a similar thing as [[nodiscard]], and not using the inout or out argument is suspicious at the minimum.
I like the
discard returning_func();
But using it next to the function argument looks suspicious:
inout_func( discard x );
My first impression is that we want to discard the x variable - unfortunately, it is on the call side before it gets to the function. It could be misinterpreted as something will happen to x before a call to inout_func... or maybe it is just me.
Maybe we can add a passing style to clarify:
inout_func( discard inout x ); // maybe `discard out x` to emphasize that we discard output of the x
Another keyword to consider is unused:
x := 42;
inout_func( unused x );
(unused returning_func());
But still I would prefer to add a passing style:
x := 42;
inout_func( unused inout x ); // or unused out x
(unused returning_func());
Totally agree with all of this, discard does indeed feel like a good choice given the nodiscard symmetry.
I have two (three) questions:
Why not use [[discard]] instead of adding a new keyword? (I'm not against using just discard, merely curious, in fact is cpp2 avoiding the [[ xyz ]] syntax altogether?)
Would it be reasonable to decorate a parameter with multiple passing intentions, i.e. in_or_inout_func: ( in|inout x ) = { /*..
Suggesting that the parameter's side effect is not mandatory and therefore not worth warning when the user doesn't use it?
On 26 March 2023 22:06:18 Herb Sutter @.***> wrote:
Agreed with Herb's analysis on why this is actually great. Small things I want to point out from @SebastianTroy's comment:
Why not use [[discard]] instead of adding a new keyword?
I have the same question. An attribute seems like the right fit for this job instead of "some new keyword popping out of nowhere".
Would it be reasonable to decorate a parameter with multiple passing intentions, i.e. in_or_inout_func: ( in|inout x ) = { /*..
I have a question related to this: will we be able to mark certain parameters in the function body as discard? This could be another way of doing the same thing. So, our example would become
f2: ([[discard]] inout x) -> _ = {
    x *= 2;
    return x;
}
main: () = {
    x := 21;
    std::cout << f2(x) << std::endl;
}
This would be a way to signify that the mutations it makes to x could be discarded, so cppfront would simply do a static cast of any rvalue before passing to such functions or not emit std::move at all.
I like the idea but also don't think discard makes sense as a parameter decoration, as you aren't discarding the entire thing, just the returned information, so maybe something like discard_return or discard_result. I think that would apply just as well to the return value of the function.
Another keyword to consider is unused:
x := 42;
inout_func( unused x );
(unused returning_func());
That's a good candidate. Thoughts:
x later in the scope, but the annotation on this use is that that later last use explicitly being omitted... "unused" might connote that, but maybe less strongly than "discard" or "discard_result"?
[[maybe_unused]].
Why not use [[discard]] instead of adding a new keyword? (I'm not against using just discard, merely curious, in fact is cpp2 avoiding the [[ xyz ]] syntax altogether?)
Cpp2 is currently using [[ ]] only for contracts, and I might change that too. In Cpp1 we spell some things as attributes, in part for syntax compatibility constraints which don't apply in Cpp2.
Would it be reasonable to decorate a parameter with multiple passing intentions, i.e. in_or_inout_func: ( in|inout x ) = { /*.. Suggesting that the parameter's side effect is not mandatory and therefore not worth warning when the user doesn't use it?
If we want to express that an output (parameter out data flow, or non-void return value) is discardable, we should have a consistent way to say that and again I would like to use the same word in both places.
For example:
inout_func_with_ignorable_result: ( ~SOMETHING inout x ) = { /*...*/ }
returning_func_with_ignorable_result: () -> ~SOMETHING _ = { /*...*/ }
I tag this as ~SOMETHING because it probably wants to be the inverse of the above SOMETHING.
Putting it together:
inout_func: ( inout x ) = { /*...*/ }
returning_func: () -> _ = { /*...*/ }
inout_func_with_ignorable_result: ( ~SOMETHING inout x ) = { /*...*/ }
returning_func_with_ignorable_result: () -> ~SOMETHING _ = { /*...*/ }
{ // call site
    x := 42;
    inout_func( SOMETHING x );
    (SOMETHING returning_func());
}
Trying out @gregmarr's discard_result suggestion:
inout_func: ( inout x ) = { /*...*/ }
returning_func: () -> _ = { /*...*/ }
inout_func_with_ignorable_result: ( discardable_result inout x ) = { /*...*/ }
returning_func_with_ignorable_result: () -> discardable_result _ = { /*...*/ }
{ // call site
    x := 42;
    inout_func( discard_result x );
    (discard_result returning_func());
}
That looks fairly decent at first blush. Clear, and a little verbose which is a good thing for an explicitly lossy escape hatch that we want to stand out. (Syntax colorizer writers, feel free to make it red... :) )
Trying out @filipsajdak's unused suggestion:
inout_func: ( inout x ) = { /*...*/ }
returning_func: () -> _ = { /*...*/ }
inout_func_with_ignorable_result: ( maybe_unused inout x ) = { /*...*/ }
returning_func_with_ignorable_result: () -> maybe_unused _ = { /*...*/ }
{ // call site
    x := 42;
    inout_func( unused x );
    (unused returning_func());
}
This looks nice on the call site, but I worry that on the parameter it could imply that the name is not used in the callee body, which is what Cpp1 [[maybe_unused]] does.
Trying out a merger of the two, even more verbose on the declarations but again this is a case where verbosity can be a plus:
inout_func: ( inout x ) = { /*...*/ }
returning_func: () -> _ = { /*...*/ }
inout_func_with_ignorable_result: ( maybe_unused_result inout x ) = { /*...*/ }
returning_func_with_ignorable_result: () -> maybe_unused_result _ = { /*...*/ }
{ // call site
    x := 42;
    inout_func( unused_result x );
    (unused_result returning_func());
}
Will think some more...
@hsutter I like the way you make a synthesis of the proposed ideas.
Looking at the last one:
inout_func: ( inout x ) = { /*...*/ }
returning_func: () -> _ = { /*...*/ }
inout_func_with_ignorable_result: ( maybe_unused_result inout x ) = { /*...*/ }
returning_func_with_ignorable_result: () -> maybe_unused_result _ = { /*...*/ }
{ // call site
    x := 42;
    inout_func( unused_result x );
    (unused_result returning_func());
}
How will it interact with move-of-last-use?
x := 42;
inout_func( unused_result x ); // will it just suppress the move?
And when we would define a function with maybe_unused_result inout:
x := 42;
inout_func_with_ignorable_result( x ); // will it just suppress the move?
Will it change the function's signature or add unused_result on the call side by default? Is this good or bad?
I like the focus on the intention and would like to know if we shall support defining functions in that way. I feel comfortable with -> maybe_unused_result _ on the return side, but having that on the inout argument feels like trying to fix some wrong design decision. Is there a use case where we use such an approach in the current cpp1 code?
But now the call site is definitely suspicious because it's making a call that is declared to modify its argument but then never looks at the argument again. Ignoring an output is usually bad, at least by default.
I like the above way of thinking, and for sure, I need to fix some cpp2 code just because cppfront complains about ignoring the return value from a function. Please note that I use the term FIX as, after second thought, my code was just somehow broken. I don't know if providing an easy way to opt out of this rule on the definition side is a good thing.
I like the idea of being explicit when something odd is going on. Ignoring output from a function is an odd thing that you might want to do, which is why you should have the possibility to add unused_result on the call side. That will focus the attention of the code reader.
Having the same thing on the definition side and not requiring anything on the call side will make things (from that perspective) worse, as when you read code, you don't check function definitions all the time - that might mislead the reader.
Would it be reasonable to decorate a parameter with multiple passing intentions, i.e. in_or_inout_func: ( in|inout x ) = { /*.. Suggesting that the parameter's side effect is not mandatory and therefore not worth warning when the user doesn't use it?
If we want to express that an output (parameter out data flow, or non-void return value) is discardable, we should have a consistent way to say that and again I would like to use the same word in both places.
inout_func_with_ignorable_result: ( maybe_unused inout x ) = { /*...*/ }
This looks nice on the call site, but I worry that on the parameter it could imply that the name is not used in the callee body, which is what Cpp1 [[maybe_unused]] does.
f: (in out? x) = { /*..
could be an alternative spelling. Although that fails to meet this:
If we want to express that an output (parameter out data flow, or non-void return value) is discardable, we should have a consistent way to say that and again I would like to use the same word in both places.
Alternatively, consider using the most appropriate spelling for a given context. This might be useful if consistency isn't convincing enough, and to help find a middle ground.
inout_func: ( inout x ) = { /*...*/ }
returning_func: () -> _ = { /*...*/ }
inout_func_with_ignorable_result: ( in out? x ) = { /*...*/ }
inout_func_with_ignorable_result: ( in maybe_out x ) = { /*...*/ }
returning_func_with_ignorable_result: () -> out? _ = { /*...*/ }
returning_func_with_ignorable_result: () -> maybe_unused _ = { /*...*/ }
{ // call site
    x := 42;
    inout_func( not out x );
    inout_func( in x ); // "Force the `in`, ignore the `out`".
    (unused returning_func());
    (void returning_func());
}
When I thought of the alternative above, it occurred to me that
inout_func_with_ignorable_result: ( maybe_unused_result inout x ) = { /*...*/ }
is to
inout_func_with_ignorable_result: ( in maybe_out x ) = { /*...*/ }
what if (not irreversible) is to if (reversible). We want to say the latter. But there's no direct way to say it. So we have to add to what was said to make it what is actually wanted.
I think out? or maybe_out would imply more that the callee body might or might not produce an output value. To some extent inout already accounts for that side of things, with the intended semantics of "write to this on at least one code path."
I think what we're looking at here is the complement of that -- not whether the callee will change the argument's value to emit a new output value, but whether the caller should view the output as important vs. can safely ignore it.
Trying out "ignore"...
inout_func: ( inout x ) = { /*...*/ }
returning_func: () -> _ = { /*...*/ }
inout_func_with_ignorable_result: ( ignorable_result inout x ) = { /*...*/ }
returning_func_with_ignorable_result: () -> ignorable_result _ = { /*...*/ }
{ // call site
    x := 42;
    inout_func( ignore_result x );
    (ignore_result returning_func());
}
Or with "output", and using a "can" prefix to avoid dealing with English verb-to-adjective conventions (e.g., wherever possible I'd like to avoid non-English speakers having to learn conventions like "ignore" -> "ignorable" to program in Cpp2)...
inout_func: ( inout x ) = { /*...*/ }
returning_func: () -> _ = { /*...*/ }
inout_func_with_ignorable_result: ( can_ignore_output inout x ) = { /*...*/ }
returning_func_with_ignorable_result: () -> can_ignore_output _ = { /*...*/ }
{ // call site
    x := 42;
    inout_func( ignore_output x );
    (ignore_output returning_func());
}
Is there a use case where we use such an approach in the current cpp1 code?
I have the same question: is there even a use case for this? We can try to not implement this feature now and maybe implement it later if actual use cases are encountered. Even if we do encounter a use case, I would argue that annotations are only needed at the call site and not in function definitions.
To make sure I understand, currently all function returns are converted to Cpp1 as [[nodiscard]], and the can_ignore_output decoration is to suppress that?
Is the intent on the call site that the SOMETHING has to be used like this: (SOMETHING returning_func()) as opposed to just SOMETHING returning_func()? Is there a parsing issue that requires the parens, or is it intended as clarification for the user?
I feel comfortable with -> maybe_unused_result _ on the return side, but having that on the inout argument feels like trying to fix some wrong design decision. Is there a use case where we use such an approach in the current cpp1 code?
It's the same use/bug case. A caller ignoring a return value output is a well known source of a family of security vulnerabilities: CWE-252 is a general category, and then there are more specific categories under it. It's the same bug if the caller ignores an argument output, if the function happens to choose to produce an output via a modified argument instead of (or in addition to) the return value.
For example:
malloc returns a pointer that should be checked before use, whereas COM object allocation returns the allocated pointer via an Object** parameter which is today's spelling for a Cpp2 out unique_ptr<Object> parameter.
std::error_code-using functions: Many functions return an error_code by value. Others return them via an inout or out parameter, such as a lot of filesystem functions like bool is_character_file( const std::filesystem::path& p, std::error_code& ec ) noexcept;. A returned error_code should be checked regardless of which way the function happened to return it.
is_character_file is an example of the above inout_func whose "out" parameter result should not be ignored. But whereas we're getting better at diagnosing failure to look at the return value because of linter tools and [[nodiscard]], we're not yet as good at diagnosing failure to look at the output via "out" parameter.
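As a concrete Cpp1 illustration of the is_character_file case, here is a minimal sketch of a caller that does look at the error_code delivered through the "out" parameter. The wrapper name checked_is_character_file is hypothetical; only the std::filesystem call itself is standard.

```cpp
#include <filesystem>
#include <system_error>

// Hypothetical wrapper: returns true only if the query succeeded
// AND the path names a character special file.
inline bool checked_is_character_file(const std::filesystem::path& p) {
    std::error_code ec;  // receives the function's second output
    const bool result = std::filesystem::is_character_file(p, ec);
    if (ec) {
        // The output delivered via the parameter says the query failed,
        // so the boolean result should not be trusted on its own.
        return false;
    }
    return result;
}
```

Nothing in Cpp1 would complain if the `if (ec)` check were deleted, which is the diagnostic gap described above.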
Today we have a patchwork of narrow solutions:
[[nodiscard]] for return values (but it can't be made the default in Cpp1, so we're adding it on the vast majority of std:: value-returning functions because it should be the default)
std::ignore for a subset of cases
[[discardable]] (if we could make [[nodiscard]] the default)
(void) as a de facto convention for spelling "ignore this value"
Cpp2 already has the right consistent automatic defaults so that we never need to write anything for the majority of cases: [[nodiscard]] is already the automatic default for function return values, and detecting failure to use the result of an inout or out parameter is already the automatic default (that's what spawned this thread). So most of the time we don't need to write anything.
Now we're discussing the right consistent opt-out, aiming for a single consistent answer to avoid piecemeal patches like a [[discardable]] here and a std::ignore there.
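For reference, two of the existing Cpp1 opt-outs from that patchwork look like this when applied to a [[nodiscard]] return value. The functions compute and discard_both_ways are hypothetical names used only for this sketch.

```cpp
#include <tuple>   // std::ignore

// Hypothetical function whose result should normally be used.
[[nodiscard]] inline int compute() { return 42; }

inline int discard_both_ways() {
    (void)compute();          // de facto convention: cast the value to void
    std::ignore = compute();  // assign the value to the std::ignore sink
    return 0;
}
```

Neither spelling has an equivalent for an unused output argument, which is part of why a single opt-out word for both cases is being discussed.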
To make sure I understand, currently all function returns are converted to Cpp1 as [[nodiscard]], and the can_ignore_output decoration is to suppress that?
Yes.
Is the intent on the call site that the SOMETHING has to be used like this: (SOMETHING returning_func()) as opposed to just SOMETHING returning_func()? Is there a parsing issue that requires the parens, or is it intended as clarification for the user?
Those parens are currently required because I happen to only allow argument modifiers in expression lists. Having to write ( ) around them hasn't bothered me enough yet to parse them also as prefix operators, but I could do that and then the parens would not be required around single expressions.
Those parens are currently required because I happen to only allow argument modifiers in expression lists. Having to write ( ) around them hasn't bothered me enough yet to parse them also as prefix operators, but I could do that and then the parens would not be required around single expressions.
Sounds good.
Val has a feature to discard return values of functions by assigning them to a placeholder underscore like this:
_ = returning_func();
This effectively discards the return value, but I can't think of a way to extend it to inout arguments.
Go also does that, and I thought of mentioning that, but it also has the same issue of not being extendable to inout. I think we discussed that for returns somewhere at some point.
It's the same use/bug case.
I think what https://github.com/hsutter/cppfront/issues/231#issuecomment-1485910734 asked, I upvoted, and https://github.com/hsutter/cppfront/issues/231#issuecomment-1486202086 agreed with, was the opposite. Whether there's value in giving power to the callee to determine that an out parameter is ignorable. And that there's value in the caller always having to opt-out. Your answer suggests that in your examples an out argument shouldn't be ignored by default. There does not seem to be an example of an out parameter that the caller doesn't inspect and doesn't need to opt-into ignoring it.
Granted, there's still value in the discussion, to determine a consistent opt-out for parameters and arguments, in case it's ever needed.
Ah, got it -- I see the question was just about the parameter being able to declare its output is ignorable, and whether there are use cases for such parameters. Thanks.
I suspect the pattern of the answer will be the same: Declaring 'this output is ignorable' is uncommon for return values, but in the cases where you want to declare an ignorable return value you would also want to declare an ignorable output value if the function author chose that as the path to deliver the output. But I don't have a concrete example in hand, and having one would be helpful.
I've seen many APIs with Foo ** parameters where if the argument is null, then it's ignored and not populated, and otherwise, it sets the Foo* to an output value. There are many of those in the Win32 API. That would to me correspond to an ignorable out parameter.
https://learn.microsoft.com/en-us/windows/win32/api/winreg/nf-winreg-regqueryvalueexw
LSTATUS RegQueryValueExW(
  [in] HKEY hKey,
  [in, optional] LPCWSTR lpValueName,
  LPDWORD lpReserved,
  [out, optional] LPDWORD lpType,
  [out, optional] LPBYTE lpData,
  [in, out, optional] LPDWORD lpcbData
);
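The optional-output pattern in that signature can be sketched in portable Cpp1 without the Win32 headers. Everything here is hypothetical: query_value, its parameters, and the values it reports are stand-ins, not part of any real API.

```cpp
// Hypothetical API in the style described above: each out pointer may be
// null, meaning "the caller is not interested in this output".
inline bool query_value(int key, int* type_out, int* size_out) {
    if (key != 1) {
        return false;                 // unknown key: nothing is written
    }
    if (type_out) { *type_out = 4; }  // populated only when requested
    if (size_out) { *size_out = 8; }
    return true;
}

// A caller that wants only the size and explicitly declines the type.
inline int size_only() {
    int size = 0;
    const bool ok = query_value(1, nullptr, &size);
    return ok ? size : -1;
}
```

Passing nullptr is how this style of API lets the caller say at the call site that an output is ignored, which is the role an explicit opt-out word would play for a Cpp2 inout or out argument.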
I wonder if it's possible to come up with an example of an ignorable inout/out parameter using pointers.
An inout maps to a reference. Supposing inout x: Foo** works, is x* = &foo a write to x? I'd expect the answer to be no.
Foo** parameters and the like are the target of C++23 "Smart pointer adaptors".
I've seen many APIs with Foo ** parameters where if the argument is null, then it's ignored and not populated, and otherwise, it sets the Foo* to an output value. There are many of those in the Win32 API. That would to me correspond to an ignorable out parameter.
To me, that sounds like a normal inout argument. Keep in mind that the function has to write to the reference in at least one control path only.
The only use case I can think of is a function which produces output via both parameters and return value, but you just call it for one of those outputs (for whatever reason) and therefore you'd have to ignore the other one. For example:
func : (inout x : std::string ) -> std::string = {
    x = "done";
    return x;
}
main : () = {
    a : std::string = "test";
    std::cout << func(a);
}
Here, you're calling func just for its return value and passing a to it just because you happened to have a variable you won't use again.
The same can be said the other way around: ignoring the return value just because you wanted the mutation of the passed argument.
In both cases, it's the caller that decides which outputs to use, so I'd say the function doesn't need to say that its output can be ignored; it should be up to the caller only.
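A rough Cpp1 sketch of the same situation, for comparison. It assumes func is lowered to a reference parameter plus a [[nodiscard]] return value; the two caller functions are hypothetical names added only to show each side choosing which output to consume.

```cpp
#include <cassert>
#include <string>

// Assumed Cpp1 rendering of `func`: one output via the reference,
// one output via the return value.
[[nodiscard]] std::string func(std::string& x) {
    x = "done";
    return x;
}

// Hypothetical caller that only wants the return value;
// `a` is passed in and never read again.
std::string use_return_only() {
    std::string a = "test";
    return func(a);
}

// Hypothetical caller that only wants the mutation and
// discards the return value explicitly at the call site.
std::string use_mutation_only() {
    std::string a = "test";
    (void)func(a);          // explicit call-site discard
    assert(a == "done");    // the reference output is what we came for
    return a;
}
```

In both callers the function signature is unchanged; the decision to drop one of the outputs is visible only where the call happens.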
I've seen many APIs with Foo** parameters where if the argument is null, then it's ignored and not populated, and otherwise, it sets the Foo* to an output value. There are many of those in the Win32 API. That would to me correspond to an ignorable out parameter.
To me, that sounds like a normal inout argument, keep in mind that function has to write to the reference only at least in one control path.
With a normal out or inout, you must provide a valid variable. An optional parameter, by contrast, is allowed to be null. This is more complicated, but it's an example of a large set of APIs. I don't know whether that's something where we should say "you can't write this in Cpp2 because it's not safe" or something that we should figure out how to support.
Oh ok, I'm not familiar with that but that still sounds like a problem related to null handling.
I will show some code demonstrating the issue we are discussing here.
t2 : type = {
x : *int;
operator=:(out this, p : *int) = {
x = p;
}
ptr1: (inout this, p : *int) -> *int = std::exchange(x, p);
ptr2: (inout this, inout p : *int) = std::swap(x, p);
}
main :() = {
n := 42;
a : t2 = (n&);
m := 24;
a.ptr1(m&); // return value can be ignored; it might be unused
pn := n&;
a.ptr2(pn); // out of pn can be ignored; it might be unused;
}
The t2::ptr1() function is similar to:
std::basic_streambuf<CharT, Traits>* std::basic_ios<CharT,Traits>::rdbuf( std::basic_streambuf<CharT, Traits>* sb );
(you can check it here: https://en.cppreference.com/w/cpp/io/basic_ios/rdbuf)
I did not find an example for the t2::ptr2() case, but I will keep looking. I feel a little awkward about it, but it is correct (please note that all pointers here are non-owning, so we can safely ignore the return values).
Scenario 1: ignore_output on the call side
In this case, we can write:
main :() = {
n := 42;
a : t2 = (n&);
m := 24;
(ignore_output a.ptr1(m&);)
pn := n&;
a.ptr2(ignore_output pn);
}
I like that - we explicitly inform the code reader that this function returns something that is ignored - it might change in further code development, and it is good to see it in the place where it happens. What is good is that the code expresses what I previously put into comments - that is perfect!
Scenario 2: can_ignore_output on the function definition side
As we know that we can safely ignore the return value, we can change the class to:
t2 : type = {
x : *int;
operator=:(out this, p : *int) = {
x = p;
}
ptr1: (inout this, p : *int) -> can_ignore_output *int = std::exchange(x, p);
ptr2: (inout this, can_ignore_output inout p : *int) = std::swap(x, p);
}
And then main() will look like the following:
main :() = {
n := 42;
a : t2 = (n&);
m := 24;
a.ptr1(m&);
pn := n&;
a.ptr2(pn);
}
It is correct and safe, but misleading to the code reader. Taking cpp2 defaults into account, I would assume that a.ptr1() does not return anything because, by default, all functions are [[nodiscard]]. Also, I would assume that a.ptr2() uses the in passing style, because the call is accepted and this is a definite last use of pn, so it will be moved.
Both scenarios are correct. But Scenario 1 is more explicit, and Scenario 2 is more misleading to the reader.
Edit: added inout this, as pointed out by @AbhinavK00.
I've been saying that annotations will only be needed on the caller side, and @filipsajdak clearly shows that in his example. I think the correct way forward would be to implement Scenario 1 as shown in the example.
Btw, shouldn't the member functions t2::ptr1 and t2::ptr2 have a this parameter? And can we omit return too (like in t2::ptr1)?
@AbhinavK00 yes, you are correct, it needs this, more specifically inout this. I will correct it.
@AbhinavK00, the return is added correctly by cppfront. The generated code looks like the following:
#line 1 "/Users/filipsajdak/dev/execspec/external/tests/inout_ptr_example.cpp2"
class t2 {
private: int* x;
public: explicit t2(cpp2::in<int*> p)
: x{ p }
#line 4 "/Users/filipsajdak/dev/execspec/external/tests/inout_ptr_example.cpp2"
{
}
#line 4 "/Users/filipsajdak/dev/execspec/external/tests/inout_ptr_example.cpp2"
public: auto operator=(cpp2::in<int*> p) -> t2& {
x = p;
return *this;
#line 6 "/Users/filipsajdak/dev/execspec/external/tests/inout_ptr_example.cpp2"
}
public: [[nodiscard]] auto ptr1(cpp2::in<int*> p) -> int* { return std::exchange(x, p); }
public: auto ptr2(int*& p) -> void { std::swap(x, p); }
};
auto main() -> int{
auto n {42};
t2 a {&n};
auto m {24};
CPP2_UFCS(ptr1, a, &m);// return value can be ignored; it might be unused
auto pn {&n};
CPP2_UFCS(ptr2, std::move(a), std::move(pn));// out of pn can be ignored; it might be unused;
}
Oh ok, I'm not familiar with that but that still sounds like a problem related to null handling.
LSTATUS RegQueryValueExW(
[in] HKEY hKey,
[in, optional] LPCWSTR lpValueName,
LPDWORD lpReserved,
[out, optional] LPDWORD lpType,
[out, optional] LPBYTE lpData,
[in, out, optional] LPDWORD lpcbData
);
I would say that the canonical way to write this in Cpp2 would be something like this (removing the unused reserved parameter):
RegQueryValueExW: (
hKey: int,
valueName: optional<string_view>,
out type: optional<DWORD>,
out data: optional<BYTE>,
inout cbData: optional<vector<DWORD>>
) -> LRESULT =
{
...
}
This would require that you create variables to accept the type, data, and cbData values, even if you don't want them.
What if instead it were like this, and this allowed you to pass null for type, data, and cbData? That would be closer to the original API. (Using OPTIONAL here as a keyword placeholder.)
RegQueryValueExW: (
hKey: int,
OPTIONAL valueName: string_view,
OPTIONAL out type: DWORD,
OPTIONAL out data: BYTE,
OPTIONAL inout cbData: vector<DWORD>
) -> LRESULT =
{
...
}
To be safe, the function would be required to check the out and inout parameters for null before using them.
I did not find an example for the t2::ptr2() case, but I will look more.
The std::swap it's implemented in terms of can be an example. One way to implement the no-throw guarantee is
vector<T> new_v;
try { /*fill new_v*/; } catch(...) /*...*/
this->swap(new_v);
// `new_v` unused, even though the signature could be, in C++2, `friend swap(inout, inout)`.
That said, I don't think these are examples of where a function author might want to make the parameters' out-part ignorable. Whether any argument's out-part is ignored is better kept explicit at the call site.
Thanks to this discussion, and to having written more cpp2 code, I have noticed that UFCS disables [[nodiscard]] -> #305
Another example this brings to mind, from my SQL and ODBC days, is SUCCESS_WITH_INFO.
This is a return value that means "the operation succeeded, and by the way there's additional information available in case you want to look." Sometimes it meant this was a warning rather than an error so you should still look, but sometimes it was really just advisory extra information that happened to cost nothing extra to compute so it was made available to the caller.
If you got that SUCCESS_WITH_INFO return value, you would additionally look elsewhere, at an inout/out parameter or call a second function, to get the additional information. In the use cases I was familiar with, generally you didn't need to look, and it was considered optional to look.
If that API were:
ODBC32::RETCODE my_function( /*...*/, AdditionalInfo& info);
and you called it like this:
if (my_function( /*...*/, info) != ODBC32::ERROR) {
// do stuff, but don't necessarily look at info
}
that would generally be fine (IIRC) even though we don't look at the extra info.
So that info parameter would be a can_ignore_output inout.
But it has been 25 years since I've done ODBC programming... and I tend to agree with @filipsajdak and others that a simpler option to try first could be to allow only the call-side opt-out for now, and then see whether there's demand for a discardable return value or discardable out parameter result. (Well, we know there are cases, certainly for discardable return values, but the question is whether it's necessary to declare them as such, or whether they're infrequent enough that the level of noisiness at the call sites is acceptable.)
Still thinking it through... thanks for all the comments.
BTW the main difference between a return value and an inout/out parameter is that the former is "callee-allocated out" (the callee creates the object and passes it back) and the latter is "caller-allocated out" (the caller provides an existing object that has storage, possibly initialized, to write to). But both are equally "out". Just sharing that viewpoint because maybe it will help explain why I treat them as equivalent for "out" data-flow purposes.
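To make the two flavors concrete, here is a small Cpp1 sketch. The function names are hypothetical and chosen only for illustration; the point is that both deliver the same output and differ only in who provides the storage.

```cpp
#include <cassert>
#include <string>

// Callee-allocated out: the callee creates the object and hands it back.
[[nodiscard]] std::string make_greeting() {
    return "hello";
}

// Caller-allocated out: the caller supplies existing storage
// and the callee writes into it.
void fill_greeting(std::string& out) {
    out = "hello";
}

// Both paths end up delivering the same output to the caller.
bool both_deliver_same_output() {
    std::string a = make_greeting();  // storage created by the callee
    std::string b;                    // storage created by the caller
    fill_greeting(b);
    assert(a == b);
    return a == b;
}
```

Seen this way, ignoring b after fill_greeting(b) is the same kind of omission as ignoring the value returned by make_greeting().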
Another example this brings to mind, from my SQL and ODBC days, is SUCCESS_WITH_INFO.
This is a return value that means "the operation succeeded, and by the way there's additional information available in case you want to look." Sometimes it meant this was a warning rather than an error so you should still look, but sometimes it was really just advisory extra information that happened to cost nothing extra to compute so it was made available to the caller.
That seems like the std::ranges result types, but split between the result and an inout parameter.
With Cpp2, a better API for SUCCESS_WITH_INFO would be return types to aggregate the incidental computations, and non-discardable inout/out parameters for must-inspect results. Aggregating computations in the result relieves callers from having to allocate out.
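A minimal Cpp1 sketch of the "aggregate the incidental computations in the return type" idea. Every name here (status, query_result, run_query, succeeded) is hypothetical and not part of ODBC; the sketch only assumes that the advisory information can travel inside the returned object.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical status codes, loosely modeled on the discussion above.
enum class status { success, success_with_info, error };

// The incidental information rides along in the result,
// so the caller never has to allocate an object just to receive it.
struct query_result {
    status code;
    std::optional<std::string> info;  // advisory; fine to leave unread
};

// The whole result is non-discardable, but looking at `info` stays optional.
[[nodiscard]] query_result run_query(bool chatty) {
    if (chatty) {
        return {status::success_with_info, std::string("3 rows truncated")};
    }
    return {status::success, std::nullopt};
}

// A caller that only cares about success can stop here.
bool succeeded(const query_result& r) {
    return r.code != status::error;
}
```

A caller can write succeeded(run_query(true)) and never touch info, without any call-site or declaration-site "ignore" annotation.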
Arguably, an API that forces the inspection of must-inspect results, without the possibility of passing discarded out arguments, is also possible. It's a bit unconventional, but by requiring a callable with parameters for the must-inspect results, the caller is also relieved from having to allocate out arguments for those.
EDIT: This probably doesn't mix well with coroutines or anything that becomes harder due to the introduction of a different function body for the continuation. In those cases, "throwing values" can do better, if it's a feasible feature for a particular case.
I just had a bit of time on my hands. This might not be relevant.
After sleeping on it, I understand what worries me the most.
In cpp1, there is no way to differentiate the out and inout passing styles. That means we mix things up when bringing cpp1 examples into the picture. I have a feeling that most of the examples were ones that, in cpp2, match the out passing style more than inout.
out & inout passing styles that might use ignore
I was thinking about good examples of these cases. Those are excellent examples we could have with cpp1 streams:
fun1: ( out o : std::ostream ) = { /*...*/} // function just write to the stream
fun2: ( in o : std::istream ) = { /*...*/} // function just read from the stream
fun3: (inout o : std::iostream) = { /*...*/} // function read and write to the stream
The above functions can be called with file streams, where the output result is observable on the filesystem. In some cases, out or inout results can be ignored, but I doubt that marking the function so that the results can be ignored by default would be a good solution. Still, you should check the stream's state at the end.
The only scenario I can imagine in which you might safely mark an out or inout argument with can_ignore_output (in the case of streams) is when your function uses some error handling that covers all error cases, e.g., exceptions.
Currently, the above code will only work as some read & write methods are non-const as they might change the object's state. operator<< or operator>> accept the stream by non-const reference; as there was no possibility to express out, in, or inout, that was the only option.
What I would like to express in the code is:
- When I use out, the function can only write to the argument,
- When I use in, the function can only read from the argument,
- When I use inout, the function can read and write to the argument,
Unfortunately, there are methods like good() that a function should be able to use regardless of passing style (the issue is with out, assuming that it means no reads from the variable). Maybe there should be a way to express that a specific method can be called when an object is passed with the in (default), out, or inout passing style, similar to how we mark methods with const or mutable in cpp1.
inout args use cases
I was looking for examples where I pass something to a function, read from it, and expect the results in the same object. There are some cases:
I am struggling to convince myself whether and when it is safe to use can_ignore_output on functions that take them as arguments.
Example of can_ignore_output out arg in cpp1
I was looking for an example of out, and I found one that is at the same time can_ignore_output out:
int QString::toInt(bool *ok = nullptr, int base = 10) const
https://doc.qt.io/qt-6/qstring.html#toInt
ok is an out argument that is optional: you opt out by passing nullptr. This is an excellent example of how we deal with the discussed topic in cpp1.
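A minimal Cpp1 sketch of that opt-in pointer pattern. It assumes a hypothetical free function to_int that parses with std::from_chars; this is not Qt's implementation, only the parameter shape is borrowed from QString::toInt.

```cpp
#include <cassert>
#include <charconv>
#include <string_view>
#include <system_error>

// Hypothetical stand-in with the same parameter shape as QString::toInt.
// `ok` is an optional output: pass nullptr (the default) to opt out.
int to_int(std::string_view s, bool* ok = nullptr, int base = 10) {
    int value = 0;
    const char* first = s.data();
    const char* last = s.data() + s.size();
    auto [ptr, ec] = std::from_chars(first, last, value, base);
    // Success means no error and the whole input was consumed.
    const bool success = (ec == std::errc{}) && (ptr == last);
    if (ok != nullptr) {   // the caller opted in to the extra output
        *ok = success;
    }
    return success ? value : 0;
}
```

Calling to_int("42") ignores the status entirely, while to_int("42", &flag) asks for it; the callee has to null-check before every write.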
There are use cases where you might want to mark an out or inout argument with can_ignore_output (e.g., by using exceptions); std::fstream and std::ofstream are good examples of types that might need that.
I would look for a stricter way of declaring an argument as in, out, or inout, with the possibility of marking methods in my UDT as allowed to be called in a specific passing-style context.
fun1: ( out o : std::ostream ) = { /*...*/} // function just write to the stream
fun2: ( in o : std::istream ) = { /*...*/} // function just read from the stream
fun3: (inout o : std::iostream) = { /*...*/} // function read and write to the stream
~I think this is conflating two totally orthogonal ideas, the ability to read from or write to the stream, which is identified by the type:~
fun1: (o : std::ostream ) = { /*...*/} // function just write to the stream
fun2: (o : std::istream ) = { /*...*/} // function just read from the stream
fun3: (o : std::iostream) = { /*...*/} // function read and write to the stream
~and the ability to create the objects themselves:~
fun1: ( out o : std::ostream) = { /*...*/} // function creates the ostream
fun2: ( in o : std::ostream) = { /*...*/} // function uses the pre-created ostream
fun3: (inout o : std::ostream) = { /*...*/} // function can use the pre-created ostream if it exists and create it and pass it back to the caller if it doesn't
I'm not sure I'm actually interpreting these modifiers properly, need to go back to the parameter passing papers and look at these again.
Update: I think I've been in C# land too much recently, and was seeing these as pointers rather than objects. I watched the 2020 presentation and I think I have my head back on straight again.
I'm not sure I'm actually interpreting these modifiers properly, need to go back to the parameter passing papers and look at these again.
I agree! I'm in dire need of documentation.
Example of can_ignore_output out arg in cpp1
I was looking for some example of out example, and I found one that is at the same time can_ignore_output out:
int QString::toInt(bool *ok = nullptr, int base = 10) const
https://doc.qt.io/qt-6/qstring.html#toInt
ok is an out argument that is optional - you opt out by passing nullptr. This is an excellent example of how we deal with the discussed topic in cpp1.
out parameters can also accept uninitialized arguments. All the function wants to do is read ok and, if it's not null, write to ok*. So ok should be in or inout. I think in would suffice; I guess writing to ok* doesn't require ok to be inout.
@JohelEGP I presented the toInt() method as an example of how we deal with ignorable output arguments in cpp1.
I believe in cpp2 we could write it as:
QString: type = {
toInt: (this, can_ignore_output out ok: bool) -> int = {
//...
}
}
Please note that ok is not a cpp2 pointer (it will become one on the cpp1 side after cppfront generation).
Hmm... What about the part in C++1 where ok defaults to nullptr? That wasn't translated in the C++2 rewrite. In that C++2 interface, an argument is required for ok, whereas it can be omitted in the C++1 version.
It depends on how you generate the code for can_ignore_output out
I'd expect can_ignore_output to be orthogonal to defaulting parameters. I think the translated interface would look more like toInt: (this, can_ignore_output in ok: *bool = nullptr) -> int, if that's how defaulting parameters work. As you see, ok isn't out or inout. The output is through ok* if ok is not null. And parameter styles and can_ignore_output apply to the (type of the) parameter.
This is similar to issues about pointers being shallow const.
Thanks Filip,
I have a feeling that most of the examples were examples that in cpp2 match more out passing style than inout.
And those should rarely want to allow ignoring the result, because out can be a constructor... and ignoring the result would mean asking for construction and then not using the object, and relying only on the side effects.
Except for an RAII object, where the intended use is to construct the object as an automatic storage duration object and typically never interact with it, since its purpose is to run its destructor at the end of scope. So this would be totally fine:
{
// let's say this guard is declared uninitialized because we might need to
// initialize it later...
guard: my_raii;
// ... say because we have to decide which alternate constructor to use...
if something() {
guard = create_guard( options, here );
}
else {
initialization_function( out guard ); // note it's okay to ignore this particular 'out' parameter
}
// ... long function body with early returns etc. ...
} // ~guard executed on all paths
So perhaps that's another good example of ignoring the output value (until the dtor)?
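For readers more used to Cpp1, here is a small sketch of only the "construct it and never touch it again" half of that example. Cpp1 has no deferred initialization through an out parameter, so that part is not modeled; scope_logger and run_scope are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical guard type: records when it is constructed and destroyed.
class scope_logger {
    std::vector<std::string>* log_;
public:
    explicit scope_logger(std::vector<std::string>& log) : log_(&log) {
        log_->push_back("enter");
    }
    ~scope_logger() { log_->push_back("exit"); }
    scope_logger(const scope_logger&) = delete;
    scope_logger& operator=(const scope_logger&) = delete;
};

// The guard is created and then never mentioned again; its only
// observable job is the destructor, which runs on every path out.
std::vector<std::string> run_scope(bool early_return) {
    std::vector<std::string> log;
    auto body = [&] {
        scope_logger guard(log);
        if (early_return) { return; }
        log.push_back("work");
    };
    body();
    return log;
}
```

The "last use" of guard is its construction, and that is intentional, which is why a diagnostic about an ignored output would be noise for this kind of type.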
What I would like to express in the code is:
- When I use out, the function can only write to the argument,
For out I'd say "the function must write to the argument using = before any other use"... later reading the value that the function itself wrote is totally fine, and may be needed to do multi-step initialization of it before finally handing it back to the caller in the desired state.
- When I use in, the function can only read from the argument,
- When I use inout, the function can read and write to the argument,
Those two are the status quo, right?
RAII objects are an excellent example! This is also an example of an out variable, but there is also an inout case:
{
guard: my_raii;
if something() {
guard = create_guard( options, here );
}
else {
initialization_function( out guard );
}
use_and_modify( guard ); // note it's okay to ignore this particular 'inout' parameter
} // ~guard executed on all paths
And I also realize that this is a more generic example than those with standard streams (they are done in the RAII way).
I will write more later during the day.
While we're discussing what the modifiers are supposed to express, in the 708 repo someone mentioned diagnosing "I said inout but I only wrote to it" and saying that it should recommend changing to out.
Yes, and diagnostics will also be provided suggesting in when you use inout and only read from the argument.
I have mixed feelings about the meaning of out. Currently, it means write to but also initialize first.
There is a quote in 708:
“Finally, the [internal January 1981] memo introduces writeonly: ‘There is the type operator writeonly, which is used like readonly, but prevents reading rather than writing...’ “ —B. Stroustrup (D&E, p. 90)
And I missed exactly that. I would like to express that a function will only write to the argument.
Use cases:
- std::ostream
- write only memory,
- pipes,
And there could be diagnostics suggesting inout when you used out but also read from the argument.
The current meaning of out is rather “initialize before reading or writing”.
In the case of passing a UDT as out, as far as I understand it, you need to first call a method that has out this before you can call any other method or do anything else with the object, right?
So, maybe I am wrong. Maybe I just need to define methods in my UDT as out this to be able to use my type as an out argument?
E.g., for ostream I can define methods for writing to the stream as out, right? But doesn't that change the meaning of out this in methods?
Musing a bit more about these modifiers based on preconditions and what you must and must not do. Are these descriptions accurate, sufficient?
in:
in this functions or read data members
out:
out this function or write all data members
not out
inout:
should be out
in this or inout this function or read a data member
should be in.
out this or inout this function or write a data member
I see also a lack of consistency (or symmetry):
1. On an in argument, we can call in this methods.
2. On an inout argument, we can call in this, out this, and inout this methods.
3. On an out argument, we need to first call an out this method, and after that we can call in this, out this, and inout this methods.
Maybe, what we have in the third point is better expressed with the following:
fun: (first_init in x) = {} // case 3.1
fun: (first_init out x) = {} // case 3.2
fun: (first_init inout x) = {} // case 3.3
Then we can have a rule that on the out argument you can call only out methods.
Maybe, what we have in the third point is better expressed with the following:
I deleted this from my message above, but I originally had:
in
: can only call in this
functions.
out
: must initialize and then you can do whatever you want.
inout
: you can do whatever you want.
I don't see a benefit to out meaning "you can only call out this". I imagine that the number of objects that are truly non-readable is almost non-existent. Even std::ostream has bool good() const.
In case of passing UDT as out, as far as I understand it correctly, you need to first call a method that has out this before you will be able to call any other method or do anything else with the object, right?
That's right. Usually, that'd be operator=.
The current meaning of out is rather “initialize before reading or writing”.
Not really "initialize". It can be already initialized. You can pass initialized arguments to out parameters.
E.g., for ostream I can define methods for writing to stream as out, right? But doesn't that change the meaning of the out this in methods?
Since out parameters can be uninitialized, what'd you do if you were given an uninitialized std::ostream?
And I missed exactly that. I would like to express that function will only write to the argument.
So, maybe I am wrong. Maybe I just need to define methods in my UDT as out this to be able to use my type as out argument?
I think you want a metaclass write_only_reference_wrapper that only permits in and inout uses of the wrapped object.
And there could be diagnostics: suggesting using inout when you used out but you also read from the argument.
IIRC, it's supposed to diagnose to first apply = to the out parameter, or pass it as an out argument, before reading from it. Since not all object parameters of =s are out, that's not exactly right.
Use cases:
* `std::ostream`
* write only memory,
* pipes,
Your previous example of standard streams included reading the error status of a stream. You wouldn't be able to do that with writeonly. With memory, you wouldn't be able to query its space left.
Arguably, with sufficient indirections, writeonly could be handy. With memory, you can pass it to a writeonly that does the write part once the caller, who has inout access, has asserted having enough capacity. Streaming operator<< and >> are not such an example. Those build sentinels which require in or inout access.
Then we can have a rule that on the out argument you can call only out methods.
That's not as useful as it can be. Remember:
For out I'd say "the function must write to the argument using = before any other use"... later reading the value that the function itself wrote is totally fine, and may be needed to do multi-step initialization of it before finally handing it back to the caller in the desired state. -- https://github.com/hsutter/cppfront/issues/231#issuecomment-1489600460
I think you want a metaclass write_only_reference_wrapper that only permits in and inout uses of the wrapped object.
Actually, a metaclass or writeonly parameter passing wouldn't work without C++1 reflection. There's just no way to prohibit const uses of a type or object.
In the current implementation of cppfront, the following code:
Passes cppfront:
But failed to compile on the cpp1 side with an error:
When cppfront moves x on its last use, it breaks the requirements of the f2 function, which requires an lvalue reference but gets an rvalue reference.
Expectations
cppfront should take into consideration the context in which the variable is used to avoid breaking the code on the cpp1 side.
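The original snippets are not reproduced here, so the following is only a sketch of the underlying Cpp1 rule, assuming f2 takes an int by non-const lvalue reference (the parameter type is an assumption). It checks at compile time that an lvalue is accepted and an rvalue, which is what std::move produces, is rejected.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for a function lowered from an `inout` parameter.
void f2(int& x) { x += 1; }

// An lvalue argument is accepted ...
static_assert(std::is_invocable_v<decltype(&f2), int&>);
// ... but an rvalue (the result of std::move) cannot bind to int&.
static_assert(!std::is_invocable_v<decltype(&f2), int&&>);

int call_with_lvalue() {
    int x = 41;
    f2(x);                // fine: x is an lvalue
    // f2(std::move(x));  // would not compile: int& cannot bind to an rvalue
    return x;
}
```

So a move inserted at a definite last use is only valid when the parameter can accept an rvalue, which an inout (non-const lvalue reference) parameter cannot.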