j3-fortran / fortran_proposals

Proposals for the Fortran Standard Committee

support allocatable CHARACTER variables on GET_COMMAND() and GET_COMMAND_ARGUMENT() #178

Closed urbanjost closed 2 years ago

urbanjost commented 4 years ago

prop.txt

Since the strings fetched via the GET_COMMAND() and GET_COMMAND_ARGUMENT() procedures are of indeterminate length and are only known at run time, a simple and useful change to the intrinsics would be to support allocatable CHARACTER variables. The most recent example in the attachment only allocates unallocated allocatable variables, following the discussion in this topic about how reallocating an already-allocated variable could affect existing code: the intrinsics currently accept allocated allocatable CHARACTER variables, but they do not reallocate them to the required length and instead return status -1 if the current length is exceeded.

certik commented 4 years ago

Indeed (and the same with read, but that can be discussed in another issue).

Until this gets standardized, the subroutines from your proposal should go into stdlib. CC @milancurcic.

klausler commented 4 years ago

ALLOCATABLE CHARACTER arguments are already allowed, of course. What I think is needed is the ability to pass an unallocated ALLOCATABLE CHARACTER. That's currently not allowed, so extending the standard to support it would not invalidate any existing program (unlike the feature that J3 discussed at the February meeting that would reallocate an ALLOCATABLE CHARACTER entity if one were used for IOMSG= and other specifiers...)

certik commented 4 years ago

@klausler yes, if you read the prop.txt proposal above, it is about allowing to pass unallocated allocatable character and for them to be automatically allocated to the correct length.

urbanjost commented 4 years ago

Sorry for the confusion. You are both right. I changed the overload procedures to deallocate instead of using the pre-allocated string because I could not think of a use case where I would not want to reallocate. Now I see several reasons to put it back the way it was, including possibly causing an undetected truncation in existing code that uses the returned variable on the assumption that its length could not change. I will put it back and look at hardening the example (which I just meant for demonstration purposes) for inclusion in stdlib. Of course that means putting a second line in the user code to ensure the variable is deallocated before the call in the general case, but off the cuff I can't think of an overload that would not require it without the addition of another parameter.

urbanjost commented 4 years ago

I updated the attachment and added a unit test and an optional argument REALLOC as a flag to reallocate an already allocated variable.
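For readers without the attachment, here is a minimal sketch of what such a wrapper could look like. It is not the code from prop.txt: the module name `get_command_alloc_sketch`, the procedure name `get_command_alloc`, and the exact argument list are assumptions made for illustration. It only relies on the standard GET_COMMAND intrinsic, called once to query the length and once to fetch the text.

```fortran
module get_command_alloc_sketch
   implicit none
   private
   public :: get_command_alloc
contains
   ! Hypothetical wrapper (illustrative names, not taken from prop.txt).
   ! Allocates COMMAND if it is unallocated; an already allocated COMMAND
   ! keeps its length unless REALLOC is present and .true.
   subroutine get_command_alloc(command, status, realloc)
      character(len=:), allocatable, intent(inout) :: command
      integer, intent(out), optional :: status
      logical, intent(in), optional :: realloc
      integer :: length, istat
      logical :: do_realloc

      do_realloc = .false.
      if (present(realloc)) do_realloc = realloc

      ! Discard the old allocation only when the caller asked for it.
      if (allocated(command) .and. do_realloc) deallocate(command)

      if (.not. allocated(command)) then
         ! First call: query the length only, then allocate to fit.
         call get_command(length=length)
         allocate(character(len=length) :: command)
      end if

      ! Second call: fetch the text. STATUS is -1 if a pre-allocated
      ! COMMAND was too short and the value was truncated.
      call get_command(command=command, status=istat)
      if (present(status)) status = istat
   end subroutine get_command_alloc
end module get_command_alloc_sketch
```

With this shape, `call get_command_alloc(cmd)` on an unallocated `cmd` sizes it to the command line, a `cmd` already allocated to length 20 stays at length 20, and `call get_command_alloc(cmd, realloc=.true.)` resizes it. Making the flag optional and defaulting it to `.false.` is one way to keep the old behavior for existing callers.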

sblionel commented 4 years ago

https://j3-fortran.org/doc/year/19/19-252r2.txt describes the feature asked for in the initial post, applying to all intrinsic procedures that have an INTENT(OUT) or INTENT(INOUT) argument where the value returned is "as if by intrinsic assignment". This handles both the unallocated and currently allocated cases.

urbanjost commented 4 years ago

Given the scope the body implies, that really needs a new title. It covers more than "Specification for auto-allocating processor messages". I was considering the same scope as that document implies, but wanted to be specific enough initially that issues like returning an array and elemental functions could be discussed with a smaller scope first; I see some of those issues are covered. So what would that imply for NAMELIST output and for internal WRITE statements with / descriptors in the formats? Would an explicit / in a format imply an array on output and so be disallowed? Would you be guaranteed a single output line for an internal WRITE of a NAMELIST group? (That varied from compiler to compiler the last time I looked. I believe it was the IBM compiler that required an array of at least three elements for an internal write of a NAMELIST group, although I do not currently have access to it to verify that.)

urbanjost commented 4 years ago

Digesting that a little more: if a parameter is explicitly INTENT(OUT), it seems it should always be allocated on return instead of left unchanged, and I would argue the same for any unallocated allocatable scalar, whether INTENT(INOUT) or INTENT(OUT). Otherwise you would likely have to test whether the variable was still unallocated upon return, or always supply and check a status variable to see whether it indicated no error, and thus no message, for options such as IOMSG.

sblionel commented 2 years ago

This is already in F2023 - not just for these two procedures but for all intrinsic subroutines with character output arguments. The introduction says, "When a deferred-length allocatable actual argument of an intrinsic procedure is to be assigned character data, it is allocated by the processor to the length of the data."

klausler commented 2 years ago

Be advised, in the case of allocatable deferred-length character variables that are already in the allocated state, this feature in 202X is a silently incompatible change of existing behavior.

sblionel commented 2 years ago

That was known, but considered to be the preferable behavior.

FortranFan commented 2 years ago

@sblionel wrote April 23, 2022, 6:31 PM EDT:

That was known, but considered to be the preferable behavior.

Maybe, but that does not make it right, not in the least bit. It's a real travesty how the standard holds on to certain patently unsafe aspects in the language in the name of backward compatibility but in other situations, like with the feature in 202X, an incompatible change is introduced on account of an utterly silent voting bloc. The practitioners of Fortran deserve a far better workflow on such decisions.

sblionel commented 2 years ago

@FortranFan I find your post puzzling. This feature was put on the WG5 list in 2019 and it was first discussed in 2018 (https://j3-fortran.org/doc/year/18/18-279r1.txt). I have no clue what you mean by "utterly silent voting bloc". It was discussed across at least three J3 meetings. It was also a popular feature in the survey I ran from 2017 into 2018.

The consensus was that 1) allocatable deferred length character was a relatively new feature, 2) the old behavior was not what people wanted and required extra code to work around, 3) any existing workarounds would continue to work.

You're right that the committee is reluctant to change behaviors that would break existing programs. We felt that this case did not do that.

We do have protocols for eliminating unsafe practices, and have done so multiple times over the years.

FortranFan commented 2 years ago

@sblionel writes Apr 23, 2022, 8:14 PM EDT:

I find your post puzzling. This feature was put on the WG5 list in 2019 and it was first discussed in 2018 (https://j3-fortran.org/doc/year/18/18-279r1.txt) I have no clue what you mean by "utterly silent voting bloc". It was discussed across at least three J3 meetings. It was also a popular feature in the survey I ran 2017 into 2018.

I think you know exactly what is going on here.

The point of contention is the specific change related to the length parameter of the object. Readers can consider the following example to better understand the issue:

   character(len=:), allocatable :: cmd
   allocate( character(len=20) :: cmd )
   call get_command( cmd )
   print *, len(cmd)
end

A processor conformant to the current standard will allow the above program to output 20:

C:\temp>gfortran Example.f90 -o Example.exe

C:\temp>Example.exe
 20

C:\temp>ifort /standard-semantics Example.f90
Intel(R) Fortran Intel(R) 64 Compiler Classic for applications running on Intel(R) 64, Version 2021.5.0 Build 20211109_000000
Copyright (C) 1985-2021 Intel Corporation. All rights reserved.

Microsoft (R) Incremental Linker Version 14.31.31105.0
Copyright (C) Microsoft Corporation. All rights reserved.

-out:Example.exe
-subsystem:console
Example.obj

C:\temp>Example.exe
 20

Following Fortran 202X though, a conformant processor will allow the program to output 11 or some such processor-dependent value which is different from what the current standard informs the user.

This is an unacceptable change given the constant refrain of backward compatibility.

And there was inadequate discussion in the committee on this specific incompatible change that got introduced by how the feature was specified by the J3 subgroup.

Also the argument, "allocatable deferred length character was a relatively new feature," was debatable because the "allocatable deferred length character" feature and the current behavior with GET_COMMAND (e.g., output of 20 above) and other such intrinsics got introduced starting Fortran 2003 published about 18 years ago.

I believe an overwhelming majority of users find the overall item "US14. Allow deferred-length character variables in more locations" on the Fortran 202X worklist to be a good addition. I do too. That is not the argument here. The issue is the change in behavior when it comes to an already allocated allocatable character argument.

urbanjost commented 2 years ago

I have not re-read it recently, but the last time I looked the proposal was that GET_COMMAND would only allocate the variable if it was not already allocated, so it would still return 20. It would only automatically allocate an unallocated variable. Is that no longer the case? When I get some time I will reread it, but that would keep it upward-compatible.

Ahh, it did change. I would have preferred an option on GET_COMMAND to allow reallocation as a compromise solution myself, but the new behavior is what I prefer; I have been using a wrapper around GET_COMMAND for a long time that does what the proposed change does, so it would not personally affect me. Still, I am very surprised. Somewhere along the line I think I proposed something like adding a "REALLOC=.TRUE." optional argument, because checking and deallocating would be ugly. In practice, how many people are using an allocatable variable with GET_COMMAND without having queried the length with a previous GET_COMMAND call? In the example, it seems unlikely that someone would not use a fixed-length variable if the length was always 20. If the variable is allocatable, they would most likely have set its length by querying the length first, in which case the change just eliminates the first call to GET_COMMAND, or, if the first call is left in, the same length will be returned anyway.

sblionel commented 2 years ago

The text in 22-007r1, referring to all intrinsic procedures, is this:

"When an allocatable deferred-length character scalar corresponding to an INTENT (INOUT) or INTENT (OUT) argument is assigned a value, the value is assigned as if by intrinsic assignment." (16.9.1p3) This is simple and easy to understand.

urbanjost commented 2 years ago

Might be. Not sure what it means in English :>. I don't think that is as simple as other schemes, or only marginally simpler. Now, since most intrinsics are elemental, the programmer has to remember that passing an allocatable scalar will behave differently than an array, that character variables behave differently than numeric ones with regard to allocatable variables, and so on. Something simple would have been "all intrinsics return allocatable variables allocated". That makes sense for OUT arguments and is easier to understand for functions than for subroutines. For the many versions of GETARG out there that are functions, something like A=GETARG(1) makes it clear that the function result is produced first and then assigned to A. With subroutines this is another rule with inconsistencies to remember.

sblionel commented 2 years ago

Note that it says scalar, so elemental doesn't apply. Besides, there are no elemental intrinsics with character OUT or INOUT arguments. "As if by intrinsic assignment" is used frequently in the standard, and the explanation of what that does is straightforward.
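To make "as if by intrinsic assignment" concrete, here is a small sketch. The subroutine `assign_string` is a hypothetical stand-in for an intrinsic with a character output argument (the module and procedure names are invented for illustration). The only language rule it relies on is the one that has applied to ordinary assignment since Fortran 2003: assigning to a deferred-length allocatable character scalar allocates it, or reallocates it, to the length of the value.

```fortran
module intrinsic_assignment_sketch
   implicit none
   private
   public :: assign_string
contains
   ! Hypothetical stand-in for an intrinsic that returns character data.
   subroutine assign_string(var, value)
      character(len=:), allocatable, intent(inout) :: var
      character(len=*), intent(in) :: value
      ! Intrinsic assignment: VAR is allocated if it was unallocated and
      ! reallocated if its current length differs from len(value).
      var = value
   end subroutine assign_string
end module intrinsic_assignment_sketch
```

If `var` is first allocated with length 20 and then passed to `assign_string(var, 'hello world')`, `len(var)` is 11 afterwards, not 20. That is the same length change the GET_COMMAND example earlier in the thread is concerned with.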

urbanjost commented 2 years ago

That is what I meant as one of the inconsistencies: that scalar is specifically specified. No intrinsic is currently both CHARACTER and elemental, but hopefully some will be in the future. I know I have my own wrapper around get_arguments that returns a character array and a length array; not quite elemental, but it returns an array. I have a lot of string functions that are elemental, and hopefully the intrinsics will someday provide some of that functionality. In this particular case I am OK with the change personally, but as a general behavior that can be applied going forward, and because it causes an incompatible change with previous behavior, I have some misgivings. Not being backward compatible is very "un-Fortranic". Personally, I only have a few small programs that call the intrinsics directly, as I almost always call something that takes Unix-like syntax like M_kracken or M_CLI2 or reads the arguments as a NAMELIST, and this change will let me simplify those interfaces (they all call GET_ARGUMENTS(1) once to get the length and then allocate a variable; this change will make that simpler and cleaner).
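The query-then-allocate pattern mentioned above can be sketched as follows, written here against the standard GET_COMMAND_ARGUMENT intrinsic. This is an illustration only: the module `arg_string_sketch` and the function `arg_string` are hypothetical names, not code from M_kracken, M_CLI2, or the attachment.

```fortran
module arg_string_sketch
   implicit none
   private
   public :: arg_string
contains
   ! Hypothetical helper: returns argument NUMBER as a string whose
   ! length matches the argument exactly.
   function arg_string(number) result(arg)
      integer, intent(in) :: number
      character(len=:), allocatable :: arg
      integer :: length, istat

      ! First call: ask only for the length. A missing argument yields
      ! a length of 0 and a positive status.
      call get_command_argument(number, length=length, status=istat)
      allocate(character(len=length) :: arg)

      ! Second call: fetch the value into the correctly sized variable.
      if (length > 0) call get_command_argument(number, value=arg, status=istat)
   end function arg_string
end module arg_string_sketch
```

A caller would write something like `a = arg_string(1)`, where `a` is a deferred-length allocatable character variable. Because this is a function, the assignment itself resizes `a`, which is the behavior the function-style GETARG wrappers discussed earlier rely on.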

FortranFan commented 2 years ago

Note that it says scalar, so elemental doesn't apply. Besides, there are no elemental intrinsics with character OUT or INOUT arguments. "As if by intrinsic assignment" is used frequently in the standard, and the explanation of what that does is straightforward.

The issue is not at all whether the revision as drafted toward Fortran 202X publication is straightforward or simple or useful, etc. These are all beside the point.

The problem is as described by @urbanjost: "because it causes an incompatible change with previous behavior I have some misgivings. Not being backward compatible is very "un-Fortranic"".