New Scan Intrinsics - Githubissues

This an attempt at #273.

Open questions:

[x] How to spell the name? (Straw vote for plenary)
[ ] Should the initial implementation be restricted to 1D arrays?
- This would eliminate the DIM argument and make it easier to describe.
- Behavior for other ranks is not difficult to implement outside of scan (as illustrated by the descriptions in the first draft)
- Support for additional ranks could be added later without breaking backwards compatibility
[ ] Should the initial implementation have an optional argument to perform a segmented scan?
- There are different ways of specifying the segments that need to be considered in designing the interface
- Support for it could be added later without breaking backwards compatibility

Awesome, thanks @everythingfunctional !

Re: the spelling of the name, what is your preference and those who have requested for it?

I'd like to suggest following how HPF did it - with changes for taste:

1.) Spelling: HPF naming had the operation (e.g., SUM), followed by the scan type (e.g., PREFIX or SUFFIX) to form a procedure name like SUM_PREFIX or PRODUCT_SUFFIX. My vote would be to use the word "SCAN" first, then the operator type - e.g., SCAN_SUM or SCANEX_PRODUCT. Then they will all appear together in alphabetical order.

Note that since the operation is part of the name, no user-provided function is necessary. This avoids the performance issue of having to repeatedly call the user-provided function. IMHO there should be built-in (no user-provided function needed) support for at least sum, product, min, and max. Perhaps at least initially, others could be done via user-provided function.

Another advantage of splitting the names out by operation and scan type is that better compile time checking of the arguments can be done. For example in HPF, sum and product scans can operate on array types of integer, real, and complex. Min and max scans are more limited to integer and real arrays. There are others that only apply to integers and to logicals. Accepted arguments also vary. For example, HPF ALL_SCAN only accepts a logical array and uses the MASK argument as input. See section 7.4.5 of the HPF 2.0 report for a better summary. The MPI library has similar restrictions on data type vs supported operations. See section 6.9.2 of the MPI 4.0 report as an example.

2.) I wouldn't have a problem starting with 1D arrays. The DIM argument complicates the implementation a lot. Get 1D working well, then think about adding the DIM argument.

3.) Segmented scans (e.g., via a logical SEGMENT argument) would be useful. Again, check out how HPF did it. Blelloch's book "Vector Models for Data Parallel Computing" makes extensive use of segmented scans in many of his algorithms.

Walter

-----Original Message----- From: j3-fortran/fortran_proposals @.> Sent: Jan 10, 2023 2:05 PM To: j3-fortran/fortran_proposals @.> Cc: Subscribed @.***> Subject: [j3-fortran/fortran_proposals] New Scan Intrinsics (PR #290)

This an attempt at #273 (https://github.com/j3-fortran/fortran_proposals/issues/273). Open questions:

How to spell the name? (Straw vote for plenary)
Should the initial implementation be restricted to 1D arrays? * This would eliminate the DIM argument and make it easier to describe.
Behavior for other ranks is not difficult to implement outside of scan (as illustrated by the descriptions in the first draft)
Support for additional ranks could be added later without breaking backwards compatibility
Should the initial implementation have an optional argument to perform a segmented scan (https://en.wikipedia.org/wiki/Segmented_scan)? * There are different ways of specifying the segments that need to be considered in designing the interface
Support for it could be added later without breaking backwards compatibility

You can view, comment on, or merge this pull request online at: https://github.com/j3-fortran/fortran_proposals/pull/290 Commit Summary * 78dd77b (https://github.com/j3-fortran/fortran_proposals/pull/290/commits/78dd77b2a65377a8789a308a3db77a2deb1b5c06) first draft of new scan intrinsics

File Changes(1 file (https://github.com/j3-fortran/fortran_proposals/pull/290/files))

A proposals/scan/scan.txt (https://github.com/j3-fortran/fortran_proposals/pull/290/files#diff-7b6cd37fe8090fa3d35d533c7c1553bfd053b1c0f05b754096008ab8ea9db74a) (364)

Patch Links: * https://github.com/j3-fortran/fortran_proposals/pull/290.patch

https://github.com/j3-fortran/fortran_proposals/pull/290.diff

— Reply to this email directly, view it on GitHub (https://github.com/j3-fortran/fortran_proposals/pull/290), or unsubscribe (https://github.com/notifications/unsubscribe-auth/ABCNQO6VQU3F5CF5KAHERPLWRXMJFANCNFSM6AAAAAATXJ7GAQ). You are receiving this because you are subscribed to this thread.Message ID: @.***>

I probably missed it, but I didn't see a way for exclusive prefix scans to specify the identity values of their operations, which are needed to define the first value in their result sequence. Your examples return 1 for the exclusive scan with MY_MULT but I don't see how the language could know that 1 is its identity.

The scan_exclusive procedure takes an additional argument (IDENTITY). The example provides that as an argument.

Re: the spelling of the name, what is your preference and those who have requested for it?

The preference is scan, but of course that's taken and is ambiguous in terms of is it inclusive or exclusive (unless of course we have an optional argument to select between the two, which is I suppose an possibility).

@w6ws , thanks for the feedback. I have had one or two others suggest looking at HPF for inspiration. I think the operator specific variants do have potential for better performance, but I'm not sure they really are any more compile-time/type safe. The compiler is already obligated to check the type of the array argument vs the operation provided, so I don't think there's chance of a user getting it wrong. I'm not opposed to adding the operator specific variants, but perhaps as a next iteration/separate paper. I'm leaning towards restricting to 1D to start simpler and not including segmented, again to keep things simpler.

I reviewed my copy of Blellochs book. In it he uses SUM, MIN, MAX, AND (ALL), and OR (ANY) scans. Does not use PRODUCT. So I would suggest having built-in operations for at least those first five operators - rather than requiring a user-supplied function. Both HPF and MPI define additional built-in operators for scans. And as I previously mentioned, Blelloch makes extensive use of segmented scans.

Again, by having separate procedure names for each of the operators, the compiler can easily check at compile time for proper argument types. SUM would be overloaded to accept arrays of integer, real, and complex. MIN and MAX would accept integer and real. AND and OR would accept logicals only. In HPF these can all be verified at compile time. In MPI, they can't - leading to an opening for potential coding errors.

As for the user-supplied function, if you really think you need it I'd suggest it be ELEMENTAL rather than PURE. ELEMENTAL is, of course, a much more restricted version of pure. That would considerably simplify the documentation of the argument.

Walter

-----Original Message----- From: j3-fortran/fortran_proposals @.>@.> Sent: Jan 10, 2023 6:16 PM To: j3-fortran/fortran_proposals @.>@.> Cc: Walter Spector @.>, Mention @.>@.>@.> Subject: Re: [j3-fortran/fortran_proposals] New Scan Intrinsics (PR #290)

I probably missed it, but I didn't see a way for exclusive prefix scans to specify the identity values of their operations, which are needed to define the first value in their result sequence. Your examples return 1 for the exclusive scan with MY_MULT but I don't see how the language could know that 1 is its identity. The scan_exclusive procedure takes an additional argument (IDENTITY). The example provides that as an argument. Re: the spelling of the name, what is your preference and those who have requested for it? The preference is scan, but of course that's taken and is ambiguous in terms of is it inclusive or exclusive (unless of course we have an optional argument to select between the two, which is I suppose an possibility). @w6ws (https://github.com/w6ws) , thanks for the feedback. I have had one or two others suggest looking at HPF for inspiration. I think the operator specific variants do have potential for better performance, but I'm not sure they really are any more compile-time/type safe. The compiler is already obligated to check the type of the array argument vs the operation provided, so I don't think there's chance of a user getting it wrong. I'm not opposed to adding the operator specific variants, but perhaps as a next iteration/separate paper. I'm leaning towards restricting to 1D to start simpler and not including segmented, again to keep things simpler. — Reply to this email directly, view it on GitHub (), or unsubscribe (https://github.com/notifications/unsubscribe-auth/ABCNQO3BFQP7HSW4RBQFWO3WRYJWZANCNFSM6AAAAAATXJ7GAQ (https://github.com/j3-fortran/fortran_proposals/pull/290#issuecomment-1378156429)). You are receiving this because you were mentioned.Message ID:

First, thank you all for your comments and suggestions. They have revealed some aspects that I had not initially considered. I would like to try and address some of the various comments.

The most common suggestion has amounted to something like, "Why not just do exactly what HPF did?"

Looking at what HPF did has been very valuable as a reference. I can certainly understand vendors wanting to be able to reuse existing work, but I want to still explore whether HPF necessarily did it perfectly, or whether we could potentially do better now. If the way we choose to do it now is slightly different than HPF, it doesn't mean that prior work is not reusable, just that it may require some modification.

HPF provided the combinatorial set of operations with {PRE,SUF}FIX specific procedures, with optional argument to determine inclusive or exclusive. With the recent sentiments expressed that all new features should have compelling use cases, there is much more burden to justifying each one individually. By providing a generic procedure that is applicable for all operations, only a few use cases are needed to justify it, including use cases not covered by the operation specific versions.

It has been suggested that the operation specific versions mean that the code can be type checked, but the compiler absolutely could type check the generic version. It is required the ARRAY and OPERATION have the same types, which the compiler can see at the call site and enforce. This is just like the generic REDUCE function. I will note that this does mean COUNT_{PRE,SUF}FIX is not possible with the generic version. A slight change to the description and arguments could re-enable it though. I.e.

PURE FUNCTION OPERATION(x, y) RESULT(res) TYPE(), INTENT(in) :: x TYPE(), INTENT(in) :: y TYPE() :: res END FUNCTION

and make IDENTITY a required argument.

HPF provided {PRE/SUF}FIX functions, with an optional argument to do inclusive vs exclusive. It would be reasonable to do the opposite arrangement, have {IN/EX}CLUSIVE functions with an optional argument to do forward vs backward iteration. Or even just single functions with optional arguments for both, or individual functions for each. Each option has certain advantages and disadvantages that should be considered.

HPF defined that the result of a {PRE/SUF}FIX function has the same shape as the array argument, regardless of the presence of a MASK argument, and that elements of the result for which no elements of the input contribute there is a "default" value with which that element is defined. This works for the HPF functions because all the specified operations either have a meaningful "default" value, or in the case of COPY, don't allow a mask or exclusive argument (i.e. no chance of zero elements contributing). This is not necessarily possible in the generic case. The generic REDUCE function overcomes this with an IDENTITY argument. My initial thought though was that the behavior would be more like SUM_PREFIX(ARRAY, MASK) == SUM_PREFIX(PACK(ARRAY, MASK)). I'm not sure I know enough about various use cases to decide which is more appropriate. I'm open to suggestions here. I'll just note that I believe most other intrinsics with a MASK argument do follow this pattern (or a similar pattern with loops over dimensions other than DIM). For example

res => MINLOC(ARRAY, MASK) res(s1, ..., sdim-1, :, sdim+1, ..., sn) == MINLOC(PACK(ARRAY(s1, ..., sdim-1, :, sdim+1, ..., sn), MASK(s1, ..., sdim-1, :, sdim+1, ..., sn)))

HPF treated the SEGMENT argument as adjacent elements with .EQV. corresponding elements in MASK being members of the same segment. I.e. SUM_PREFIX([1, 2, 3, 4, 5], MASK=[.true., .true., .false., .true., .true.]) == [1, 3, 3, 4, 9]. However, many implementations in other languages treat .true. values in the SEGMENT argument as signifying the first element in a SEGMENT. I.e. SUM_PREFIX([1, 2, 3, 4, 5], MASK=[.true., .false., .true., .true., .false.]) == [1, 3, 3, 4, 9]. Is it better to be consistent with HPF or with other languages?

HPF did not provide collective subroutine versions for any of these operations. Depending on the chose design, what operations should we provide collective subroutines for?

For all of these various dimensions for the possible design of this feature I of course have opinions, but I'm open to any considerations I may not have thought of. Overall my "values" for the design would be, in order:

Is easy for users to understand and use correctly
Allows for efficient implementations by vendors
Doesn't add too much to the standard

I'm also open to the idea of starting with a restricted subset of the above discussed functionality such that we can avoid having to settle on certain aspects of the design initially.

Looking forward to hearing more ideas.

As for the user-supplied function, if you really think you need it I'd suggest it be ELEMENTAL rather than PURE. ELEMENTAL is, of course, a much more restricted version of pure. That would considerably simplify the documentation of the argument.

That's an interesting idea (and tempting) as the constraints for elemental functions are nearly the exact constraints needed. However, I'm not sure it quite works because:

A dummy procedure or procedure pointer shall not be specified to be ELEMENTAL.

An elemental subprogram is a pure subprogram unless it has the prefix-spec IMPURE.

It's also not really called in an elemental way to perform the scan, so is somewhat disingenuous to say it needs to be possible.

I wrote a library with a subset of the HPF prefix/suffix (and also the grade/sort) about 20 years ago for personal experimentation. Also wrote some of the variants described in Blellochs book. I later had access to the PGI pghpf compiler. Also the NAG compiler had HPF library routines in it. I had some unit test code which I ran through all three. Found discrepancies between all of them... Unfortunately both PGI and NAG later removed support, so I never got to the bottom of who was right and who was wrong.

I spent some time this past weekend rewriting a new version from scratch. (For one thing, I wanted to use fypp to make things generic.)

To respond to a couple of your questions:

There are big differences between, say, SUM_PREFIX(ARRAY, MASK) and SUM_PREFIX(PACK(ARRAY, MASK)). The former is more like:

SUM_PREFIX (MERGE (ARRAY, 0, MASK))

For one thing, this maintains the shape of the output of the function to be the same as the input array.

Using a separate MASK argument can be much more efficient for machines with array elements distributed amongst a large number of distributed memories than by compressing and decompressing elements using PACK and UNPACK. And saves a temporary array compared to using MERGE.

For segmented scans, the subject of how to properly represent segments is interesting.

As you point out, HPF 2.0 (section 7.4.5, pdf page 105) uses the flip between .true. and .false. as the indicator of when to start a new segment. The Rationale discusses their reasoning for this.

In Blellochs book, he uses a .true. value to indicate where to start a new segment. He calls them 'head flags'. (See section 4.3, page 68) Implementing flips vs head flags isn't a big deal either way. The main thing is there is one logical associated with each array element.

But he also recognizes and discusses other ways of indicating segments. One such technique, reminiscent of how some libraries support sparse matrices, is to provide a second array of integers with the length of each segment. An advantage of this latter technique is that it allows empty segments. His example has four segments, the second of which is empty:

A = [s00, s01, s02, s20, s21, s30, s31, s32, s33]

A lengths vector for the above would be:

L = [3, 0, 2, 4]

You can't represent this with either the HPF flip method, nor the head flag method. Nonetheless, like MASK, on a machine with highly distributed data on highly distributed memories, using a logical per data array element may be the better parallel technique in practice.

Perhaps contacting Guy Blelloch, and/or Guy Steele (who was on the HPF committee and also a former colleague of Blellochs), might be justified to see if they have updated their thinking on this.

Walter

-----Original Message----- From: j3-fortran/fortran_proposals @.> Sent: Jan 13, 2023 2:26 PM To: j3-fortran/fortran_proposals @.> Cc: Walter Spector @.>, Mention @.> Subject: Re: [j3-fortran/fortran_proposals] New Scan Intrinsics (PR #290)

First, thank you all for your comments and suggestions. They have revealed some aspects that I had not initially considered. I would like to try and address some of the various comments. The most common suggestion has amounted to something like, "Why not just do exactly what HPF did?" Looking at what HPF did has been very valuable as a reference. I can certainly understand vendors wanting to be able to reuse existing work, but I want to still explore whether HPF necessarily did it perfectly, or whether we could potentially do better now. If the way we choose to do it now is slightly different than HPF, it doesn't mean that prior work is not reusable, just that it may require some modification. HPF provided the combinatorial set of operations with {PRE,SUF}FIX specific procedures, with optional argument to determine inclusive or exclusive. With the recent sentiments expressed that all new features should have compelling use cases, there is much more burden to justifying each one individually. By providing a generic procedure that is applicable for all operations, only a few use cases are needed to justify it, including use cases not covered by the operation specific versions. It has been suggested that the operation specific versions mean that the code can be type checked, but the compiler absolutely could type check the generic version. It is required the ARRAY and OPERATION have the same types, which the compiler can see at the call site and enforce. This is just like the generic REDUCE function. I will note that this does mean COUNT_{PRE,SUF}FIX is not possible with the generic version. A slight change to the description and arguments could re-enable it though. I.e. PURE FUNCTION OPERATION(x, y) RESULT(res) TYPE(), INTENT(in) :: x TYPE(), INTENT(in) :: y TYPE() :: res END FUNCTION and make IDENTITY a required argument. HPF provided {PRE/SUF}FIX functions, with an optional argument to do inclusive vs exclusive. It would be reasonable to do the opposite arrangement, have {IN/EX}CLUSIVE functions with an optional argument to do forward vs backward iteration. Or even just single functions with optional arguments for both, or individual functions for each. Each option has certain advantages and disadvantages that should be considered. HPF defined that the result of a {PRE/SUF}FIX function has the same shape as the array argument, regardless of the presence of a MASK argument, and that elements of the result for which no elements of the input contribute there is a "default" value with which that element is defined. This works for the HPF functions because all the specified operations either have a meaningful "default" value, or in the case of COPY, don't allow a mask or exclusive argument (i.e. no chance of zero elements contributing). This is not necessarily possible in the generic case. The generic REDUCE function overcomes this with an IDENTITY argument. My initial thought though was that the behavior would be more like SUM_PREFIX(ARRAY, MASK) == SUM_PREFIX(PACK(ARRAY, MASK)). I'm not sure I know enough about various use cases to decide which is more appropriate. I'm open to suggestions here. I'll just note that I believe most other intrinsics with a MASK argument do follow this pattern (or a similar pattern with loops over dimensions other than DIM). For example res => MINLOC(ARRAY, MASK) res(s1, ..., sdim-1, :, sdim+1, ..., sn) == MINLOC(PACK(ARRAY(s1, ..., sdim-1, :, sdim+1, ..., sn), MASK(s1, ..., sdim-1, :, sdim+1, ..., sn))) HPF treated the SEGMENT argument as adjacent elements with .EQV. corresponding elements in MASK being members of the same segment. I.e. SUM_PREFIX([1, 2, 3, 4, 5], MASK=[.true., .true., .false., .true., .true.]) == [1, 3, 3, 4, 9]. However, many implementations in other languages treat .true. values in the SEGMENT argument as signifying the first element in a SEGMENT. I.e. SUM_PREFIX([1, 2, 3, 4, 5], MASK=[.true., .false., .true., .true., .false.]) == [1, 3, 3, 4, 9]. Is it better to be consistent with HPF or with other languages? HPF did not provide collective subroutine versions for any of these operations. Depending on the chose design, what operations should we provide collective subroutines for? For all of these various dimensions for the possible design of this feature I of course have opinions, but I'm open to any considerations I may not have thought of. Overall my "values" for the design would be, in order:

Is easy for users to understand and use correctly
Allows for efficient implementations by vendors
Doesn't add too much to the standard

I'm also open to the idea of starting with a restricted subset of the above discussed functionality such that we can avoid having to settle on certain aspects of the design initially. Looking forward to hearing more ideas. — Reply to this email directly, view it on GitHub (https://github.com/j3-fortran/fortran_proposals/pull/290#issuecomment-1382461508), or unsubscribe (https://github.com/notifications/unsubscribe-auth/ABCNQO3PX36Q7XMOITGOU6TWSHJBNANCNFSM6AAAAAATXJ7GAQ). You are receiving this because you were mentioned.Message ID: @.***>

I can only comment on the name at this point. I don't like overloading the existing SCAN name as the purpose is entirely different, and it complicates how one categorizes the procedure. Yes, it can be distinguished by a compiler, but it will be very confusing for programmers as well as documentation. I see there are alternatives offered and would prefer any of them over just SCAN.

@sblionel , I totally understand. I was somewhat aware of those points and agree with them. I didn't have many ideas for a different name that seemed to fit well, and wanted to see how much and how strong the opposition was. If that's the biggest problem I'll be happy.

j3-fortran / fortran_proposals

New Scan Intrinsics #290