j3-fortran / fortran_proposals

Proposals for the Fortran Standard Committee

Unsigned integers #2

Open certik opened 4 years ago

certik commented 4 years ago

All integers in Fortran are signed. It is a common request to include unsigned integers, at the very least to help with interoperation with C APIs that use unsigned integers.

The best approach currently is to store the value in a signed integer of the same size and, where the unsigned value is needed, convert it to a Fortran integer of a larger kind that can hold the full unsigned range.
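The arithmetic behind that workaround can be sketched in C, assuming a 32-bit unsigned value has arrived in a signed 32-bit variable. The helper name `u32_value_from_i32` is hypothetical and only illustrates the widen-and-mask step.

```c
#include <stdint.h>

/* Hypothetical helper: recover the unsigned value of a 32-bit pattern that
   is held in a signed 32-bit variable, by widening to 64 bits and masking
   off the sign extension. */
int64_t u32_value_from_i32(int32_t bits)
{
    return (int64_t)bits & INT64_C(0xFFFFFFFF);
}
```

In Fortran the same step would presumably be written with `iand` on an `int64` copy of the value, which is the extra ceremony the request is trying to remove.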

nncarlson commented 1 year ago

can you show some use cases (say in C or C++) for the unsigned integer with arithmetic?

Hashing -- e.g., md5 or sha, does arithmetic with uint32_t.

tkoenig1 commented 1 year ago

Here's a routine which multiplies two unsigned integers which are stored in arrays of uint64_t in C:

#include <stdint.h>

void Long_multiplication( uint64_t multiplicand[],
                          uint64_t multiplier[],
                          uint64_t sum[],
                          int64_t ilength, int64_t jlength )
{
  uint64_t acarry, mcarry, product;

  for( int64_t i = 0;
       i < (ilength + jlength);
       i++ )
    sum[i] = 0;

  for( int64_t j = 0; j < jlength; j++ )
    {
      mcarry = 0;
      acarry = 0;
      for( int64_t i = 0; i < ilength; i++ )
        {
          __uint128_t mcarry_prod;
          __uint128_t acarry_sum;
          mcarry_prod = ((__uint128_t) multiplicand[i]) * ((__uint128_t) multiplier[j])
            + (__uint128_t) mcarry;
          mcarry = mcarry_prod >> 64;
          product = mcarry_prod;
          acarry_sum = ((__uint128_t) sum[i+j]) + ((__uint128_t) acarry) + product;
          sum[i+j] = acarry_sum;      /* store the low 64 bits */
          acarry = acarry_sum >> 64;
        }
      /* Both carries belong in the next word, which is still zero here. */
      sum[j+ilength] = mcarry + acarry;
    }
}
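
For compilers without `__uint128_t` (a GCC/Clang extension), the same schoolbook scheme can be sketched with 32-bit words and a 64-bit intermediate. The function name `long_mul32` and the least-significant-word-first layout are assumptions made for this illustration, not part of the proposal.

```c
#include <stddef.h>
#include <stdint.h>

/* Schoolbook multiplication, least significant word first.
   out must have room for na + nb words. */
void long_mul32(const uint32_t a[], size_t na,
                const uint32_t b[], size_t nb,
                uint32_t out[])
{
    for (size_t k = 0; k < na + nb; k++)
        out[k] = 0;

    for (size_t j = 0; j < nb; j++) {
        uint64_t carry = 0;
        for (size_t i = 0; i < na; i++) {
            /* Cannot overflow: (2^32-1)^2 + 2*(2^32-1) = 2^64 - 1. */
            uint64_t t = (uint64_t)a[i] * b[j] + out[i + j] + carry;
            out[i + j] = (uint32_t)t;   /* low word, reduced mod 2^32 */
            carry = t >> 32;            /* high word */
        }
        out[j + na] = (uint32_t)carry;  /* not yet written by earlier rows */
    }
}
```

Every step relies on unsigned arithmetic being defined modulo a power of two, which is the property the thread is arguing about.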
certik commented 1 year ago

@nncarlson for sha it seems it is just bit manipulation: https://github.com/brainhub/SHA3IUF/blob/fc8504750a5c2174a1874094dd05e6a0d8797753/sha3.c#L76, not +,- operations.

tkoenig1 commented 1 year ago

... and here is an approximate translation to Fortran with the proposal:

  subroutine long_multiplication (multiplicand, multiplier, rsum)
    use iso_fortran_env
    unsigned(uint64), dimension(0:), intent(in) :: multiplicand, multiplier
    unsigned(uint64), dimension(0:), intent(out) :: rsum
    unsigned(uint64) :: acarry, mcarry, prod
    unsigned(uint128) :: mcarry_prod, acarry_sum
    integer(int64) :: i,j
    rsum = 0
    do j=0, size(multiplier)-1   ! Arrays are zero-based
       mcarry = 0
       acarry = 0
       do i=0, size(multiplicand)-1
          mcarry_prod = unsigned(multiplicand(i),kind=uint128) * unsigned(multiplier(j),kind=uint128) &
               + mcarry
          mcarry = mcarry_prod / 2_uint128**64  ! Or shift right by 64
          prod = mcarry_prod ! Assignment mod 2**64
          acarry_sum = unsigned(rsum(i+j),kind=uint128) + unsigned(acarry,kind=uint128) + prod
          rsum(i+j) = acarry_sum ! Assignment mod 2**64
          acarry = acarry_sum / 2_uint128**64
       end do
       rsum(j+size(multiplicand)) = mcarry + acarry ! Both carries go into the next word
    end do
  end subroutine long_multiplication
FortranFan commented 1 year ago

@tkoenig1 wrote Jan 20, 2023 2:46 PM EST:

and here is an approximate translation to Fortran with the proposal:

  ..
-    integer(uint64) :: i,j
+    integer(int64) :: i, j

right?

nncarlson commented 1 year ago

@nncarlson for sha it seems it is just bit manipulation: https://github.com/brainhub/SHA3IUF/blob/fc8504750a5c2174a1874094dd05e6a0d8797753/sha3.c#L76, not +,- operations.

I'm looking at the C code from the standard linux libraries for md5/sha and there is a ton of bit manipulations and shifts, but also some additions.

tkoenig1 commented 1 year ago

@tkoenig1 wrote Jan 20, 2023 2:46 PM EST:

and here is an approximate translation to Fortran with the proposal:

  ..
-    integer(uint64) :: i,j
+    integer(int64) :: i, j

right?

Yes, that was due to a search/replace error (now corrected).

tkoenig1 commented 1 year ago

@nncarlson for sha it seems it is just bit manipulation: https://github.com/brainhub/SHA3IUF/blob/fc8504750a5c2174a1874094dd05e6a0d8797753/sha3.c#L76, not +,- operations.

I'm looking at the C code from the standard linux libraries for md5/sha and there is a ton of bit manipulations and shifts, but also some additions.

A look at RFC 1321, the spec for MD5, shows

   Let the symbol "+" denote addition of words (i.e., modulo-2^32
   addition).

and later

     /* Round 1. */
     /* Let [abcd k s i] denote the operation
          a = b + ((a + F(b,c,d) + X[k] + T[i]) <<< s). */
     /* Do the following 16 operations. */
     [ABCD  0  7  1]  [DABC  1 12  2]  [CDAB  2 17  3]  [BCDA  3 22  4]
     [ABCD  4  7  5]  [DABC  5 12  6]  [CDAB  6 17  7]  [BCDA  7 22  8]
     [ABCD  8  7  9]  [DABC  9 12 10]  [CDAB 10 17 11]  [BCDA 11 22 12]
     [ABCD 12  7 13]  [DABC 13 12 14]  [CDAB 14 17 15]  [BCDA 15 22 16]

And elliptic curves also use long integers. So, we can conclude that cryptography in general does require unsigned arithmetic operations (although there may be some ciphers which do not use them).
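
As an illustration of how the quoted round operation leans on modulo-2^32 addition, here is a minimal C sketch of a single round-1 step. The auxiliary function F(X,Y,Z) = XY v not(X) Z is taken from RFC 1321; the function names `rotl32`, `md5_f` and `md5_round1_step` are chosen for this example only.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit word left by s bits, 0 < s < 32. */
static uint32_t rotl32(uint32_t x, unsigned s)
{
    assert(s > 0 && s < 32);
    return (x << s) | (x >> (32 - s));
}

/* Auxiliary function F of RFC 1321: F(X,Y,Z) = XY v not(X) Z. */
static uint32_t md5_f(uint32_t x, uint32_t y, uint32_t z)
{
    return (x & y) | (~x & z);
}

/* One round-1 operation: a = b + ((a + F(b,c,d) + X[k] + T[i]) <<< s).
   Every "+" is uint32_t addition, which C defines to wrap modulo 2^32. */
uint32_t md5_round1_step(uint32_t a, uint32_t b, uint32_t c, uint32_t d,
                         uint32_t xk, uint32_t ti, unsigned s)
{
    return b + rotl32(a + md5_f(b, c, d) + xk + ti, s);
}
```

With signed 32-bit integers the four additions inside the rotation could overflow, so a Fortran version today has to mask or widen at each step.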

certik commented 1 year ago

Thanks @tkoenig1, indeed it looks like the unsigned integers are a perfect fit for cryptography. Then again, how many lines will be using this unsigned +? We can have some intrinsic function to do this, we can also consider an operator like .uadd.. Here is similar code that @nncarlson wrote for md5, already in Fortran: https://github.com/nncarlson/petaca/blob/70bae356c8a5d66980c6513f63d005540d92a5e4/src/secure_hash/md5_hash_type.F90#L201, which could use this unsigned integer. I think it would help. But it seems it would look roughly the same even if we do not overload +.

To move forward: @tkoenig1 why don't you focus in the proposal on the other stuff that I think we have rough agreement on, and for now keep the overloading of + and < as an "optional" addition, and we'll decide later if we should do it and how.

tkoenig1 commented 1 year ago

Sorry, I think using the normal operators for unsigned arithmetic is a must for "FORmula TRANslation". Otherwise, people who are used to it in other programming languages will just be confused.

As for comparisons, I have already dropped it - it is different from what people are used to in other languages.

certik commented 1 year ago

We definitely want things to look like math ("FORmula TRANslation"). At the same time, we want to ensure it is clear what is happening, to avoid bugs/pitfalls. So it's a fine line to walk. We'll figure it out. We have already made quite nice progress.

tkoenig1 commented 1 year ago

proposal-v3.md Here's a version which makes reference to BITS.

gronki commented 1 year ago

Again, I support tkoenig here. I do not think needlessly complicating something which is pretty obvious in competing languages (C/C++) would make a lot of sense. Another argument is "how many lines would use this new arithmetic and is it worth introducing" -- I see this as a bit of a chicken-and-egg question. How do we know that making Fortran slightly more friendly to problems requiring unsigned integers would not increase its use in fields other than strictly numerical computing? Right now, even with the best will, it is just too much of a burden, and it is easier to write such tasks in C and use the binding interface -- but if the only way to compensate for Fortran's shortcomings is to use C, how do you get new people interested in starting projects in Fortran?

Dominik

On Sat, 21 Jan 2023 at 11:40, tkoenig1 @.***> wrote:

proposal-v3.md https://github.com/j3-fortran/fortran_proposals/files/10471988/proposal-v3.md Here's a version which makes reference to BITS.


certik commented 1 year ago

@tkoenig1 how should overflow be defined for unsigned integers? In C it is defined to wrap around. In Rust, however, overflow is checked at runtime in Debug mode. What should we do for Fortran?

If we specify it as undefined (thus allowing compilers to check this in Debug mode), then it might avoid all the common pitfalls, even in comparisons, mixed modes, etc. However, it will prevent their usage for cases where you require a wraparound. But perhaps for those cases we can have some special intrinsics, like "add_wraparound(255, 1)".
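
A C sketch of what such an `add_wraparound` helper could look like next to a checked addition, here for 8-bit values. Both function names and signatures are hypothetical and are not taken from any proposal text.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical explicit wraparound addition: result is (a + b) mod 256. */
uint8_t add_wraparound_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)(a + b);   /* the sum is computed in int, then reduced */
}

/* Hypothetical checked addition: stores the sum and returns true if it fits
   in 0..255; returns false and leaves *sum untouched otherwise. */
bool add_checked_u8(uint8_t a, uint8_t b, uint8_t *sum)
{
    if (b > UINT8_MAX - a)
        return false;
    *sum = (uint8_t)(a + b);
    return true;
}
```

Under this design the plain `+` would behave like the checked variant in Debug mode, and code that needs modular arithmetic would call the wraparound helper explicitly.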

Here is how to test this in Rust:

fn main() {
    let x: u8 = "255".parse().unwrap();
    let val: u8 = x + 1;
    println!("{}", val);
    println!("Hello, world!");
}

This trick forces Rust to evaluate the 255 at runtime (otherwise it would not even compile this).

In Release mode:

$ cargo run -r
   Compiling xx v0.1.0 (/private/tmp/xx)
    Finished release [optimized] target(s) in 0.11s
     Running `target/release/xx`
0
Hello, world!

In Debug mode:

$ cargo run     
    Finished dev [unoptimized + debuginfo] target(s) in 0.00s
     Running `target/debug/xx`
thread 'main' panicked at 'attempt to add with overflow', src/main.rs:3:19
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
tkoenig1 commented 1 year ago

how should overflow be defined for unsigned integers? In C it is defined to wrap around. In Rust, however, overflow is checked at runtime in Debug mode. What should we do for Fortran?

The use cases include hashes and cryptography. In order to be useful for those use cases, the mod 2^n overflow behavior of C is needed, and this is what is done naturally on CPUs.

Requiring runtime checks by default is not something that Fortran does; the standard has been carefully crafted so that all constraints can be checked at compile-time. So, most compilers will then not perform the check by default.

The user will then see

That behavior is, I believe, confusing and will encourage people to leave off checks.

Compare the current behavior of integer overflow, which almost nobody checks and which can lead to real bugs (see the "porting to" announcement of the upcoming gcc13 release).

8bitmachine commented 1 year ago

Most if not all machine codes provide an overflow bit in a status register. My request is that Fortran is defined with a run-time check which amounts to a separate integer which might be called "overflow" that can be checked by the programmer, in much the same way that reading files can have an error check on end-of-file, for example. The arithmetic operation on the unsigned integer will be effectively wrap-around and having the ability to detect the overflow will provide maximum flexibility for the programmer.

certik commented 1 year ago

Requiring runtime checks by default is not something that Fortran does; the standard has been carefully crafted so that all constraints can be checked at compile-time.

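I don't think that's the case, consider for example these intrinsically runtime checks that Fortran compilers have to do in Debug mode:

  • array bounds checking
  • signed integers overflow
  • floating point NaN trip
  • dangling pointers
  • using unallocated arrays
  • ...

There is probably more. So unsigned integers can be another addition to this long list of Debug time runtime checks.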

Regarding:

That behavior is, I believe, confusing and will encourage people to leave off checks.

We can just look at Rust, which does exactly what I propose and there is no problem (it seems).

tkoenig1 commented 1 year ago

Most if not all machine codes provide an overflow bit in a status register.

There are quite a few exceptions, among them MIPS, Alpha and RISC-V.

My request is that Fortran is defined with a run-time check which amounts to a separate integer which might be called "overflow" that can be checked by the programmer.

For unsigned integers, this is actually rather straightforward.

For addition:

  c = a + b
  if (c < a) then ! overflow occurred

For multiplication, all of the processors relevant today include a way to get the high word of a multiplication, and compilers exploit that. From Fortran, this would look like

  uinteger(kind=uint4) :: a, b
  uinteger(kind=uint8) :: c

  c = uint(a,uint8) * uint(b,uint8)
  if (c > huge(a)) then ! or a similar check if overflow occurred

This is highly efficient on modern architectures. For example, the C code

unsigned int
overfl (unsigned int a, unsigned int b, _Bool *overfl)
{
  unsigned long int c = (unsigned long int)a * (unsigned long int)b;
  *overfl = (c >> 32) != 0;
  return c;
}

is translated to (identical code with modern gcc and clang)

        movq    %rdx, %rcx
        movl    %edi, %eax
        mull    %esi
        seto    (%rcx)
        retq
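
The C function above assumes that `unsigned long` is 64 bits wide, which holds on LP64 systems such as Linux on AMD64 but not on 64-bit Windows. A sketch of the same widening check with fixed-width types follows; the name `mul_u32_overflow` is made up for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Multiply two 32-bit unsigned values, report whether the exact product
   needs more than 32 bits, and return the product reduced mod 2^32. */
uint32_t mul_u32_overflow(uint32_t a, uint32_t b, bool *overflow)
{
    uint64_t wide = (uint64_t)a * (uint64_t)b;   /* exact 64-bit product */
    *overflow = (wide >> 32) != 0;               /* any high bits set? */
    return (uint32_t)wide;                       /* low 32 bits */
}
```

GCC and Clang also provide `__builtin_mul_overflow`, which expresses the same check without the explicit widening.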

In much the same way that reading files can have an error check on end-of-file for example. The arithmetic operation on the unsigned integer will be effectively wrap-around and having the ability to detect the overflow will provide maximum flexibility for the programmer.

It's already there. This is one of the nice things about the defined semantics of wraparound.

tkoenig1 commented 1 year ago

Requiring runtime checks by default is not something that Fortran does; the standard has been carefully crafted so that all constraints can be checked at compile-time.

I don't think that's the case, consider for example these intrinsically runtime checks that Fortran compilers have to do in Debug mode:

  • array bounds checking
  • signed integers overflow
  • floating point NaN trip
  • dangling pointers
  • using unallocated arrays
  • ...

There is probably more. So unsigned integers can be another addition to this long list of Debug time runtime checks.

All of the checks on your list above have a valid reason.

Introducing a prohibition on mod 2^n arithmetic for the sole purpose of introducing such a prohibition serves no useful purpose. People will just continue using C.

Regarding:

That behavior is, I believe, confusing and will encourage people to leave off checks.

We can just look at Rust, which does exactly what I propose and there is no problem (it seems).

Fortran is not Rust, and we should not try to turn it into a clone of Rust.

certik commented 1 year ago

Fortran is not Rust, and we should not try to turn it into a clone of Rust.

I agree. We should also not turn Fortran into a clone of C. :)

Fortran is Fortran, so our job is to figure out what is best for Fortran.

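There are 4 distinct concepts here:

  • SignedInteger (does not wrap around, gives a Debug time overflow error; this is current behavior of C and Fortran signed integers, where the overflow is not defined, and consequently compilers can and do check overflow in Debug mode)

  • UnsignedInteger (does not wrap around, gives a Debug time error for overflow, like in Rust; similar to signed integer, but "shifted", so instead of -128..127 for i8, it is 0..255 for u8)

  • ModularInteger (behaves like unsigned in C, integer modulo 2^n for n=8, 16, 32, 64, wraps around)

  • BitVector (allows to do bit manipulation; typically length 8, 16, 32, 64 bits)

And various conversion/casting between these.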

@tkoenig1 it seems you want ModularInteger. You also mentioned BitVector (the "BITS" proposal). We already have SignedIntegers.

There are use cases for both ModularInteger and UnsignedInteger. ModularInteger has use cases in hashes and cryptography, as well as random number generation; in my career I have needed ModularInteger only a few times, so for me it is very rare. The UnsignedInteger use case is actually quite common: typically I needed it when I required only non-negative numbers and the full range, say image processing with color channels 0-255. There are many other cases like this, as well as interfacing with C, where I want 0-255 to map naturally. I do NOT want to wrap around for image processing; I want that to be an error. Finally, BitVector is needed every time you do any kind of bit manipulation, like "and", "or", "xor", "shift", etc.

We do not need to introduce all these concepts into the language, it might be an overkill; we can only do a subset of these, "merge" them in some ways, or handle them with intrinsic functions. But I do think these are four completely different concepts, and so at least for our discussion it helps to distinguish those.

With this in mind, let's analyze this issue:

Introducing a prohibition on mod 2^n arithmetic for the sole purpose of introducing such a prohibition serves no useful purpose. People will just continue using C.

When you say "mod 2^n" arithmetic, then you are talking about ModularInteger. Indeed, ModularInteger wraps around, by the definition. So I agree, we should not introduce a prohibition on a wraparound. We definitely want wrap around for ModularInteger.

However, I was talking about UnsignedInteger with:

There is probably more. So unsigned integers can be another addition to this long list of Debug time runtime checks.

And for UnsignedInteger you do not wrap around, also by definition.

We can also talk about what the hardware does. The unsigned integer addition (and multiplication) indeed wraps around, so it corresponds to the ModularInteger abstraction, which is a faithful, solid and useful mathematical representation. The signed integer in modern hardware actually also wraps around, but even in C this is undefined behavior (I believe this might be due to historical reasons, as in the past there were several hardware representations of signed integers, but things have all converged to two's complement, which is probably the best way to handle signed integers). Mathematically this is not useful, as far as I know, so I have not introduced SignedIntegerWithWraparound as a concept above. If there is a use case for this, then we can introduce it too.

Unlike C, Fortran is not meant as "high level assembly language", and it has never followed the hardware exactly. Fortran has mathematical concepts (like signed integers) and of course it is designed in a way to generate high performing code. So we do not want to prevent that. But at the same time, just because C (or Rust) does something does not mean Fortran has to do the same. We should design this feature based on the use cases.

Which case is more common: ModularInteger or UnsignedInteger?

As I said, I have no doubts it is UnsignedInteger by far. But if you disagree, then let's discuss it.

Once we agree which use case is more common, then we can perhaps add that type, and then add intrinsics to allow operation on the other type.

tkoenig1 commented 1 year ago

Fortran is not Rust, and we should not try to turn it into a clone of Rust.

I agree. We should also not turn Fortran into a clone of C. :)

Fortran is Fortran, so our job is to figure out what is best for Fortran.

There are 4 distinct concepts here:

  • SignedInteger (does not wrap around, gives a Debug time overflow error; this is current behavior of C and Fortran signed integers, where the overflow is not defined, and consequently compilers can and do check overflow in Debug mode)

  • UnsignedInteger (does not wrap around, gives a Debug time error for overflow, like in Rust; similar to signed integer, but "shifted", so instead of -128..127 for i8, it is 0..255 for u8)

  • ModularInteger (behaves like unsigned in C, integer modulo 2^n for n=8, 16, 32, 64, wraps around)

  • BitVector (allows to do bit manipulation; typically length 8, 16, 32, 64 bits)

And various conversion/casting between these.

OK so far.

@tkoenig1 it seems you want ModularInteger. You also mentioned BitVector (the "BITS" proposal). We already have SignedIntegers.

I want ModularInteger, correct.

There are use cases for both ModularInteger and UnsignedInteger. ModularInteger has use cases in hashes and cryptography, as well as random number generation; in my career I have needed ModularInteger only a few times, so for me it is very rare. The UnsignedInteger use case is actually quite common: typically I needed it when I required only non-negative numbers and the full range, say image processing with color channels 0-255. There are many other cases like this, as well as interfacing with C, where I want 0-255 to map naturally. I do NOT want to wrap around for image processing; I want that to be an error. Finally, BitVector is needed every time you do any kind of bit manipulation, like "and", "or", "xor", "shift", etc.

I argue that an unsigned type with mod 2^n arithmetic (your ModularInteger) does a better job.

It is useful for a wider variety of tasks (hashes and cryptography), what happens on overflow is defined and (at least for addition) checks are quite easy.

We do not need to introduce all these concepts into the language, it might be an overkill; we can only do a subset of these, "merge" them in some ways, or handle them with intrinsic functions. But I do think these are four completely different concepts, and so at least for our discussion it helps to distinguish those.

I think that ModularInteger can do the job of all four sufficiently well.

With this in mind, let's analyze this issue:

Introducing a prohibition on mod 2^n arithmetic for the sole purpose of introducing such a prohibition serves no useful purpose. People will just continue using C.

When you say "mod 2^n" arithmetic, then you are talking about ModularInteger. Indeed, ModularInteger wraps around, by the definition. So I agree, we should not introduce a prohibition on a wraparound. We definitely want wrap around for ModularInteger.

However, I was talking about UnsignedInteger with:

There is probably more. So unsigned integers can be another addition to this long list of Debug time runtime checks.

And for UnsignedInteger you do not wrap around, also by definition.

There is no need for UnsignedInteger if you have ModularInteger.

We can also talk about what the hardware does. The unsigned integer addition (and multiplication) indeed wraps around, so it corresponds to the ModularInteger abstraction, which is a faithful, solid and useful mathematical representation. The signed integer in modern hardware actually also wraps around, but even in C this is undefined behavior (I believe this might be due to historical reasons, as in the past there were several hardware representations of signed integers, but things have all converged to two's complement, which is probably the best way to handle signed integers). Mathematically this is not useful, as far as I know, so I have not introduced SignedIntegerWithWraparound as a concept above. If there is a use case for this, then we can introduce it too.

I don't think there is a use case for that :-)

Unlike C, Fortran is not meant as "high level assembly language", and it has never followed the hardware exactly. Fortran has mathematical concepts (like signed integers) and of course it is designed in a way to generate high performing code. So we do not want to prevent that. But at the same time, just because C (or Rust) does something does not mean Fortran has to do the same. We should design this feature based on the use cases.

Which case is more common: ModularInteger or UnsignedInteger?

ModularInteger can do the jobs that UnsignedInteger can do, but not vice versa.

Regarding use cases: Fortran is not C, and should not try to be it. But Fortran is by far the more powerful language, and (excluding the preprocessor) C unsigned integers are about the only feature where Fortran does not offer equal or better features, compared to C.

certik commented 1 year ago

@tkoenig1 thanks, I think we made good progress on understanding each other. Almost everything is clear to me now, except one detail:

It is useful for a wider variety of tasks (hashes and cryptography), what happens on overflow is defined and (at least for addition) checks are quite easy.

When you say "checks are quite easy", are you talking about optional overflow checks for ModularInteger? That is, by default it wraps around, but with some optional compiler option, one could prevent it from wrapping around and instead get a runtime error?

tkoenig1 commented 1 year ago

@certik ,

When you say "checks are quite easy", are you talking about optional overflow checks for ModularInteger? That is, by default it wraps around, but with some optional compiler option, one could prevent it from wrapping around and instead get a runtime error?

TL;DR:

ModularInteger makes user checks easier.

Long version:

By this I mean that checks are easy for the user to insert, because wraparound is defined.

To elaborate a bit on my comment above:

If overflow on unsigned addition were prohibited with a "shall" directive, like signed overflow is now, the code for checking on overflow for addition would be difficult.

With defined wraparound semantics, the solution is straightforward. If you'll excuse my C, the function

unsigned long
add_ul_overflow (unsigned long a, unsigned long b, _Bool *flag)
{
  unsigned long c = a + b;
  *flag = c < a;
  return c;
}

performs an unsigned long addition and sets *flag if overflow occurs. This is actually translated into very efficient assembler code on AMD64:

        movq    %rdi, %rax
        addq    %rsi, %rax
        setb    (%rdx)
        retq

If overflow on unsigned addition were in violation of a "shall" directive, overflow would be illegal. This straightforward test would still work at low optimization levels and without checking. With checking enabled, the addition would fail. At high optimization levels, I would expect compiler writers to eventually realize that the condition c < a can never be true, and store a zero into *flag unconditionally.

Writing a standard-conforming test for an illegal operation like this is much more difficult (as is writing Fortran code to see if a+b will overflow for signed integers).
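
To show why the prohibited case is harder, here is a sketch of an overflow test for signed 32-bit addition in C, where signed overflow is undefined. The check has to be phrased so that it never performs the overflowing addition itself; the function name `add_i32_would_overflow` is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if a + b would leave the range of int32_t.
   The test itself never overflows: it only compares against the limits. */
bool add_i32_would_overflow(int32_t a, int32_t b)
{
    if (b > 0)
        return a > INT32_MAX - b;
    return a < INT32_MIN - b;   /* b <= 0, so INT32_MIN - b cannot overflow */
}
```

Compared with the single comparison `c < a` for the wraparound case, this needs a branch on the sign of one operand and knowledge of the type's limits.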

certik commented 1 year ago

Got it, thanks @tkoenig1, I think it's clear now what your proposal is. Yes, that is clean, I think ModularInteger should not be checked by the compiler and it should wrap around (that's the definition of it). The Fortran standard can even provide those functions like add_ul_overflow.

The competing proposal that I offer is to do UnsignedIntegers, which are checked by the compiler in Debug mode. We can offer functions that implement wraparound semantics for the modular integer use cases (cryptography, random numbers, etc.).

We started designing both ModularInteger (https://github.com/lcompilers/lpython/issues/1578) and UnsignedInteger (https://github.com/lcompilers/lpython/issues/1588) in LFortran. We have not decided if we need both, or just one internally in the compiler, but both can be done cleanly. At the surface level, what users in Fortran will see, we can thus support both designs relatively easily I think. I assume it would not be that difficult to support either one in GFortran as well.

So we should choose such a design that makes the most sense for Fortran. It seems that with either design you'll be able to do both wraparounds or errors. It's just which behavior will be the default (and the other behavior you'll get via function calls).

tkoenig1 commented 1 year ago

@certik , there is another point.

The standard does not mandate checks (apart from the constraints), and ModularInteger behaves the same as UnsignedInteger as long as there is no overflow. A processor is therefore free to implement UnsignedInteger as ModularInteger. It is also free to document that behavior as an extension to the standard.

Given the additional use cases that ModularInteger has over UnsignedInteger, and that this is actually the easiest way to implement UnsignedInteger, I think that is rather likely to occur.

klausler commented 1 year ago

The standard does not mandate checks (apart from the constraints), and ModularInteger behaves the same as UnsignedInteger as long as there is no overflow. A processor is therefore free to implement UnsignedInteger as ModularInteger. It is also free to document that behavior as an extension to the standard.

Given the additional use cases that ModularInteger has over UnsignedInteger, and that this is actually the easiest way to implement UnsignedInteger, I think that is rather likely to occur.

Behavior that is not specified is more likely to pose a risk to code portability between compilers. If you're proposing a feature for a standard, avoid implementation-defined behavior unless there are already existing incompatible implementations that can't be changed. New features should avoid implementation-defined behavior entirely.

certik commented 1 year ago

For ModularInteger I think there is no issue standardizing.

For UnsignedInteger, can the standard prescribe Debug runtime checks? The closest feature would be runtime array bounds checking, which all compilers effectively do. I only found this in the standard:

The value of a subscript in an array element shall be within the bounds for its dimension.

So for UnsignedInteger we could say something like "integer operations shall be within the range of the given unsigned integer".

The closest I was able to find is in "10.1.5.2.4 Evaluation of numeric intrinsic operations":

The execution of any numeric operation whose result is not defined by the arithmetic used by the processor is prohibited. Raising a negative real value to a real power is prohibited.

My understanding of this paragraph is that if you add two signed integers and it doesn't fit into the given type (="result is not defined by the arithmetic used by the processor"), then it is prohibited.

gronki commented 1 year ago

I also feel uncomfortable hearing about "compiler extensions". I'd personally rather have one behavior independent of the compiler and debug flags.

And once again thank you all for the effort to make Fortran better.

Dominik

On Fri, 31 Mar 2023, 15:30, Peter Klausler @.***> wrote:

The standard does not mandate checks (apart from the constraints), and ModularInteger behaves the same as UnsignedInteger as long as there is no overflow. A processor is therefore free to implement UnsignedInteger as ModularInteger. It is also free to document that behavior as an extension to the standard.

Given the additional use cases that ModularInteger has over UnsignedInteger, and that this is actually the easiest way to implement UnsignedInteger, I think that is rather likely to occur.

Behavior that is not specified is more likely to pose a risk to code portability between compilers. If you're proposing a feature for a standard, avoid implementation-defined behavior unless there are already existing incompatible implementations that can't be changed. New features should avoid implementation-defined behavior entirely.


certik commented 1 year ago

I also feel uncomfortable hearing about "compiler extensions". I'd personally rather have one behavior independent of the compiler and debug flags.

The behavior of any Fortran code depends on the compiler and debug flags; for example, any time you index into an array like A(i), you will get different behavior for out-of-bounds access based on your compiler and flags. We could "fix" that by making array indexing wrap around ("ModularArrayIndex"); then A(i) would always be defined and "have one behavior independent of the compiler and debug flags". I think that would be a bad idea; it would hide so many bugs.
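
A small C sketch of how a wrapping index would hide a bug. The accessor `get_wrapped` stands in for the hypothetical "ModularArrayIndex" semantics, and `buggy_sum` is a deliberately wrong loop; both names are invented for this example.

```c
#include <stddef.h>

/* Hypothetical "ModularArrayIndex": any index is reduced modulo n (n > 0). */
int get_wrapped(const int a[], size_t n, size_t i)
{
    return a[i % n];
}

/* Off-by-one bug: the loop runs to n inclusive. With wrapping indices the
   extra access silently reads a[0] again instead of being reported. */
int buggy_sum(const int a[], size_t n)
{
    int total = 0;
    for (size_t i = 0; i <= n; i++)
        total += get_wrapped(a, n, i);
    return total;
}
```

For the array {10, 20, 30} the function returns 70 rather than 60, and no bounds check can flag it because every access is, by definition, in range.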

So I want to be very clear about this point. If Fortran didn't have arrays and we were figuring out how to introduce array indexing into the language, we should not use this argument to require the index to wrap around. In the same way we should not use this argument to rule out the UnsignedInteger proposal (as defined above).

@klausler has a good point about avoiding "implementation-defined behavior". However I feel this requirement might rule out many good features, including array indexing. I know that array indexing is already in the language, but I feel our "standardization process" for new features should not rule it out. Also, as I noted above, the signed integers in Fortran seem to have precisely this "implementation-defined behavior" for out of range operations.

gronki commented 1 year ago

I think it is hard to compare array indexing to integer overflow. In almost all existing systems, an array overrun is a serious threat and it makes sense to treat it as such. I feel like integer overflow is much less of a threat, although I agree that whether it is a threat or not depends on the context. I would still prefer to minimize the compiler-dependent behavior and make sure that people are not able to utilize compiler-dependent quirks. 🙂

Speaking of which, will you be able to index arrays using unsigned integers? My guess is no and I'm fine with it, but I'm curious on your position.

Either way, getting unsigned integers would make me super happy.

Dominik

klausler commented 1 year ago

If your feature is not already available in real compilers, you're designing it, not standardizing it. Design it completely so that code using the feature will be portable across conforming implementations.

certik commented 1 year ago

Thanks everybody. Probably the next step is to have prototypes for this and once we have it and gain some experience using it, we can meet perhaps over video to discuss more.

tkoenig1 commented 1 year ago

I think it is hard to compare array indexing to integer overflow. In almost all existing systems, an array overrun is a serious threat and it makes sense to treat it as such. I feel that integer overflow is much less of a threat, although I agree that whether it is a threat or not depends on the context. I would still prefer to minimize compiler-dependent behavior and make sure that people are not able to exploit compiler-dependent quirks.

Which would (IMHO) be an argument for wrap-around semantics, since these are fully defined.

Speaking of which, will you be able to index arrays using unsigned integers? My guess is no and I'm fine with it

Unsigned array subscripts would not work well together with array bounds, which can be negative, positive or zero. (They are more reasonable for C, where the lower bound is always zero). So, unsigned array subscripts would make little sense.

The nice thing for the people editing the standard is that these rules do not need to be touched - a subscript is a scalar-int-expr, and as long as an unsigned-expr (or whatever it ends up being called) is not an int-expr, everything is fine.

tkoenig1 commented 1 year ago

Behavior that is not specified is more likely to pose a risk to code portability between compilers. If you're proposing a feature for a standard, avoid implementation-defined behavior unless there are already existing incompatible implementations that can't be changed. New features should avoid implementation-defined behavior entirely.

I concur.

Also, when adding a new feature, avoid adding something that makes it tempting to add processor-defined behavior in the first place.

gronki commented 1 year ago

Personally I am more for modulo version since this is what I'm more familiar with from C. What would be the impact of both variants on the C interoperability?

Dominik

tkoenig1 commented 1 year ago

Personally I am more for modulo version since this is what I'm more familiar with from C. What would be the impact of both variants on the C interoperability?

Hardly anything. It would then not be permitted to perform modulo 2^n arithmetic on the Fortran side, but things like passing an unsigned to C, performing modulo 2^n arithmetic on the C side and passing it back would be possible.
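As a sketch of the C side of such a round trip: unsigned arithmetic in C is defined to wrap modulo 2^n, so routines like the ones below behave the same on every conforming C compiler. The function names are hypothetical, and the Fortran interface that would call them is not shown. The second routine is the well-known 32-bit FNV-1a hash, chosen only as an example of arithmetic that relies on wrap-around.

```c
#include <stddef.h>
#include <stdint.h>

/* Addition modulo 2^32: the result wraps instead of overflowing. */
uint32_t wrap_add_u32(uint32_t a, uint32_t b)
{
    return a + b;
}

/* 32-bit FNV-1a hash; the multiplication is meant to wrap modulo 2^32. */
uint32_t fnv1a_u32(const unsigned char *data, size_t len)
{
    uint32_t hash = 2166136261u;    /* FNV offset basis */
    for (size_t i = 0; i < len; i++) {
        hash ^= data[i];
        hash *= 16777619u;          /* FNV prime */
    }
    return hash;
}
```

A caller that only stores the returned bit pattern and hands it back to C never has to perform modulo arithmetic itself.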

FortranFan commented 1 year ago

Probably the next step is to have prototypes for this

@tkoenig1 , @certik ,

It will be really cool if you can find resources (time and/or people) to help you trial this in gfortran and LFortran respectively. Given your discussion in this thread, it doesn't appear all that difficult for the two of you to design something implementable in the respective compilers. Note that your expert experimental implementation of such integers (unsigned/modular) in actual compilers is about the only chance this can ever get into the language. Otherwise, you can take it as an absolute guarantee this will "die on the vine" with the J3/WG5 committees.

On the other hand, please know for sure there are many, many Fortran practitioners across the globe who would concur with @gronki's statement, "getting unsigned integers would make me super happy." However, the J3/WG5 committees do little to nothing to follow up. This has been requested many, many times going back several decades, including being 4th on the list of desired Fortran 2023 features, and yet the upcoming language standard does not support it.

Your efforts here are the only hope for Fortran.

tkoenig1 commented 1 year ago

@certik:

For UnsignedInteger, can the standard prescribe Debug runtime checks?

It would be the first prescribed run-time check in the standard, so the answer is (reasonably) no.

tkoenig1 commented 1 year ago

Here's an update for a proposal.

This is currently formatted for markdown, but is of course trivial to convert to text if required.

How to proceed further?

proposal-v4.md

certik commented 1 year ago

How to proceed further?

A compiler prototype and test that the proposal makes sense, that the rules are reasonable and that we avoid the common pitfalls known from C.
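For reference, here is a small sketch of two pitfalls of the kind a prototype could be tested against. Both come from C's implicit conversions; the function names are hypothetical and chosen only for illustration.

```c
#include <limits.h>

/* Mixed comparison: s is converted to unsigned before comparing,
   so a negative s becomes a very large value. */
int naive_less(int s, unsigned int u)
{
    return s < u;
}

/* Comparison on the mathematical values: handle negative s first. */
int safe_less(int s, unsigned int u)
{
    return s < 0 || (unsigned int)s < u;
}

/* Unsigned subtraction wraps: 2u - 3u yields UINT_MAX, not -1. */
unsigned int unsigned_difference(unsigned int a, unsigned int b)
{
    return a - b;
}
```

In C, `naive_less(-1, 1u)` returns 0 even though -1 is mathematically less than 1, while `safe_less(-1, 1u)` returns 1. Whether a Fortran design rejects such mixed expressions or defines them differently is exactly the kind of rule a prototype would exercise.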

tkoenig1 commented 1 year ago

How to proceed further?

A compiler prototype and test that the proposal makes sense, that the rules are reasonable and that we avoid the common pitfalls known from C.

I think having a clear specification of what to implement should come first. It is quite possible to discuss possible scenarios of what users could do wrong, and what to avoid (and what not to avoid) based on a spec. It also helps to think what should be a constraint, and what should be a standard violation which is the user's fault.

FortranFan commented 1 year ago

I think having a clear specification of what to implement should come first.

That "come first" just ain't gonna happen.

If you are truly keen on serving the Fortran practitioners generally and the gfortran users specifically, start working on an experimental prototype that you and your colleague think based on your expertise and experience is best for the language.

Perhaps a similar effort will kick off with LFortran led by @certik.

Then collaborate along the way with @certik et al. and iterate until you converge on a good specification toward a "production" implementation and the official language.

tkoenig1 commented 1 year ago

I think having a clear specification of what to implement should come first.

That "come first" just ain't gonna happen.

A clear specification is not going to happen? Why on earth not? The proposal I attached to the comment above is quite some way already.

If you are truly keen on serving the Fortran practitioners generally and the gfortran users specifically, start working on an experimental prototype that you and your colleague think based on your expertise and experience is best for the language.

If you are truly keen on serving the Fortran practitioners, go ahead and implement it in gfortran yourself - it's open source, and people (not only myself) are always willing to help. This guide to gfortran hacking gives some pointers.

FortranFan commented 1 year ago

A clear specification is not going to happen? Why on earth not? The proposal I attached to the comment above is quite some way already.

Note I agree 100% with what you write re: "The proposal I attached to the comment above is quite some way already", and that's why there is no technical reason for anything to "come first" before commencing work on a compiler prototype, as advised by @certik in reply to your inquiry, "How to proceed further?"

So anything else "coming first", such as an even clearer specification than what you have, is what I meant by "ain't gonna happen".

Thus to reiterate: you have made enough progress on this proposal to move to the next phase which is indeed experimental prototype(s) in gfortran, LFortran, etc.

If you are truly keen on serving the Fortran practitioners, go ahead and implement it in gfortran yourself

I am more than willing to make some financial contribution to the compiler development effort toward this if done by (preferably undergrad) student(s) with some compiler knowledge in a good educational program somewhere and whose effort is guided by those with prior gfortran expertise such as yourself, a la Google Summer of Code.

tkoenig1 commented 1 year ago

@FortranFan : I do not think that a gfortran implementation is needed. If you think the paper I attached above is not convincing enough, suggest additions to the paper.

@certik : Do you, in fact, plan on doing an implementation? It is a rather big project, I think.

certik commented 1 year ago

@certik : Do you, in fact, plan on doing an implementation? It is a rather big project, I think.

Yes, I plan to start next week. I'll keep this thread updated once we have something to play with.

FortranFan commented 1 year ago

@certik : Do you, in fact, plan on doing an implementation? It is a rather big project, I think.

Yes, I plan to start next week. I'll keep this thread updated once we have something to play with.

Brilliant!

FortranFan commented 1 year ago

@FortranFan : I do not think that a gfortran implementation is needed. If you think the paper I attached above is not convincing enough, suggest additions to the paper.

@tkoenig1 ,

The paper looks solid. And with your discussion and convergence here on some crucial aspects with @certik, you have taken the design a long way. If you're keen on unsigned integers making it into the Fortran language, please seriously consider an experimental implementation in gfortran. Working with your colleague, and with the back-and-forth with @certik and LFortran, you will find this big task more a worthy pursuit than anything onerous. The learnings from the two efforts will be the best way to get this into the language. Please note this will otherwise be yet another popular request by the practitioners of Fortran that dies on the vine.

tkoenig1 commented 7 months ago

The proposal has now been uploaded as 24-102.txt.

jeffhammond commented 6 months ago

for 2.2 Automatically generated C headers, i do not see any C standard library functions for which this is required that we actually need to wrap. can you elaborate?