Unsigned integers (GitHub issue j3-fortran/fortran_proposals #2: https://github.com/j3-fortran/fortran_proposals/issues/2)

certik commented 4 years ago

All integers in Fortran are signed. It is a common request to include unsigned integers, at the very least to help with interoperation with C APIs that use unsigned integers.

The best approach currently is to use signed integers of the same size to hold the bits, and then convert them to signed Fortran integers of a bigger size whenever the unsigned value is needed.
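A minimal sketch of that workaround, written in C so the bit-level behavior is explicit. It assumes the 32 bits of a C `uint32_t` have arrived in a same-size signed integer, which is how a Fortran caller would see them; the helper name `widen_u32_bits` is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: `stored` holds the 32 bits of a C uint32_t in a
 * same-size signed integer. Widening to 64 bits sign-extends, so the
 * upper half is masked off again to recover the unsigned value as an
 * ordinary non-negative signed number. */
int64_t widen_u32_bits(int32_t stored)
{
    return (int64_t)stored & INT64_C(0xFFFFFFFF);
}
```

With this, a stored value of -1 is read back as 4294967295, while non-negative values pass through unchanged.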

zbeekman commented 4 years ago

Yes, for certain algorithms (hash functions) the wrap-around nature of unsigned ints is convenient, and it would lead to less confusion with C interop. I do not believe the standard requires integers to be implemented as two's complement, but I do not know of a compiler that uses a different convention. This means that the underlying machinery for unsigned integers is already in place (and that programmers can "roll their own", but this can be confusing or less clear/explicit in the source code).
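As an illustration of the hash-function case, here is a sketch of 32-bit FNV-1a in C. FNV-1a is chosen here only as a well-known example; the thread does not name a specific hash. The multiplication is expected to overflow on almost every step, and the unsigned type is what makes that wrap-around well defined.

```c
#include <stddef.h>
#include <stdint.h>

/* 32-bit FNV-1a. The multiplication overflows routinely; with uint32_t
 * the result is defined to wrap modulo 2^32, which is exactly what the
 * algorithm relies on. */
uint32_t fnv1a_32(const unsigned char *data, size_t len)
{
    uint32_t hash = UINT32_C(2166136261);   /* FNV offset basis */
    for (size_t i = 0; i < len; i++) {
        hash ^= data[i];
        hash *= UINT32_C(16777619);         /* FNV prime */
    }
    return hash;
}
```

Writing the same loop with signed integers would depend on overflow behavior that neither C nor Fortran defines.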

certik commented 4 years ago

After talking about this with a few members of the committee, it seems most are in agreement that having this in the C interop might be a good idea, but allowing this in the Fortran language itself would do more harm than good (vendors don't like it; it's easy to have all kinds of subtle bugs with unsigned integers, such as when comparing, subtracting, etc.).

Python does not have unsigned integers, although NumPy does, and so does Julia. It is true that it would be useful for hash functions. Numerical computational code does not seem to need them.

I can personally see very good arguments both for and against having this in the language itself. I am leaning towards against, as it keeps the language smaller and excludes many kinds of possible bugs and warnings.

We can start with the C interop, where it should be easier to get agreement, to see if there is anything that would make sense to propose.

gronki commented 4 years ago

I wanted to chime in and say that one important use case of unsigned integers is handling images. To store a monochrome 8-bit image, one either has to use a twice-as-large 16-bit integer or store it in an 8-bit signed integer and deal with values above 127 wrapping to negative numbers, which makes arithmetic on the values impractical. This is true for any binary data, not only images. So I think the issue is not strictly C-interop related.

FortranFan commented 4 years ago

> I wanted to chime in and say that one important use case of unsigned integers is handling images. To store a monochrome 8-bit image, one either has to use a twice-as-large 16-bit integer or store it in an 8-bit signed integer and deal with values above 127 wrapping to negative numbers, which makes arithmetic on the values impractical. This is true for any binary data, not only images. So I think the issue is not strictly C-interop related.

I agree with this: unsigned integers can be of great help in any systems programming context, and handling binary data of any form (images or otherwise) is one use case within this space. Though some will argue unsigned integers are not an absolute must for systems programming, the fact is this facility can really make coders' lives easier. If Fortran intends to be taken truly seriously as a general-purpose language, it should consider including unsigned integers; its type system is general and does not appear to stand in the way of their introduction.

Interestingly, the unsigned integers feature was 4th on the top-6 list of features desired by users in the WG5 survey for Fortran 202X. Ignoring this any longer feels like suppression of the voice of the customers!

certik commented 4 years ago

I just want to react to this particular point:

> If Fortran intends to be taken truly seriously as a general-purpose language

Fortran is not a general-purpose language. Rather, it is a domain specific language for array oriented scientific computing.

As a larger point, this touches on what we want Fortran to be, see #59.

gronki commented 4 years ago

> Fortran is not a general-purpose language. Rather, it is a domain specific language for array oriented scientific computing.

I agree with this and with that direction. However, reading and writing binary data is often a part of it, and this is where the lack of byte/uint really bit me, as very often data is stored as uint16 in FITS/TIFF files.

Dominik

certik commented 4 years ago

@gronki I think you are right about the use case of reading binary data files. Let's collect such use cases. I think we want to be able to write readers and writers for binary files in Fortran.

FortranFan commented 4 years ago

> I just want to react to this particular point:

> > If Fortran intends to be taken truly seriously as a general-purpose language

> Fortran is not a general-purpose language. Rather, it is a domain specific language for array oriented scientific computing. ..

That may be the current reality with Fortran, but almost everyone who has worked on its language design and continues to do so would greatly dislike it being reduced as such and would very much want Fortran to be seen as a general-purpose language. It's a different matter whether the words are backed up by actions!

certik commented 4 years ago

Let's continue the discussion of what Fortran should become here: https://github.com/j3-fortran/fortran_proposals/issues/59. I started in this comment: https://github.com/j3-fortran/fortran_proposals/issues/59#issuecomment-547507074.

klausler commented 4 years ago

I'm not sure that Fortran needs an unsigned type so much as it actually needs some unsigned operations and relations. Be wary of simply copying-and-pasting C's unsigned types into Fortran, for they are full of pitfalls that shouldn't be perpetuated, mostly around their interactions with signed types and conversions.

8bitmachine commented 2 years ago

I too would like unsigned integers, mainly because of bitmap, image data, and other binary coding or data. I do not see that this should be a problem, as it would be a new data type (not an operation; we do need a type), and existing programs would continue to default to signed integers. My workaround, for simplicity (not elegance), is to use 16-bit integers and save the data as is in 8 bits when needed. I find that more convenient than having to check 8 bits all the time if only unsigned is needed.

tkoenig1 commented 1 year ago

I would like to have unsigned integers as well, but there need to be clear definitions of how they interact with signed integers.

Assume

  integer :: i
  unsigned integer :: u

then the question of what type i+u should have is difficult, and needs to be resolved in a better way than C did. I would probably favor banning arithmetic operations involving a signed and an unsigned integer without explicit conversion.

Comparisons between signed and unsigned should take the sign into account, so that -2 is smaller than (unsigned) 1. For constants it is also not quite clear how to differentiate an unsigned from a signed constant. The _ suffix is already taken.

So, lots of decisions, and lots of traps and pitfalls. Avoiding everything that C got wrong does not mean that a proposal would get it right...

FortranFan commented 1 year ago

Imagine Fortran 202Y introduces a distinct new intrinsic type for unsigned integers, say it is termed uinteger.

Now assume

1. it is only directly allowed in expressions, assignment, and comparisons with strict type compatibility requirements, i.e., only with other uintegers,
2. a new UINT intrinsic is introduced for type conversion of integers to uintegers with well-defined requirements and semantics,
3. the INT intrinsic is extended to support type conversion of uintegers to integers with well-defined requirements and semantics.

Then what are the pitfalls from other experiments (C-like languages) that can be envisioned with such a design in Fortran?

Note that item 1 above will mean

integer :: i, foo
uinteger :: u, bar
u = i !<-- Not allowed
i = u !<-- Not allowed
foo = i + u !<-- Not allowed
bar = i + u !<-- Not allowed
if ( i > u )  !<-- Not allowed
..
u = UINT( i )  !<-- Ok
foo = i + INT(u)  !<-- Ok
..
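The same discipline can be approximated in C today, which gives a rough feel for how such a design behaves. The sketch below is only an analogy under stated assumptions: the type `uinteger32` and the helpers `uint_from_int`, `int_from_uint` and `uinteger32_add` are hypothetical names, and rejecting out-of-range values is just one possible choice for the "well-defined requirements and semantics" of the conversions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical wrapper: a distinct struct type cannot be mixed with a
 * plain int32_t in expressions, so every crossing is an explicit call. */
typedef struct { uint32_t bits; } uinteger32;

/* Counterpart of the proposed UINT(i): rejects negative input instead
 * of silently wrapping (one possible choice of semantics). */
bool uint_from_int(int32_t i, uinteger32 *out)
{
    if (i < 0) return false;
    out->bits = (uint32_t)i;
    return true;
}

/* Counterpart of the extended INT(u): rejects values above INT32_MAX. */
bool int_from_uint(uinteger32 u, int32_t *out)
{
    if (u.bits > (uint32_t)INT32_MAX) return false;
    *out = (int32_t)u.bits;
    return true;
}

/* Arithmetic is defined only between two wrapped values. */
uinteger32 uinteger32_add(uinteger32 a, uinteger32 b)
{
    return (uinteger32){ a.bits + b.bits };
}
```

With these definitions an expression such as `i + u` no longer compiles, which mirrors the "Not allowed" lines above.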

certik commented 1 year ago

Here is a small subset of possible pitfalls that I recommend addressing:

- Subtracting two unsigned integers can wrap around (e.g. 3 - 5 = -2 -> -2+UINT_MAX+1).
- Consequently 3 - 5 < 3 is false, which can easily lead to many bugs in code.
- The last condition in unsigned int x = 1; int y = -2; (x + y > 0) evaluates to true, even though with signed integers x+y=(1)+(-2)=-1.
- Another variant: unsigned int a = 1000; signed int b = -1; (a > b) evaluates to false, even though with signed integers 1000 > -1 is true.
- Infinite loop: for (unsigned z = 5; z >= 0; z--) { do_something(z); }
- Complicated automatic casting in the frontend for things like comparisons of signed and unsigned integers.
- Hard to figure out for users when to use each type. For example, it's safer to use signed integers, except when a function returns an unsigned integer (say the .size() function in C++): then one gets compiler warnings about comparing signed and unsigned integers in loops like for (int64_t i=0; i < x.size(); i++), so one is forced to rewrite them as for (uint64_t i=0; i < x.size(); i++).
- Unsigned integers cannot be treated as a range-limited version of signed ones because their range of values is not a subset of the signed integers' range. Neither signed nor unsigned integers are subtypes of each other. For example -128 <= i8 <= 127 but 0 <= u8 <= 255, so x: i8 with the condition x > 0 is not equivalent to x: u8, because for example x=200 is representable as a u8, but not as an i8.
- Many developers (but not all) believe that unsigned integers should be avoided, such as the Google C++ guidelines:
  - You should not use the unsigned integer types such as uint32_t, unless there is a valid reason such as representing a bit pattern rather than a number, or you need defined overflow modulo 2^N. In particular, do not use unsigned types to say a number will never be negative. Instead, use assertions for this.
  - Unsigned integers are good for representing bitfields and modular arithmetic. Because of historical accident, the C++ standard also uses unsigned integers to represent the size of containers - many members of the standards body believe this to be a mistake, but it is effectively impossible to fix at this point. The fact that unsigned arithmetic doesn't model the behavior of a simple integer, but is instead defined by the standard to model modular arithmetic (wrapping around on overflow/underflow), means that a significant class of bugs cannot be diagnosed by the compiler. In other cases, the defined behavior impedes optimization.
  - That said, mixing signedness of integer types is responsible for an equally large class of problems. The best advice we can provide: try to use iterators and containers rather than pointers and sizes, try not to mix signedness, and try to avoid unsigned types (except for representing bitfields or modular arithmetic). Do not use an unsigned type merely to assert that a variable is non-negative.
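The first few pitfalls in the list can be reproduced with a few lines of C. This is a sketch for illustration; the function names are made up here, and the last function shows one common way to write the countdown loop so that it terminates (it assumes n is smaller than UINT_MAX).

```c
#include <limits.h>
#include <stdbool.h>

/* 3 - 5 on unsigned operands wraps to UINT_MAX - 1. */
unsigned int wrapped_difference(void)
{
    unsigned int a = 3, b = 5;
    return a - b;
}

/* y is converted to unsigned before the addition, so the sum is
 * UINT_MAX rather than -1 and the test succeeds. */
bool mixed_sum_is_positive(void)
{
    unsigned int x = 1;
    int y = -2;
    return x + y > 0;
}

/* b is converted to UINT_MAX, so 1000 > b is false. */
bool mixed_greater(void)
{
    unsigned int a = 1000;
    int b = -1;
    return a > b;
}

/* `z >= 0` is always true for unsigned z. Testing before the decrement
 * visits n, n-1, ..., 0 and then stops (assumes n < UINT_MAX). */
unsigned int countdown_visits(unsigned int n)
{
    unsigned int visits = 0;
    for (unsigned int z = n + 1; z-- > 0; ) {
        visits++;
    }
    return visits;
}
```

Compilers can warn about the two mixed comparisons (for example with GCC's `-Wsign-compare`), but the code is valid C and compiles without error by default.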

8bitmachine commented 1 year ago

I'd prefer no wrap-around and an error if an unsigned integer overflowed or underflowed, but one which could be checked and corrected with the same considerations as a carry bit, so that increases and decreases could be handled. That would still allow wrap-around, by following the operation with a correction function that ignores the carry/borrow. Unsigned integers are needed on occasions, as others have mentioned. I've needed them for A-D conversion, bit checking and image handling.
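One way to read this suggestion is an operation that returns both the wrapped result and a borrow flag, leaving the caller to decide whether the borrow is an error. The sketch below assumes that reading; `sub_u32_borrow` is a hypothetical name, not an existing API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: stores the wrapped difference in *diff and
 * returns true when a borrow occurred (the true result was negative). */
bool sub_u32_borrow(uint32_t a, uint32_t b, uint32_t *diff)
{
    *diff = a - b;      /* defined behavior: wraps modulo 2^32 */
    return a < b;
}
```

A caller that wants an error on underflow checks the return value; a caller that wants modular arithmetic simply ignores it.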

tkoenig1 commented 1 year ago

> Here is a small subset of possible pitfalls that I recommend addressing:

> Subtracting two unsigned integers can wrap around (e.g. 3 - 5 = -2 -> -2+UINT_MAX+1).

Yes, this is the nature of modulo arithmetic.

> Consequently 3 - 5 < 3 is false, which can easily lead to many bugs in code.

Again, this is the nature of modulo arithmetic. It would be the expectation that people who use it know what they are doing. Maybe this can be alleviated by choosing a more descriptive name for the type, one that has "modulo" in it.

> The last condition in unsigned int x = 1; int y = -2; (x + y > 0) evaluates to true, even though with signed integers x+y=(1)+(-2)=-1.

It makes little sense to think of unsigned integers as "-1". Again, this is implied in modulo 2^n arithmetic.

> Another variant: unsigned int a = 1000; signed int b = -1; (a > b) evaluates to false, even though with signed integers 1000 > -1 is true.

Same thing.

> Infinite loop: for (unsigned z = 5; z >= 0; z--) { do_something(z); }

Yep. I would actually not permit unsigned types for DO loops.

> Complicated automatic casting in the frontend for things like comparisons of signed and unsigned integers.

This, I would disallow - explicit type conversions only.

> Hard to figure out for users when to use each type. For example, it's safer to use signed integers, except when a function returns an unsigned integer (say the .size() function in C++): then one gets compiler warnings about comparing signed and unsigned integers in loops like for (int64_t i=0; i < x.size(); i++), so one is forced to rewrite them as for (uint64_t i=0; i < x.size(); i++).

SIZE returns an integer, I would not change that. And no DO loops with unsigned variables, and explicit casts only.

> Unsigned integers cannot be treated as a range-limited version of signed ones because their range of values is not a subset of the signed integers' range. Neither signed nor unsigned integers are subtypes of each other. For example -128 <= i8 <= 127 but 0 <= u8 <= 255, so x: i8 with the condition x > 0 is not equivalent to x: u8, because for example x=200 is representable as a u8, but not as an i8.

Make the conversion defined, and explicit only.

> Many developers (but not all) believe that unsigned integers should be avoided, such as the Google C++ guidelines:

> You should not use the unsigned integer types such as uint32_t, unless there is a valid reason such as representing a bit pattern rather than a number, or you need defined overflow modulo 2^N. In particular, do not use unsigned types to say a number will never be negative. Instead, use assertions for this.

That, I would agree with.

> Unsigned integers are good for representing bitfields and modular arithmetic. Because of historical accident, the C++ standard also uses unsigned integers to represent the size of containers - many members of the standards body believe this to be a mistake, but it is effectively impossible to fix at this point. The fact that unsigned arithmetic doesn't model the behavior of a simple integer, but is instead defined by the standard to model modular arithmetic (wrapping around on overflow/underflow), means that a significant class of bugs cannot be diagnosed by the compiler. In other cases, the defined behavior impedes optimization.

Fortran has had SIZE returning a signed integer for ages. We should definitely avoid returning unsigneds there.

> That said, mixing signedness of integer types is responsible for an equally large class of problems. The best advice we can provide: try to use iterators and containers rather than pointers and sizes, try not to mix signedness, and try to avoid unsigned types (except for representing bitfields or modular arithmetic). Do not use an unsigned type merely to assert that a variable is non-negative.

Again, agreed. If type conversion has to be explicit, then people will hopefully not use it just for the (non)-fun of it.

8bitmachine commented 1 year ago

Subtracting two unsigned numbers can also overflow, though. To ensure correct programming, some means of flagging the carry/borrow is needed, in case of (perhaps) unexpected conditions. That seems to me to be the problem. 3 - 5 < 3 would be interpreted correctly if a carry/borrow flag is used in the evaluation, if both are unsigned. That would also work in a do loop. Checking the size of files, etc., could benefit from unsigned values, though the usual solution of using a larger max integer is adequate as files are not likely to need 31-bit sizes. I would still request unsigned, but with the restrictions of not mixing them and explicit conversion if needed.

gronki commented 1 year ago

I agree with tkoenig on every point. Coming from the field of computer vision and image processing, where Fortran could fight for a large share, not having an unsigned type causes a lot of pain. I would not be concerned about overflows, as unsigned arithmetic is modular arithmetic by design. I agree that no implicit conversion of any kind should occur, nor should the return types of size or other intrinsics be changed to unsigned. Unsigned int should be restricted to where it is needed (signal processing and data storage). Currently, using an oversized data type causes half of the memory to be effectively wasted, and leads to computational overhead.

Dominik

tkoenig1 commented 1 year ago

@certik : I share your concerns about people making stupid mistakes because it is too easy to confuse signed and unsigned ints. To alleviate this, maybe this:

Unsigned constants should also be distinguished from normal integers. Anybody who wants -1 as an unsigned number should either write something like u_int(-1) or u_int(-1,KIND_NUMBER). If that is too cumbersome, maybe another suffix could be added, maybe something like #.

So, u = -1# would be fine, u = -1 not (if u is an unsigned number). That should alert both readers and writers of programs that something unsigned is going on there.

Comparisons between signed and unsigned could actually be permitted, but the values would be compared, so -1 < u would always be true. This would correct one of C's worst abominations.
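A value-based comparison of this kind is short to write by hand. The following C sketch shows the idea for 32-bit operands; the function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true when the mathematical value of s is below
 * that of u. A negative s is below every unsigned value; otherwise s
 * fits in uint32_t and the comparison can be done unsigned. */
bool signed_less_than_unsigned(int32_t s, uint32_t u)
{
    return s < 0 || (uint32_t)s < u;
}
```

With this definition -1 compares as smaller than every unsigned value, including 0, which is the behavior described above.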

klausler commented 1 year ago

Unsigned integers don't have to be a full-fledged type in Fortran; it just needs a few more unsigned integer operations. Just as IAND can be well-defined on integers, so could IUADD and IULT, &c.
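To make the idea concrete, here is a C sketch of what such operations would compute when the operands are stored in signed integers. IUADD and IULT are the names proposed in the comment above, not existing intrinsics; the lower-case functions below are stand-ins written for illustration.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret the 32 bits of a signed integer as unsigned and back.
 * memcpy avoids relying on implementation-defined conversions. */
static uint32_t as_unsigned(int32_t x)
{
    uint32_t u;
    memcpy(&u, &x, sizeof u);
    return u;
}

static int32_t as_signed(uint32_t u)
{
    int32_t x;
    memcpy(&x, &u, sizeof x);
    return x;
}

/* Stand-in for the hypothetical IUADD: unsigned addition modulo 2^32
 * on operands stored as signed integers. */
int32_t iuadd(int32_t a, int32_t b)
{
    return as_signed(as_unsigned(a) + as_unsigned(b));
}

/* Stand-in for the hypothetical IULT: unsigned less-than. */
bool iult(int32_t a, int32_t b)
{
    return as_unsigned(a) < as_unsigned(b);
}
```

Under this interpretation the bit pattern of -1 is the largest unsigned value, so `iult(1, -1)` is true even though `1 < -1` is false for signed operands.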

gronki commented 1 year ago

I think the idea by Peter Klausler is also worthwhile and much simpler to introduce, since adding intrinsics is much less work. I would add that conversion from uint to float is a very common operation. So simple arithmetic and conversion would cover most of the needs as far as computer vision goes.

Dominik

tkoenig1 commented 1 year ago

> Unsigned integers don't have to be a full-fledged type in Fortran; it just needs a few more unsigned integer operations. Just as IAND can be well-defined on integers, so could IUADD and IULT, &c.

Possible, of course, but it would not be very much like "Formula Translation" any more, it would look more like LISP :-)

Another possibility would be an intrinsic module which exports an otherwise opaque type which just happens to have certain operations, and others not. But this would not work seamlessly with I/O, so better not.

klausler commented 1 year ago

Formatted I/O already has BOZ editing, so add U editing for unsigned decimal. (List-directed and NAMELIST I/O would require a new type, yes.)

I think that people are underestimating the compiler engineering effort needed to extend Fortran's type system with a new intrinsic type, and overestimating the benefit to be gained from the effort. But adding just some unsigned integer operations would be cheap.

certik commented 1 year ago

@tkoenig1 thanks for thinking this through and having a plan for some of the pitfalls. It seems you agree it's a good idea to have the unsigned integer not automatically interoperable with the default signed integers, so that all casting must be explicit. In that case you almost just need the operations to be defined. As a start, we can do what @klausler suggested: just a few more intrinsics that operate on signed integers as the data type. And you can do that already: just write your own functions to do the operations, to get started.

Regarding a fully fledged type (so that you can use the + and - operators), let's discuss the use cases. Two were proposed by @gronki:

- Data storage. I use signed integers of the same size for data storage. For example, I use a signed 32-bit integer to store the 32 bits of an unsigned int, so there is no waste. There might be some issues with C interop, but in that case we should fix that.
- Signal processing: can you give an actual example?

I am aware of image processing, where you store the image pixel data in an unsigned integer, say 0-255, and do various operations on it. I typically use signed 8-bit integer. Then we just need the operations, like converting to a float correctly (so 200 converts into 200.0 and not some negative number), and so on. It seems often you do the actual arithmetic (with +, - operators) on the floating point. So it would not be that unreadable to just call IUADD if you want to do arithmetic on the unsigned integer itself (represented by a signed integer).
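The float conversion mentioned here comes down to reinterpreting the stored byte as unsigned before converting. A small C sketch, with a hypothetical helper name:

```c
#include <stdint.h>

/* Hypothetical helper: `stored` carries an 8-bit pixel (0..255) in a
 * signed byte, so 200 arrives as -56. Converting through uint8_t
 * recovers the intended value before the float conversion. */
float pixel_to_float(int8_t stored)
{
    return (float)(uint8_t)stored;
}
```

Converting the signed byte directly would give -56.0 for a pixel value of 200, which is the "some negative number" case described above.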

tkoenig1 commented 1 year ago

> Formatted I/O already has BOZ editing, so add U editing for unsigned decimal. (List-directed and NAMELIST I/O would require a new type, yes.)

That would not be needed. G editing also deals with different types, so the type has to be passed to the I/O routine independent of the format descriptor. Unsigned integers can just use what is available for integers (I,G,Z, etc).

> I think that people are underestimating the compiler engineering effort needed to extend Fortran's type system with a new intrinsic type, and overestimating the benefit to be gained from the effort. But adding just some unsigned integer operations would be cheap.

Actually, I think I have a fair idea of what it would entail. It would be a big project, but overall not insurmountable, at least for the compiler that I am familiar with. And yes, if there is a new set of unsigned types, I would like the whole support of intrinsics that Fortran has to offer - I would like to have all the array intrinsics available, for example. Having a datatype in Fortran without MAXLOC or FINDLOC would take away 2/3 of the fun - might as well use C then.

tkoenig1 commented 1 year ago

> Regarding a fully fledged type (so that you can use +, -, operators), let's discuss the use cases.

Accessing file formats, images and signal processing were already mentioned. Apart from that: data compression. Cryptography. Checksums. Long integer arithmetic. Control. Anything interfacing with the real world via an A/D or D/A converter.

Just one example from the above: If you try to express an add with carry in Fortran, you have a hard time doing so without resorting to integer overflow, which is prohibited. In C, you do, with unsigned a, b, c,

  c = a + b;
  if (c < a) {
    /* Carry was generated */
  }
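The same test is the building block of long integer arithmetic, one of the use cases listed above. The sketch below adds two 64-bit values held as pairs of 32-bit limbs; the type and function names are illustrative only.

```c
#include <stdint.h>

/* A 64-bit value held as two 32-bit limbs (names are illustrative). */
typedef struct { uint32_t lo, hi; } u64_limbs;

u64_limbs add_limbs(u64_limbs a, u64_limbs b)
{
    u64_limbs r;
    r.lo = a.lo + b.lo;
    uint32_t carry = (r.lo < a.lo);   /* wrapped, so a carry was generated */
    r.hi = a.hi + b.hi + carry;       /* carry out of the top limb is dropped */
    return r;
}
```

Longer numbers follow the same pattern, propagating the carry from each limb to the next.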

Plus, I have to say that C interop is somewhat dodgy without actually having unsigned ints.

Consider gfortran: It will generate C prototypes for you for C-interoperable procedures, which you can then include in your C program to check consistency. This works great as long as you can write your C program yourself and can choose signed integer types. If it is a function that you cannot change and that has an unsigned instead of an int, the prototypes become incompatible, and the major use case, automatic checking, is lost.

It is a nuisance.

certik commented 1 year ago

Yes, having unsigned ints in C interop is important and we should fix that.

Regarding code like this:

>   c = a + b;
>   if (c < a) {
>     /* Carry was generated */
>   }

You would simply write it using the proposed or existing unsigned int intrinsics, so:

c = iuadd(a, b)
if (iuless(c, a)) then
    ! Carry was generated
end if

I think the only nuisance left is that you can't use the + and <, as well as the fact that a, b, c are declared as signed integers, so the compiler won't generate a warning if you mix signed and unsigned operations.

I think if we are careful to never mix signed and unsigned, then it might avoid most of the pitfalls above.

tkoenig1 commented 1 year ago

> I think the only nuisance left is that you can't use the + and <, as well as the fact that a, b, c are declared as signed integers, so the compiler won't generate a warning if you mix signed and unsigned operations.

If we restrict ourselves to these functions, then Fortran's capabilities regarding unsigned integers would be poor and more error-prone even compared to C. I certainly would keep on using C for tasks which require unsigned integers, then. Why use Fortran if you cannot even use MAXVAL?

FortranFan commented 1 year ago

> > I think the only nuisance left is that you can't use the + and <, as well as the fact that a, b, c are declared as signed integers, so the compiler won't generate a warning if you mix signed and unsigned operations.

> If we restrict ourselves to these functions, then Fortran's capabilities regarding unsigned integers would be poor and more error-prone even compared to C. I certainly would keep on using C for tasks which require unsigned integers, then. Why use Fortran if you cannot even use MAXVAL?

I agree entirely with these comments by @tkoenig1 .

One should NOT and must NOT think of introducing unsigned integers (with good guardrails for their use) as cost and too much effort and so forth. It is not cost, period.

Rather, the introduction of a few features such as unsigned integers is truly a service to a language, the dividends of which will be realized by countless practitioners of Fortran in immeasurable and invaluable ways globally.

Adding such facilities in a full-featured manner is an imperative to the use of Fortran in what has become its only domain, scientific and technical computing. Why? Because the greatest challenges ahead in scientific and technical computing have to do with basic processing of data - often in sets of 8 bits each with no need for the sign bit. And massive amounts of data that are growing by leaps and bounds each day. If one has to resort to other languages and platforms for even basic processing of data, then there is absolutely no need for Fortran. Everyone might as well pack up and shutter down any Fortran language enterprise.

Some of the comments upthread could not have been written better by someone with a malicious intent of obsoleting Fortran.

klausler commented 1 year ago

> It is not cost, period.

If you were to someday write the papers for this feature, you would discover that they required time and effort. If you were to someday craft the edits necessary to define the semantics of unsigned integers in every sentence of the standard that pertains to current integers, you might notice that they required some time and effort. If you were to iteratively refine a prototype implementation of this feature to ensure that the edits were complete and consistent, you might notice that you've done a lot of thinking and typing and debugging. If you were to architect the changes necessary in a production implementation's parser, symbol table, semantic analyzer, expression folding, optimizer, code generator, and runtime support library, and then write the code and tests and documentation, you'd probably notice at least that there are other things that you were not able to do in the meantime, since you'd have spent at least a year's worth of thinking, typing, debugging, and code reviews. If you were to pay other people to do any of these things, perhaps you would notice the debits on your account.

Until you have exploited one of these educational opportunities, however, please don't discount the costs paid by those of us who are actually contributing quite a lot of time and effort and opportunity and money to make Fortran better, because it's pretty difficult and expensive work. It is the diametric opposite of "obsoleting Fortran". I wouldn't have spent the last five years struggling to bring a modern solid correct complete implementation of this language into life if I didn't think that most of its users deserved to benefit from the effort.

Costs matter. At the very least, they help determine priorities and schedules. For a big-ticket item like pervasive unsigned integers in the language, it is not helpful to pretend that it's not creating a lot of work for a lot of (other) people. It may well be worth the effort, but that's because the benefit justifies the cost, not because the cost is zero.

FortranFan commented 1 year ago

> It is not cost, period.

This does not in any way mean the effort to add unsigned integers is low; my point is that it is rather pointless and foolish to see the introduction of this feature as cost.

In my estimation, it is:

1. Around 1,500 hours of effort to complete the Edits to the Standard. This is a conservatively rounded-up estimate.
2. About 5X the effort with the new enumeration type in Fortran 2023, i.e., about five times the effort it will require compilers to implement the new intrinsic type introduced in Fortran 2023.

An approximate dollar equivalent of above effort is US$1.2MM across 8 different compiler implementations. The benefits are easily far, far greater than this: an industry I am part of is currently committed to US$50,000MM (US$ Fifty Billion) investment toward the energy industry and the move away from fossil fuels to invest in carbon capture and green energy technologies requiring massive chemical and manufacturing processing where traditionally Fortran codes have played a strong role. Just one successful use case such as improved reactor design and operation via enhanced imaging and integrated analyses with other computations where the Fortran codes make use of unsigned integers will grant to humanity 10x the value of the effort expended to develop the feature, even if there is considerable error in the estimates in 1 and 2 above. But I digress..

The fact is I do think the facility to introduce unsigned integers as a proper intrinsic type to Fortran will require appreciable effort, but it's a one-time thing and those processors with good compiler architecture designs and also with a companion C processor (which is almost all of them out there) will already have the "wiring" in place to implement the standard feature with relative ease i.e., with lower effort than what I indicate in point 2 above.

As to point 1, a reasonably sized subgroup focused on this effort can accomplish many of the technical tasks in a few online sessions followed by some offline effort. A lot of the Edits work can easily be crowd-sourced and completed by the online Fortran community following the technical development. The official Fortran committee can then review and make corrections, which will be far less onerous on them than their prior work practices. But even otherwise, the committee can achieve this; it is far less daunting than what became Fortran 2003.

Now, all the readers here must keep in mind in the WG5 survey initiated in 2017 on new features of interest to the practitioners for Fortran 202X, the request for unsigned integer facility in Fortran was ranked FOURTH with a score of 4.48, not too far behind the top four.

Thus there is no need to insult the needs of these practitioners, to question their intelligence and their assessment of benefits, and then to offer them half-baked solutions. The Fortran practitioners deserve a far better product, short of which many will be forced to vote with their feet.

The bottom line is that a lot of the use cases of interest to the practitioners will not be met with workarounds involving the kind of intrinsic-functions-only solutions mentioned upthread.

tkoenig1 commented 1 year ago

If you were to someday write the papers for this feature, you would discover that they required time and effort.

Agreed. This is an effort that I would be willing to share (although I am not yet sure how to submit a paper to J3 as a non-member, I'm sure I would find somebody who would help me if the need should arise).

If you were to someday craft the edits necessary to define the semantics of unsigned integers in every sentence of the standard that pertains to current integers, you might notice that they required some time and effort.

It would probably be necessary to look at each sentence mentioning integers, but a strategy could be to treat an unsigned integer as a totally separate data type, and only add sentences where they apply to unsigned integers only.

The central idea of the proposal for unsigned integers would be that they are isolated as far as reasonable from other operations, to keep the user from shooting himself in the foot. This also isolates the places in the standard where they need to be mentioned.

For example, Table 10.2, "Type of operands and results for intrinsic operators", would have to be extended with the relevant lines for U (unsigned integer).

If you were to iteratively refine a prototype implementation of this feature to ensure that the edits were complete and consistent, you might notice that you've done a lot of thinking and typing and debugging.

Obviously.

If you were to architect the changes necessary in a production implementation's parser,

In the following, I am talking about what I would expect from my experience of working on and off, as a volunteer, for a bit more than 15 years on gfortran.

The main change to the parser would be to introduce a suitable suffix for unsigned integer constants. "U" comes to mind. That is not a particularly large change, since such syntax does not introduce any ambiguities.

symbol table,

You would need a new type, again not a big change.

semantic analyzer

This would mostly be checking that unsigned integers are only involved in operations and assignments with other unsigned integers. If somebody wants to convert, let them use either TRANSFER or an appropriate conversion function or something similar. There would probably need to be intrinsics like U_CEILING and U_FLOOR. BOZ constants are another thing to be looked at.

expression folding,

Yep. That would take some care, but a lot could be handled by imitating what is done for integers.

optimizer,

Not sure what would need to be optimized. If the middle end can handle C, it should optimize Fortran unsigned integers just fine.

code generator,

Sure. That is one of the things that is a bit arcane about gcc. I'd have to look up how it is done in the C front end. Array intrinsics which are inlined can probably best be handled by a few if statements for the standard versions.

and runtime support library

Two major things to consider there: Formatted I/O and array intrinsics. Formatted I/O should be straightforward, mostly copied over from integer handling.

Regarding array intrinsics, I don't know how this is handled in the compiler you work with. With gfortran, it's mostly some m4 hackery. Not pleasant to write, but I've done this kind of thing before, and adding a bunch of types is less complex than, let's say, adding FINDLOC, which I have done.

and then write the code and tests and documentation,

If this was in the standard, then there would hardly need to be any documentation :-) But you were, of course, talking about a reference implementation.

Test cases would be required, sure.

you'd probably notice at least that there's other things that you were not able to do in the meantime, since you'd have spent at least a year's worth of thinking, typing, debugging, and code reviews.

That does not match what I expect from my own work on gfortran; from the outline above, I think that this would take far less time, even for a volunteer like me.

If you were to pay other people to do any of these things, perhaps you would notice the debits on your account.

If we were able to pay people for gfortran work, the compiler would be much better :-)

[not addressed to me]

Until you have exploited one of these educational opportunities, however, please don't discount the costs paid by those of us who are actually contributing quite a lot of time and effort and opportunity and money to make Fortran better, because it's pretty difficult and expensive work.

What I did (and do) to improve Fortran is to donate my own time.

It is the diametric opposite of "obsoleting Fortran". I wouldn't have spent the last five years struggling to bring a modern solid correct complete implementation of this language into life if I didn't think that most of its users deserved to benefit from the effort.

I suppose that is a dig at gfortran, is it? :-)

Of course, I would have preferred more gfortran contributors :-)

Costs matter. At the very least, they help determine priorities and schedules. For a big-ticket item like pervasive unsigned integers in the language, it is not helpful to pretend that it's not creating a lot of work for a lot of (other) people.

A central point of this proposal is to make it as non-invasive as possible by isolating unsigned integers from the rest of the language. This should mostly help users avoid stupid mistakes, but could also mean less work for compiler writers.

It may well be worth the effort, but that's because the benefit justifies the cost, not because the cost is zero.

The cost is certainly not zero. However, I think that this is a worthwhile addition to the language.

As far as a reference implementation is concerned: I am currently not volunteering to write one as long as there is no indication that there is a good chance that this will be implemented. There is also the PARAMETER X=5 trap: creating such an implementation risks being incompatible with a later standard.

What would be the best path forward?

certik commented 1 year ago

Yes, there is cost to any new feature, see here: https://fortran-lang.discourse.group/t/cost-of-adding-any-new-feature-to-the-fortran-language/1479.

Right now we are debating how (or if) to best incorporate unsigned integers into Fortran; the question of cost is part of it, but I would discuss it separately, like a "label" or "price tag" on each proposal, how costly it is to implement it.

I think the only nuisance left is that you can't use the + and <, as well as the fact that a, b, c are declared as signed integers, so the compiler won't generate a warning if you mix signed and unsigned operations.

If we restrict ourselves to these functions, then Fortran's capabilities regarding unsigned integers would be poor and more error-prone even compared to C. I certainly would keep on using C for tasks which require unsigned integers, then. Why use Fortran if you cannot even use MAXVAL?

So I think you argue that if we are going to do unsigned integers, to go all the way and also introduce dedicated types and overload intrinsics like maxval.

So it seems we have two proposals:

Implement the operations, but do not introduce a dedicated type (probably not too difficult)
Implement the operations and dedicated types and overloads for arithmetic and some intrinsic functions (more difficult)

It seems we agree we will not allow any automatic casting and comparisons of signed and unsigned.

tkoenig1 commented 1 year ago

It seems we agree we will not allow any automatic casting and comparisons of signed and unsigned.

Automatic type conversions: Yes.

Comparisons: A literal interpretation of table 10.7, "Interpretation of the relational intrinsic operators", would suggest that comparison of a signed and an unsigned value would take the sign of the signed operand into account, so that the expression i > u would always be false if i is negative. IIRC, Bob Corbett mentioned on comp.lang.fortran some time ago that Sun Fortran, which had unsigned integers as an extension, used this.

gronki commented 1 year ago

Totally unrelated question, although inspired by this topic: would organizing a workshop (or even a webinar) on how to work with the Fortran standard when introducing a new feature be a good idea? In the example of unsigned integers, the community proposal could attach a complete diff to the standard text, thereby taking on the workload of incorporating the new feature into the language. There are plenty of people in the amazing Fortran community, which has flourished over the last few years thanks to the efforts of Ondrej and others, who would be competent to do so. Of course, this would be practical only for simpler features; nevertheless, I imagine this could help speed up the development cycle of the language and get it closer to 2-3 years, which is closer to what is needed by us Fortran developers.

Dominik

certik commented 1 year ago

IIRC, Bob Corbett mentioned on comp.lang.fortran some time ago that Sun Fortran, which had unsigned integers as an extension, used this.

Can you find out exactly what Sun Fortran did? Let's learn from them.

tkoenig1 commented 1 year ago

IIRC, Bob Corbett mentioned on comp.lang.fortran some time ago that Sun Fortran, which had unsigned integers as an extension, used this.

Can you find out exactly what Sun Fortran did? Let's learn from them.

https://docs.oracle.com/cd/E19205-01/819-5263/aevnb/index.html

certik commented 1 year ago

Thanks, this is good. This is how they handle comparisons:

Signed and unsigned integer operands may be compared using intrinsic relational operations. The result is based on the unaltered value of the operands.

It probably means that it takes the sign correctly into account, as expected, so there shouldn't be any surprises.
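
For contrast, C does not compare by value: the signed operand is first converted to unsigned. The following is a minimal C sketch of the two behaviours, assuming 32-bit operands. The names `c_less` and `value_less` are hypothetical and chosen for illustration only; `value_less` shows one reading of the "unaltered value" wording and is not taken from the Sun compiler.

```c
#include <stdbool.h>
#include <stdint.h>

/* C's rule for 32-bit operands, written out explicitly: the signed
   operand is converted to unsigned before the comparison, so -1
   becomes UINT32_MAX. */
bool c_less(int32_t i, uint32_t u)
{
    return (uint32_t)i < u;
}

/* Hypothetical helper: compare by mathematical value, which is how the
   "unaltered value" wording is read in this thread. */
bool value_less(int32_t i, uint32_t u)
{
    if (i < 0) {
        return true;          /* any negative value is below any unsigned value */
    }
    return (uint32_t)i < u;   /* i is non-negative: the conversion keeps its value */
}
```

With these definitions, `c_less(-1, 1u)` is false while `value_less(-1, 1u)` is true, which is the kind of surprise a value-based rule would avoid.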

tkoenig1 commented 1 year ago

Let's go through the Oracle documentation. I'm snipping a bit where parts do not apply, and section headings.

The Fortran 95 compiler accepts a new data type, UNSIGNED, as an extension to the language. Four KIND parameter values are accepted with UNSIGNED: 1, 2, 4, and 8, corresponding to 1-, 2-, 4-, and 8-byte unsigned integers, respectively.

The KIND numbers would obviously be processor-dependent. What this means is that there is a KIND for each corresponding integer.

The form of an unsigned integer constant is a digit-string followed by the upper or lower case letter U, optionally followed by an underscore and kind parameter.

Looks good.

Binary operations, such as + - * / cannot mix signed and unsigned operands. That is, U*N is illegal if U is declared UNSIGNED, and N is a signed INTEGER.

Use the UNSIGNED intrinsic function to combine mixed operands in a binary operation, as in U*UNSIGNED(N)

Also sounds good.

An exception is when one operand is an unsigned integer and the other is a signed integer constant expression with positive or zero value; the result is an unsigned integer.

I would probably not have invented that, but if it's prior art, why not?

The kind of the result of such a mixed expression is the largest kind of the operands. Exponentiation of a signed value is signed while exponentiation of an unsigned value is unsigned. Unary minus of an unsigned value is unsigned. Unsigned operands may mix freely with real, complex operands.

All looks good to me.

Signed and unsigned integer operands may be compared using intrinsic relational operations. The result is based on the unaltered value of the operands.

What I proposed from what I remembered from Bob Corbett.

The CASE construct accepts unsigned integers as case-expressions. Unsigned integers are not permitted as DO loop control variables, or in arithmetic IF control expressions.

Makes sense.

Unsigned integers can be read and written using the I, B, O, and Z edit descriptors. They can also be read and written using list-directed and namelist I/O. The written form of an unsigned integer under list-directed or namelist I/O is the same as is used for positive signed integers. Unsigned integers can also be read or written using unformatted I/O.

Unsigned integers are allowed in intrinsic functions, except for SIGN and ABS. A new intrinsic function, UNSIGNED, is analogous to INT but produces a result of unsigned type. The form is UNSIGNED(v [,kind] ). Another new intrinsic function, SELECTED_UNSIGNED_KIND( var), returns the kind parameter for var.

All that looks plausible.

Intrinsic functions do not allow both signed and unsigned integer operands, except for the MAX and MIN functions, which allow both signed and unsigned integer operands only if there is at least one operand of REAL type.

That is a bit strange. MAX and MIN could be calculated according to the rules for relational operators, or mixing could simply be disallowed.

Unsigned arrays cannot appear as arguments to array intrinsic functions.

That would have to be extended, I would want MAXLOC etc.

What is missing (the docs document a F95 compiler) is:

C interop. There I would propose a C_UINT kind to match a single UNSIGNED variable, plus C_UINT8_T etc. This would fix the quirkiness of Fortran's interop.

ISO_FORTRAN_ENV. That would get an UNSIGNED_KINDS array, plus UINT8, UINT16, UINT32 and UINT64 kinds.

BOZ literal constants. Hm, have to think about that a bit more.

certik commented 1 year ago

Ok, let's revisit your answer to my "riddle" above:

Consequently 3 - 5 < 3 is false, which can easily lead to many bugs in a code

Again, this is the nature of modulo arithmetic. It would be the expectation that people who use it know what they are doing. Maybe this can be alleviated by choosing some more descriptive name which has the modulo in the name.

3-5 evaluates to -2 (signed!), so 3-5 < 3 is True.

3u_1 - 5 < 3 is not allowed (although with the exception above the LHS would be allowed, and evaluate to 254u_1 ?)

3u_1-5u_1 < 3 is allowed and the LHS evaluates to 254u_1, and the comparison evaluates to False.

tkoenig1 commented 1 year ago

... but 3u - 5u > 3u :-)
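
The values quoted in this exchange can be checked against 8-bit unsigned arithmetic in C. This is only a sketch: `sub_u8` is a name made up for illustration, and C is used as a stand-in because the proposed Fortran syntax (`3u_1`) does not exist yet.

```c
#include <stdint.h>

/* 8-bit unsigned subtraction, reduced modulo 2^8.
   In C the operands are promoted to int first, so the cast back to
   uint8_t is what performs the wraparound. */
uint8_t sub_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)(a - b);
}
```

Here `sub_u8(3, 5)` yields 254, so comparing the result with 3 gives "greater than", whereas the signed expression `3 - 5 < 3` is true.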

tkoenig1 commented 1 year ago

proposal.md

Here's a first shot at a proposal

certik commented 1 year ago

Thanks @tkoenig1. In C and I believe in Fortran also (although I can't find it right now in the standard), signed integer overflow is undefined. In C, unsigned integer overflow is defined to wrap around using modulo 2^N where N is the number of bits.

Should Fortran define the unsigned overflow behavior to be modulo 2^N? If so, you should put it into the proposal.
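
As an illustration of code that depends on the modulo 2^N rule, here is a sketch of the 32-bit FNV-1a hash in C. The function name `fnv1a_32` is chosen for this example; the two constants are the published FNV offset basis and prime. The multiplication is expected to overflow 32 bits on almost every step, and the algorithm relies on the result being reduced modulo 2^32.

```c
#include <stddef.h>
#include <stdint.h>

/* 32-bit FNV-1a hash over a byte buffer. */
uint32_t fnv1a_32(const unsigned char *data, size_t len)
{
    uint32_t hash = 2166136261u;          /* FNV offset basis */
    for (size_t i = 0; i < len; ++i) {
        hash ^= data[i];                  /* mix in the next byte */
        hash *= 16777619u;                /* FNV prime; product wraps modulo 2^32 */
    }
    return hash;
}
```

If unsigned overflow were undefined or trapped, a function like this could not be written with plain `*`.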

I would disallow 3u-5 < 3 as well as 3u-5u < 3. I would only allow 3u-5u < 3u. That way there is no casting from signed to unsigned happening and there is no comparison of signed and unsigned.

Personally I find it confusing that 3u-5u < 3u evaluates to .false., and that 3u-5u is 254u. Unsigned integers are using modular arithmetic. In math, they sometimes use ⊕ instead of + and ≡ instead of =. Logical types in Fortran are a boolean algebra, arithmetic modulo 2, and it uses .and. and .or. instead of + and -. For comparison, you use .eqv. instead of =. So we could consider using different operators for modular arithmetic, to make it obvious that this is modular arithmetic, with different rules. For this reason, it would be good for me to look at use cases, where this is actually used in practice. I looked at some C image manipulation code, such as https://github.com/nothings/stb/blob/8b5f1f37b5b75829fc72d38e7b5d4bcbf8a26d55/stb_image_resize.h, and they represent pixel values using unsigned integers, but seem to do all actual arithmetic using either signed integers or floats. @gronki do you have some sample code for image manipulation in C or C++ that we could use as an example to see how to do this in Fortran?
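
The pattern described here, pixels stored as unsigned 8-bit values but arithmetic done in a wider signed type, might look like the following C sketch. It is not taken from stb; `clamp_u8` and `add_brightness` are hypothetical names, and the code assumes `delta` is small enough that `pixel + delta` cannot overflow `int`.

```c
#include <stddef.h>
#include <stdint.h>

/* Clamp a signed intermediate result into the 0..255 pixel range. */
static uint8_t clamp_u8(int v)
{
    if (v < 0)   return 0;
    if (v > 255) return 255;
    return (uint8_t)v;
}

/* Add a (possibly negative) brightness offset to every pixel.
   The arithmetic is done in int, so nothing wraps; only the storage
   type is unsigned. Assumes delta is far from INT_MIN/INT_MAX. */
void add_brightness(uint8_t *pixels, size_t n, int delta)
{
    for (size_t i = 0; i < n; ++i) {
        pixels[i] = clamp_u8((int)pixels[i] + delta);
    }
}
```

In this style the unsigned type only provides the 0..255 storage range; wraparound is never wanted, and the clamp makes that explicit.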

FortranFan commented 1 year ago

@tkoenig1 wrote Jan. 19, 2023 3:31 PM EST:

proposal.md

Here's a first shot at a proposal

@tkoenig1 ,

You may want to look at the links below for what some on the J3 committee have considered with a BITS type which is more in line with what @certik advises in the previous comment re: not overloading + and - operations, etc.: https://j3-fortran.org/doc/year/22/22-195.txt https://j3-fortran.org/doc/year/07/07-007r2.pdf

tkoenig1 commented 1 year ago

@tkoenig1 wrote Jan. 19, 2023 3:31 PM EST:

proposal.md Here's a first shot at a proposal

@tkoenig1 ,

You may want to look at the links below for what some on the J3 committee have considered with a BITS type which is more in line with what @certik advises in the previous comment re: not overloading + and - operations, etc.: https://j3-fortran.org/doc/year/22/22-195.txt https://j3-fortran.org/doc/year/07/07-007r2.pdf

This is about a BITS data type. I think that Fortran has (almost) ample facilities with IEOR and friends (a bit dodgy because IEOR can generate numbers that are not Fortran model numbers) but OK.

tkoenig1 commented 1 year ago

Personally I find it confusing that 3u-5u < 3u evaluates to .false., and that 3u-5u is 254u.

OK. I have dropped that in the attached proposal, mostly because it would be confusing for people who today use languages like C.

Unsigned integers are using modular arithmetic. In math, they sometimes use ⊕ instead of + and ≡ instead of =. Logical types in Fortran are a boolean algebra, arithmetic modulo 2, and it uses .and. and .or. instead of + and -. For comparison, you use .eqv. instead of =. So we could consider using different operators for modular arithmetic, to make it obvious that this is modular arithmetic, with different rules.

I don't think this is a good idea, because it would be confusing to people who today write C, and who we want to win over to Fortran.

proposal-v2.md

I have modified the proposal accordingly and attached it.

FortranFan commented 1 year ago

@tkoenig1 wrote Jan 20, 2023 1:23 AM EST:

..

You may want to look at the links below for what some on the J3 committee have considered with a BITS type which is more in line with what @certik advises in the previous comment re: not overloading + and - operations, etc.: https://j3-fortran.org/doc/year/22/22-195.txt https://j3-fortran.org/doc/year/07/07-007r2.pdf

This is about a BITS data type.

The BITS data type proposal is effectively an integer type without the sign bit. How do you suppose it is any different than what you propose from the type aspect? The first comment in a committee, should your proposal get tabled, will be about this BITS type and the committee should work on the BITS type that already advanced to the draft standard and was pulled out for nontechnical reasons.

tkoenig1 commented 1 year ago

The BITS data type proposal is effectively an integer type without the sign bit.

Looking at the pdf you linked: BITS are restricted to logical operations, there are no arithmetic operations defined. "Table 7.3: Type of operands and results for intrinsic operators" has only integer, real and complex defined for Unary + and -, and +, -, *, / and **.

So, BITS is what it says: Operations for a bag of bits. Unsigned integers, on the other hand, are an arithmetic type, on which users perform arithmetic operations.
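
One case where genuine arithmetic on unsigned values is needed, rather than bit manipulation, is limb-wise addition in multi-precision arithmetic. The C sketch below is an illustration under stated assumptions, not code from GMP or any other library; `mp_add` is a hypothetical name, and limbs are assumed to be stored least significant first.

```c
#include <stddef.h>
#include <stdint.h>

/* Add two n-limb numbers (least significant limb first) into result.
   Returns the carry out of the top limb (0 or 1). */
uint32_t mp_add(uint32_t *result, const uint32_t *a, const uint32_t *b, size_t n)
{
    uint32_t carry = 0;
    for (size_t i = 0; i < n; ++i) {
        uint32_t sum = a[i] + carry;      /* may wrap modulo 2^32 */
        uint32_t c1 = sum < carry;        /* wrapped iff the sum got smaller */
        sum += b[i];                      /* may wrap again */
        uint32_t c2 = sum < b[i];
        result[i] = sum;
        carry = c1 | c2;                  /* at most one of the two can wrap */
    }
    return carry;
}
```

The carry detection depends on both `+` and `<` being defined on the unsigned type, which a type restricted to logical operations would not offer.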

FortranFan commented 1 year ago

BITS is what it says: Operations for a bag of bits. Unsigned integers, on the other hand, are an arithmetic type, on which users perform arithmetic operations.

Exactly what I wanted to read! It was a leading question.

In my mind, all the prior effort that led to the development of the BITS type which nearly made it into the standard and the discussion in this thread around the use cases including the comments by @tkoenig1 suggest the need for TWO distinct types in Fortran:

unsigned integer, I suggest calling it UINTEGER (noting of course the uppercase is only for visual/readability reasons here; it doesn't matter per Fortran syntax)
BITS type along the lines of the prior proposal as linked upthread toward Fortran 2008

I think it will be good to include BOTH the above types into the standard at the same time, there will be considerable economy of scale in doing so, both in terms of standard development as well as compiler implementations. And having two distinct types will really help the Fortran practitioners in their codes for different use cases.

certik commented 1 year ago

@tkoenig1 can you show some use cases (say in C or C++) for the unsigned integer with arithmetic? The use cases where arithmetic is not needed might be covered by the BITS type. So far you listed the following use cases: "image processing, binary file I/O, signal processing, data compression and multi-precision arithmetic". It seems for many of these, the unsigned integer is used more like BITS. But I want to understand examples of usages where arithmetic is needed, to figure out whether to overload +,-,*,/ and relational operators.

For example, for multi-precision arithmetic, most of the operations seem to be bitwise. After about a 10-minute search, I found this function that uses arithmetic on unsigned ints: https://github.com/ShiftMediaProject/gmp/blob/7f542afb33c0808b39abc2e43cba7e22a5de3c4a/mpz/pprime_p.c#L150. I recommend taking some functions like this and creating an equivalent Fortran function using our various proposals above, to see what they would look like. Important: this function does not use the wraparound feature of unsigned ints. If wraparound is not needed, then compilers can provide a runtime check for overflow, and then overloading arithmetic operators is fine, as it behaves exactly like signed integers: just range limited, checked at runtime with a compiler check in Debug mode. Most of the pitfalls come from the unexpected wraparound.
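
The kind of runtime overflow check mentioned here could look roughly like the following C sketch. The names `checked_add_u32` and `checked_sub_u32` are hypothetical; a compiler's Debug-mode check would presumably be generated inline and abort with a diagnostic rather than return a flag.

```c
#include <stdbool.h>
#include <stdint.h>

/* Store a + b in *out and return true, or return false on overflow. */
bool checked_add_u32(uint32_t a, uint32_t b, uint32_t *out)
{
    if (a > UINT32_MAX - b) {
        return false;         /* mathematical sum would exceed 2^32 - 1 */
    }
    *out = a + b;
    return true;
}

/* Store a - b in *out and return true, or return false if a < b. */
bool checked_sub_u32(uint32_t a, uint32_t b, uint32_t *out)
{
    if (a < b) {
        return false;         /* mathematical difference would be negative */
    }
    *out = a - b;
    return true;
}
```

Under such a check the `3u - 5u` case from earlier in the thread is reported as an error instead of silently producing a large value.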

j3-fortran / fortran_proposals

Unsigned integers #2