vishapoberon / compiler

vishap oberon compiler
http://oberon.vishap.am
GNU General Public License v3.0
186 stars 25 forks source link

Compiler bug with large constants #70

Open markmpx opened 5 years ago

markmpx commented 5 years ago

MODULE bug; VAR x : INTEGER; BEGIN x := 0E700000H; ( compiles ) x := 0E7000000H; ( wont compile - err 113 incompatible assignment ) ( but it should compile as the hex constant is 4 bytes = same size as INTEGER ) END bug.

I believe this is due to a bug in PROCEDURE IntSize in OPT.Mod (Line 227 approx) as IntSize(0E7000000H) returns 5 (should be 4). Upon inspection of IntSize i see that IntSize(-1) will also return strange results.

markmpx commented 5 years ago

Please excuse my ( comments not being matched ) in the example above.

dcwbrown commented 5 years ago

I was surprised by this too.

Your hex value 0E7000000H (decimal ‭3875536896‬) is greater than MAX(INTEGER) - the largest possible 4 byte integer value 2147483647 - because INTEGER is signed.

Thus IntSize is correct: the positive value 0E7000000H does indeed require at least 5 bytes.

The unsigned bit pattern E7000000 interpreted as a two's complement signed 4 byte integer represents ‭-419430400‬ decimal or -19000000 hexadecimal.

So you could code -19000000H.

I believe you could also use 0E7H*1000000H, or ASH(0E7H, 24).

markmpx commented 5 years ago

Thanks very much for your prompt response. I believe i see the reasoning now. I do understand unsigned and signed (2's complement). It just never occurred to me that it could be interpreted this way.

I encountered this while attempting to compile the latest Oberon 07 compiler from Project Oberon.

I did have another query but was reluctant to post anything as its not an issue. Is there a more appropriate way to ask a question ? I am no github expert, and i promise not to pester you with more than the odd question.

dcwbrown commented 5 years ago

You're welcome.

Ah, so Project Oberon can compile this, and it is because it has only one size of integer (and BYTE which isn't an issue because it is unsigned).

Indeed Vishap can compile the similar case for LONGINT.

I did look into improving this a while back - I wondered if there was a way for the compiler to spot that the eventual target variable is INTEGER, but things don't happen at the right times to do so, and the logic wouldn't work if there as any expression involved. i.e.: does 0FFFFFFFFH in the middle of an expression stand for -1 or for 2^32-1? Some cases could be handled, but the documentation describing which cases act in which ways would be difficult to understand and the complexity is out of place compared with the simplicity of Oberon elsehere.

Another solution would be to have a different suffix for unsigned - perhaps 'H' for signed and 'UH' for unsigned. Far less complicated than special case expression rules. But project Oberon code would still need manual changes.

Yes, issues is a good way to ask a question too.

-- Dave.

markmpx commented 5 years ago

Dave,

While working on "Project Oberon" and derivatives, i have to manually convert any hex constants >08...H to use -0...H which is painful. I need to write a separate program just to figure these out.

You wrote above:

  1. So you could code -19000000H.
  2. I believe you could also use 0E7H*1000000H, or ASH(0E7H, 24).

Only option 1. works, 2. does not work.

Any chance you would consider adding your proposed UH to allow it to accept these ?

dcwbrown commented 5 years ago

I have a possible solution for you in the 'hexextend' branch.

I found that the compiler behaviour needs to depend on what size the coded constant is targetted at. I have therefore modified the syntax simply by adding a digit 1, 2, 4 or 8 after the H, being the number of bytes.

Thus in the case above, E0000000 is intended for a 4 byte integer, so please code it as

0E0000000H4

Would you like to try this and let me know what you think?

markmpx commented 5 years ago

Dave,

Yes i will give it a go. (Also i get to learn more about git) Thanks.

Oleg-N-Cher commented 5 years ago

Dear Dave,

Long hexadecimal numbers in Component Pascal are defined with suffix 'L' as:

0FFFFFFFFL = 4294967295
0FFFFFFFFH = -1

It will probably be better than 0FFFFFFFFH4 and 0FFFFFFFFH8.

I will give only three arguments for such decision.

  1. Compatible with the most developed and used Oberon dialect (Component Pascal).

  2. As we know, in the old Oberon/Oberon-2 there is no 64-bit integer type at all, since LONGINT = 32 bits. And such decision will lead to the fact that all hexadecimal numbers described in programs in the old Oberon will remain compatible with the suffix format 'H' instead of 'H4'. That will allow us to have greater compatibility.

  3. The ease with which you implement such a solution breaks Oberon's traditions and complicates the language itself (and making it non-standard) and its description.

I hope you will heed my words. Thanks.

Oleg-N-Cher commented 5 years ago

And, as you know, in Oberon/Oberon-2 the real numbers of single precision are written 2.44E15, and double precision 2.44D15

In Component Pascal, there is no longer the suffix 'D', since there are no single-precision numbers -- all calculations are performed in double. But still, by analogy with this solution in Oberon and Oberon-2, to use the suffix 'L' for long hexadecimal literals seems to be a reasonable solution, only expanding the range of numbers, but not breaking the old code with 32-bit literals.

dcwbrown commented 5 years ago

Good points Oleg.

So do I understand right?: in Component Pascal,

So I am OK with making H and L behave like this when the -OC option is set (compile with Component Pascal integer sizes).

But what to do when -O2 is set? In that case INTEGER is 16 bit, LONGINTis 32 bit and HUGEINT is 64 bit?

And I would really want a way of coding a constant that works whether compiling with -OC or with -O2.

Oleg-N-Cher commented 5 years ago

Yes, Dave, you got it right.

There is no solution to this issue in the world of Oberon-2. I don't even know which Oberon-2 implementations support the 64-bit HUGEINT type. Have you seen how this problem is solved in Active Oberon/A2?

Also, this problem does not exist in Oberon-07, since there is no long integer type. Is it possible to define there a constant that will be of implicit type BYTE? Or doesn't that make sense?

I think we should focus on the mode -OC (btw, have you fixed the size of SET = 32 bits?). And the old mode -O2 stay only for compatibility with the old Oberon-2 code. By the way, Ofront+ allows to mix code developed in Oberon-2 and Component Pascal (subset) languages simultaneously.

By the way, there is another way to declare constants of an explicitly given type, if I remember correctly, exactly you developed it. However, it requires import the module SYSTEM, which is not always desirable.

CONST Abc = SYSTEM.VAL(INTEGER, 0FFFFFFFFH);

Oleg-N-Cher commented 5 years ago

I found out how things are with this issue in Active Oberon/A2.

CONST LongInt = 0FFFFFFFFH; (* 4294967295 *)

CONST Int = SHORT(0FFFFFFFFH); (* -1 *) (Ofront gives error "number too large")

The aggressive SHORT will turn the actual 4294967295 into -1. Apparently, the check of values in SHORT is removed for constants. And, thus, there is no need for an additional suffix 'L'.

I'm leaning more towards the decision made in A2, though the suffix 'L' will have to be maintained for compatibility with Component Pascal.

You have to make the final decision on this question.

Oleg-N-Cher commented 5 years ago

I want to note that in the description of the Oberon/Oberon-2 language the type HUGEINT is not present. Therefore, we can consider this type only as a language extension. Accordingly, we should consider the description of constants of this type as a language extension too.

By and large, we have to choose only of two solutions (we are talking about the option -O2):

  1. The suffix 'H' describes 32-bits constants of type LONGINT, as it was in Oberon/Oberon-2. Pros: compatibility with old code. To describe constants of type HUGEINT, associated with a language extension, will use a new suffix. It can be 'H8' or 'L' (as in Component Pascal). I'd prefer the suffix 'L', though. This provides an additional level of compatibility with Component Pascal. The appearance of such a suffix in programs will mean that a program uses the non-standard HUGEINT type and a non-standard language extension - the suffix 'L'. And such new code will be incompatible with old Oberon/Oberon-2 compilers.

  2. The suffix 'H' describes constants of any size (including 64 bits). Then we need an additional suffix ('H4'?) to describe constants of a truncated type . This way will break compatibility with the old code, causing the pain of incompatibility.

I'm sure that aggressive SHORT (as in A2) which turns 4294967295 into -1 is a bad idea.

In Ofront+ I decided to use these modes to describe and use hexadecimal constants:

-1, -2 (Oberon, Oberon-2): 'H' for 32-bits constants, 'L' for 64-bits constants. Compatible with old code. New code uses 'L' and HUGEINT type as the Oberon/Oberon-2 extensions.

-C (Component Pascal): 'H' for 32-bits constants, 'L' for 64-bits constants. Compatible with Component Pascal.

-3 (Oberon-3, the experimental dialect): 'H' for constants of any type. For converting a LONGINT constant to INTEGER it is offered to use SYSTEM.VAL. This is a normal solution for system conversion of a set of bits into a set of bits with losses. No aggressive SHORT at all! It's not Oberon-way!

In addition, there is another very good and portable way to describe constants at this manner:

CONST h32 = 100000000H; A1 = 0FFFFFFFFH; (* 4294967295 *) A2 = 0FFFFFFFFH-h32; (* -1 *)

norayr commented 5 years ago

I like the approach of A2 compiler. It does complicate the syntax less? The programmer needs to learn to use the technique though. It can be documented. In case of using special suffixes, we should document those as well. So, for now I guess the A2 solution is better.

Then, after the A2 approach, I am inclined towards Dave's solution. And I think the syntax should not differ in case of using CP or -2 type sizes. It just needs to be the same syntax.

I don't think that compatibility with CP should be considered a priority. We just have the mode for the CP types, it does not mean we plan to extend the compiler to support CP language, and even our standard library was not fully tested with those CP type sizes, and I guess a lot of unexpected things could happen, when using voc with CP type sizes and standard library. Unexpected things definitely happen, I faced those, when using default (-2) mode with Ulm library, because Ulm compiler type sizes are different.

I will ask people on IRC to take a look and express what they think.

Oleg-N-Cher commented 5 years ago

The approach of Active Oberon teaches the novice programmer to use SHORT for other purposes - to truncate bits. This is not the most beautiful and elegant use of SHORT.

Stefan Vorkoetter told me a story: "When I was implementing the Maple Machine on Ceres Oberon, I needed to implement bitwise operators. I asked Prof. Wirth how to manipulate the bits of an integer in Oberon. He responded, "integers don't have bits". After that he told me about the SYSTEM special functions that I needed. :-) "

Therefore, Norayr, if you do as you suggest, it will break compatibility with the old Oberon code and there will be a lot of pain. The people on IRC are lovers of Oberon-07 and their opinion is known in advance.

I hope Dave makes a smarter decision.