vlang / v

Simple, fast, safe, compiled language for developing maintainable software. Compiles itself in <1s with zero library dependencies. Supports automatic C => V translation. https://vlang.io
MIT License

check range of values at run time for ints, floats, array of ints/floats #4672

Closed cristian-ilies-vasile closed 4 years ago

cristian-ilies-vasile commented 4 years ago

Add a new keyword to the language, named range, used to define the lower and upper bounds acceptable for a variable (int, float, or array of those) at run time. Example:

mut xyz := i16(100) range 0 .. 199
mut cube_vol := f32(0.0) range 1.0 .. 3.14159

Allow the following construct to be valid:

mut xyz := i16(-1) range 0 .. 199 // default value is -1, OK

The compiler could issue an error when the ranges of 2 vars do not overlap, or when the upper and/or lower bounds would create issues at run time:

mut xyz := i16(0) range 0 .. 200
mut abc := i16(0) range 0 .. 1000
.....
xyz = abc // error at compile time

dumblob commented 4 years ago

Hm, I'd say that if this shall be added, then arbitrary constraints should be implemented as well (guaranteeing arbitrary properties of the given type - here numbers and related compound types - e.g. be an odd or even number, be divisible by 99 and 777 at the same time, be by 3 greater or less than a random number from specified distribution, ...). There are infinite possibilities of constraints, yay!

Ok, now seriously. I'm convinced we should not choose one or two (in this case min&max) constraints from the infinite space and implement them in the language. And if most HW CPUs supported instructions making certain such constraints highly efficient, then I'd say intrinsics would be a better way to support them rather than cluttering the language - both its syntax and its multiplatform implementation(s).

cristian-ilies-vasile commented 4 years ago

@dumblob
//0 range will be optional.
//1 The bounds checking mechanism is already in place.
//2 Something similar is proposed for Rust: "Range types for integers (or refinement types?)" #671 https://github.com/rust-lang/rfcs/issues/671
//3 The range keyword is borrowed from Ada, but with a simplified syntax.
//4 A special compiler flag could be used to apply range checking at run time.

cristian-ilies-vasile commented 4 years ago

A case study of not using range checks: https://en.wikipedia.org/wiki/Schiaparelli_EDM

"The lander's inertial measurement unit, which measures rotation, became saturated (unable to take higher readings) for about one second. This saturation, coupled with data from the navigation computer, generated an altitude reading that was negative, or below ground level. This caused the premature release of the parachute and back shell. The braking thrusters then fired for about three seconds rather than the expected 30 seconds, followed by the activation of ground systems as if the vehicle had already landed. In reality, it was still at an altitude of 3.7 km (2.3 mi).[60][61] The lander continued transmitting for 19 seconds after the thrusters cut off; the loss of signal occurred 50 seconds before it was supposed to land.[62] Schiaparelli impacted the Martian surface at 540 km/h (340 mph), near terminal velocity.[61]"

dumblob commented 4 years ago

@cristian-ilies-vasile yes, there are numerous reasons to check a lot of properties of data. What I'm concerned about is the highly non-systemic approach of such proposals.

1) There is an infinite number of possible constraints - why choose min & max only? There are programming languages built upon the idea of forcing one to define the meaning of data (i.e. a full set of constraints targeting different perspectives throughout the whole execution time of the app).

2) Constraints imposed on a value do change during execution, and the "default constraints" (e.g. min & max or even/odd or be_a_prime or ...) are thus highly insufficient - look e.g. at object-oriented languages (e.g. Smalltalk) which overload assignment (i.e. "write") and "read" operators & methods at runtime depending on the context and on how far the execution of the app has advanced. In other words, it doesn't make much sense to impose some very generic, global, always valid (under all circumstances) constraints; we should talk about the whole life cycle of constraints instead.

3) Note that not enforcing constraints will result in the very same losses as described in the article above, because nobody will use them (and if they do, then in so few places that their coverage asymptotically approaches zero across the whole app). In other words, to make such functionality meaningful, you need to enforce it everywhere.

cristian-ilies-vasile commented 4 years ago

why choose min & max only

Because it is much easier to handle this type of constraint at run time: just check that the var belongs to that linear space, i.e. [>= min and <= max].
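To make that concrete, here is a minimal hand-written sketch of the check being described, in plain V as it exists today (there is no range keyword; the helper name check_range is purely illustrative):

fn check_range(val int, min int, max int) int {
    // membership test in the linear space [min, max]
    if val < min || val > max {
        panic('value ${val} out of range ${min}..${max}')
    }
    return val
}

fn main() {
    mut xyz := check_range(100, 0, 199) // stands in for: mut xyz := i16(100) range 0 .. 199
    xyz = check_range(xyz + 150, 0, 199) // panics at run time: 250 is outside 0 .. 199
    println(xyz)
}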

you need to enforce it everywhere.

Nope, only where the specification states that during its life cycle a real-world attribute can be bound by 2 limits. Example: the distance between 2 cars must be > 0.

cristian-ilies-vasile commented 4 years ago

One more example of the simplicity of range. AS IS:

struct A {
mut:
    val  int
    nums []int
}

TO BE, WITH OPTIONAL RANGE:

struct A {
mut:
    val  int       range -100 .. 100  
    nums []int     range    0 .. 999   
}
dumblob commented 4 years ago

why choose min & max only

Because it is much easier to handle this type of constraint at run time

That's why I'm saying it's highly non-systemic. Such an argument is IMHO far from enough to justify a decision to implement it.

you need to enforce it everywhere.

Nope, only where the specification states that during its life cycle a real-world attribute can be bound by 2 limits. Example: the distance between 2 cars must be > 0.

I think there is a misunderstanding here. I argue that without making the difference between an "unconstrained" and a "constrained" value explicit, you can never guarantee that the programmer explicitly wanted the given behavior. In other words, constraint specification must not under any circumstances be optional, otherwise it's more or less as error prone as a manual assert before/after each change of the value.

And btw. the above proposal of bounds checking would require more than what it states. It'd need a special syntax for ignoring bounds checking in places where one has expressions with a mixture of bounds-checked values and bounds-unchecked values (typical in embedded development, where one deliberately writes a lot of code which must over/underflow as part of the algorithm).
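As a minimal sketch of the kind of embedded-style code meant here (assuming u8 arithmetic wraps silently, as in case B further below; the function name is illustrative):

// A rolling sum that relies on wrap-around on purpose.
// Mixing such values with range-checked ones in one expression is exactly
// where an explicit opt-out syntax would be needed.
fn rolling_sum(data []u8) u8 {
    mut sum := u8(0)
    for b in data {
        sum += b // deliberately allowed to overflow and wrap modulo 256
    }
    return sum
}

fn main() {
    println(rolling_sum([u8(200), 100, 7])) // 307 wraps to 51
}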

cristian-ilies-vasile commented 4 years ago

It'd need a special syntax for ignoring bounds checking in places where one has expressions with a mixture of bounds-checked values and bounds-unchecked values

mut x := int(0) range 0..1000
mut y := int(0) range 0..1000
mut z := int(0)
mut k := int(0)
...
k = (x+y+z) // what is wrong with this code if all the constraints are valid

typical in embedded development

If one is serious about embedded programming, then one must use real tools based on model-based design / model-driven engineering and not write a single line of code.

This is just a proposal, and it's up to the main coders to implement it as is, change it, or reject it.

dumblob commented 4 years ago

k = x+y=z // what is wrong with this code if all the constrains are valid

Hm, I can't interpret this code. Did you mean k = x + y; k = z?

Let's be more specific - the problem boils down mainly to overflow/underflow and confusion. In other words, mut x := 0 range -2_147_483_648..2_147_483_647 might seem different from mut x := 0, but actually both still "wrap". On the other hand, ranges themselves do not wrap (if they did, they'd actually be a new integer type which, compared to other integer types, would be extremely slow and inefficient).

You can spot it more easily when some ranges are rather close to the underlying hardware instruction integer ranges - e.g.

// case A
mut a := u8( 200 ) range 0..250  // ASM instruction range 0..255
mut b := u8( 100 ) range 0..250  // - || -
mut c := u8( 0 ) range 0..250  // - || -
c = a + b  // 8bit unsigned addition

no runtime error because (200+100)-255-1=44

// case B
mut a := u8( 200 )  // ASM instruction range 0..255
mut b := u8( 100 )  // - || -
mut c := u8( 0 )  // - || -
c = a + b  // 8bit unsigned addition

no runtime error because (200+100)-255-1=44

// case C
mut a := 200 range 0..250  // ASM instruction range -2_147_483_648..2_147_483_647
mut b := 100 range 0..250  // - || -
mut c := 0 range 0..250  // - || -
c = a + b  // 32bit signed addition

finally a runtime error because (200+100)=300, which is outside 0..250

Now imagine you get an arbitrary mixture of A, B, C. I'd wish you good luck, as no sane person would want to try to explain your code without at the same time carefully consulting the C standard, specifically the chapters about promotion & conversion rules and undefined cases for integers.

So an additional syntax would be needed to avoid this, or at least to make it explicit (yeah, the easiest would be the good old assert, but that'd completely defeat the purpose of having ranges :open_mouth:).

This'll get even more complex in JavaScript, of course.

cristian-ilies-vasile commented 4 years ago
1. I did correct the expression; it should be: k = (x+y+z) // what is wrong with this code if all the constraints are valid

2. range will be applied only to signed integers and maybe floats, not to unsigned integers.

3. there are 2 types of overflow (underflow):
   a. at the variable level, if range is used - let's call this range overflow/underflow
   b. at the CPU/machine level, when the result of an operation exceeds the maximum value for that variable; this is independent of type a.
   I did raise a feature request ticket to implement overflow/underflow for integers (https://github.com/vlang/v/issues/4017).

dumblob commented 4 years ago
1. I did correct the expression; it should be: k = (x+y+z) // what is wrong with this code if all the constraints are valid

This seems ok to me as the ranges seem too small to produce any undefined behavior and/or defined underflow/overflow. Though see below.

2. range will be applied only to signed integers and maybe floats, not to unsigned integers.

That's nice, but that doesn't make the situation any better, as the C standard makes signed integer overflow/underflow undefined behavior (i.e. allowed though undefined). I demonstrated it on examples with perfectly defined behavior just to make the issues easier to understand. But imagine I had used signed integers in my examples: you'd end up with completely undefined behavior which would be perfectly masked ("invisibly silenced") by using ranges.

3. there are 2 types of overflow (underflow): a. at the variable level, if range is used - let's call this range overflow/underflow

Yes. Though I can't imagine how you'd implement range overflow/underflow efficiently. With this you're basically imitating hardware instructions (which are, however, highly optimized and burned into the CPU).

Any operation with such a "soft integer" would be utterly slow. And as said above, ranges must not be optional, therefore making lots of stuff (close to everything) slow.

Another option would be to not implement overflow/underflow for ranges, but that has the correctness downsides (e.g. those I showcased).

Yet another option would be to make it "on demand" (e.g. through additional syntax), but that is even worse - because you probably wouldn't be able to distinguish between "wrapping" integers and "non-wrapping" integers in one expression.

b. at CPU/machine level when the result of an operation exceeds the maximum value for that variable; this is independent of type a.

Yes, technically it's independent. Though in written code, as you saw, it makes reasoning about a very simple expression extremely difficult (both about its correctness and its performance). The simplicity of programming is largely about having very few types (including classes/interfaces) with uniform and simple behavior, but ranges effectively create a type explosion (each ranged integer is basically its own type).

Finally, ranges for floats would be more sane, though still lacking a systemic approach, and would thus feel quite alien due to not being at least a somewhat generalized constraint specification.

I'd even argue that inlined overloaded operators/methods of integer types performing arbitrary constraint checking (i.e. not just bounds checking) would have the same performance, would be way more readable, vastly easier to reason about, and well-defined when it comes to behavior. But that's just my subjective view :wink:.
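A rough sketch of what that alternative could look like in V (the struct and function names are illustrative only, and how the ranges of the two operands should be combined is deliberately left open here):

// A plain struct whose overloaded `+` performs the constraint check itself.
struct RangedInt {
    min int
    max int
    val int
}

fn new_ranged(val int, min int, max int) RangedInt {
    if val < min || val > max {
        panic('value ${val} out of range ${min}..${max}')
    }
    return RangedInt{min: min, max: max, val: val}
}

fn (a RangedInt) + (b RangedInt) RangedInt {
    r := a.val + b.val
    // here the left operand's range is applied to the result
    if r < a.min || r > a.max {
        panic('value ${r} out of range ${a.min}..${a.max}')
    }
    return RangedInt{min: a.min, max: a.max, val: r}
}

fn main() {
    x := new_ranged(200, 0, 250)
    y := new_ranged(100, 0, 250)
    z := x + y // panics at run time: 300 is outside 0..250
    println(z.val)
}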

cristian-ilies-vasile commented 4 years ago

Yet another option would be to make it "on demand"

Two knobs could be defined for the V compiler, tied to the --prod knob: --enforce_overflow_check and --enforce_range_check.

In debug mode: always generate the checking code for the C/JS backend.

In --prod mode: even if the range "label" is used in the code, it will not be "exported" to the C/JS backend without an --enforce_* command line option.

Delta456 commented 4 years ago

I have made a module for this purpose, but it doesn't have all the functionality: https://github.com/Delta456/range

Delta456 commented 4 years ago

As discussed on the Discord server, this will not be implemented, as it adds more complexity to the language.

medvednikov commented 4 years ago

I'd actually be fine with something like Pascal's custom range types.

But they'd have to be declared separately:

type DayOfWeek = 1..7

also

type DayOfWeek = Day.monday .. Day.sunday
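
For reference, a rough approximation of that idea with what exists in V today: a plain type alias plus a checked constructor (the names are illustrative, and this is of course not the proposed built-in range type):

type DayOfWeek = int

fn day_of_week(n int) DayOfWeek {
    if n < 1 || n > 7 {
        panic('DayOfWeek out of range: ${n}')
    }
    return DayOfWeek(n)
}
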
cristian-ilies-vasile commented 4 years ago

Ada provides a technique similar to Pascal's; however, my proposal was shaped so that there is no need to declare a special type, because the "range" label would be tied only to integers and floats, so the type is already known.

https://learn.adacore.com/courses/intro-to-ada/chapters/strongly_typed_language.html

Integers: A nice feature of Ada is that you can define your own integer types, based on the requirements of your program (i.e., the range of values that makes sense).

--  Declare a signed integer type, and give the bounds
   type My_Int is range -1 .. 20;
   --                         ^ High bound
   --                   ^ Low bound