influxdata / flux

Flux is a lightweight scripting language for querying databases (like InfluxDB) and working with data. It's part of InfluxDB 1.7 and 2.0, but can be run independently of those.
https://influxdata.com
MIT License

Allow numeric literal to be safely coerced to concrete numeric types #476

Open adamperlin opened 6 years ago

adamperlin commented 6 years ago

From ifql created by nathanielc : influxdata/ifql#309

From the Spec:

Numeric literals may be integers or floating point values. Literals have arbitrary precision and will be coerced to a specific type when used.

The following coercion rules apply to numeric literals:

aanthony1243 commented 6 years ago

@nathanielc @adamperlin I don't believe this is implemented, but we can prepare a test for it anyway since it's in the spec.

nathanielc commented 5 years ago

Now that we have well-defined rules for type inference, this can be less about coercion rules and more about numeric literals having a type class of Number while being able to take any sane numeric type.

For example, 4.0 can be inferred to have any of these types: float, int, or uint, while -4.0 would only allow the float and int types.
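A rough sketch of that rule in Go, for illustration only. The helper name `candidateTypes` and the type names it returns are hypothetical and not part of Flux; the sketch also ignores the range limits of the concrete types.

```go
package main

import (
	"fmt"
	"math"
)

// candidateTypes (hypothetical helper) lists the concrete numeric types a
// literal value could take without losing information.
func candidateTypes(v float64) []string {
	types := []string{"float"} // every literal fits a float in this sketch
	if v == math.Trunc(v) {    // whole number, so an integer type can hold it
		types = append(types, "int")
		if v >= 0 { // non-negative, so an unsigned type can hold it too
			types = append(types, "uint")
		}
	}
	return types
}

func main() {
	fmt.Println(candidateTypes(4.0))  // [float int uint]
	fmt.Println(candidateTypes(-4.0)) // [float int]
	fmt.Println(candidateTypes(2.5))  // [float]
}
```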

This way number literals can be of any type and are only constrained by their usage. Additionally they can be polymorphic. For example:

x = 5 // x is polymorphic and can be any numeric type depending on each of its uses.
foo(myint: x) // x used as an int
bar(myfloat:x) // x used as a float.

nathanielc commented 5 years ago

See https://github.com/influxdata/flux/issues/220 for adding type classes to accomplish this.

aanthony1243 commented 5 years ago

@sanderson you might want to track this. The current behavior of 4.0 + 3 is a type error. Looks like we'll change this eventually.

aanthony1243 commented 5 years ago

I might add a vote to this issue to consider Python 3 style division, where x/y ALWAYS returns a float, regardless of whether x and y are integer or float types. For pure integer division, they have x//y. This removes ambiguity from the expressions and allows for easy type inference/expectations for data.
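To make the two operators concrete, here is a small Go sketch of their semantics. The function names `trueDiv` and `floorDiv` are made up for this illustration. Note that Python's // rounds toward negative infinity, while Go's integer / truncates toward zero, so the sketch adjusts for that.

```go
package main

import "fmt"

// trueDiv mirrors Python 3's x / y: the result is always a float,
// even when both operands are integers.
func trueDiv(x, y int64) float64 {
	return float64(x) / float64(y)
}

// floorDiv mirrors Python 3's x // y: integer division rounded toward
// negative infinity. Go's / truncates toward zero, so the quotient is
// adjusted when there is a remainder and the operand signs differ.
// Like Go's own integer division, it panics when y is 0.
func floorDiv(x, y int64) int64 {
	q := x / y
	if x%y != 0 && (x < 0) != (y < 0) {
		q--
	}
	return q
}

func main() {
	fmt.Println(trueDiv(5, 2))   // 2.5
	fmt.Println(floorDiv(5, 2))  // 2
	fmt.Println(floorDiv(-5, 2)) // -3
}
```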

nathanielc commented 5 years ago

My thinking has been to treat numeric types as polymorphic and let type inference figure out their type based on their usage.

Flux is strongly typed, meaning that the type of a value is known and it will not change. For numeric types this means that floats will not be implicitly cast to integers, nor will any other combination of numeric casts happen implicitly. What is being proposed here is that numeric constants be polymorphic, i.e. they can be any of the numeric types depending on their usage. This is similar to how Go behaves.

Some Go examples:

package main

import  "fmt"

const n = 2

func main() {
    var f float64 = 5.0
    var i int = 5

    fmt.Println(f / n)
    fmt.Println(i / n)
}

Above n is used as both an integer and a float depending on context, i.e. n is polymorphic.

If you define n as a real value that cannot be an integer and then use it as an integer, that is a compilation error:

package main

import (
    "fmt"
)

const n = 2.5

func main() {
    var f float64 = 5.0
    var i int = 5

    fmt.Println(f / n)
    fmt.Println(i / n) //prog.go:14:16: constant 2.5 truncated to integer
}

An example in Go that I would consider bad:

package main

import (
    "fmt"
)

const n = 5 / 2 // The constant n is now defined as the integer 2, because this is a constant expression and Go assumes integers for numbers without decimals

func main() {
    var f float64 = 5.0

    fmt.Println(n)     // prints 2
    fmt.Println(f / n) // prints 2.5
}

I am proposing something similar for Flux but where it handles the case of constant expressions better.

For example

twohalf =  5 / 2

f =  float(x: 5)
i = int(x: 5)

f / twohalf  // prints 2 as a float

i / twohalf  // prints 2 as an int

Even functions that use constant numbers will remain polymorphic until they are used.

half = (n) => n / 2 // polymorphic function for any numeric type

half(n: 2) // prints 1 as a polymorphic number

half(n : 2) * 3.5 // prints 3.5 as a float 

half(n : 2) * 3 // prints  3 as a polymorphic number

i = int(x: 3)
half(n: 2) * i // prints 3 as an integer

I am not sure about all the consequences of such a design. Do we have to perform symbolic manipulation in order to not lose precision? If so, that seems like too much complexity.

Specifically this example may be very difficult to achieve.

twohalf = 5 / 2
i = int(x: 10)
i / twohalf  // prints 4 as an int

If we rewrite this expression removing the identifiers we get:

10 / ( 5 / 2)

It is clear that the result should be 4, but if we determine that the types of the numbers in this expression are ints and then evaluate, we get:

10 / ( 5 / 2)
10 / (2)
5

This is clearly the wrong answer, so I am not sure how to handle this case. Maybe we treat all numeric constants as having infinite precision and use a bigint implementation?
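The difference between the two evaluation strategies can be sketched in Go. This is only an illustration of the arithmetic, not a proposal for how Flux should implement it: it uses `math/big`'s `Rat` (arbitrary-precision rationals) as a stand-in for "infinite precision" constants, and the function names are made up.

```go
package main

import (
	"fmt"
	"math/big"
)

// intEval evaluates a / (b / c) with every constant fixed to an
// integer type before evaluation, so b / c is truncated first.
func intEval(a, b, c int64) int64 {
	return a / (b / c)
}

// exactEval evaluates a / (b / c) with arbitrary-precision rationals,
// so the intermediate value b / c is kept exact. The result is returned
// as a string: "n" for whole numbers, "n/d" otherwise.
func exactEval(a, b, c int64) string {
	inner := big.NewRat(b, c) // b / c as an exact fraction
	result := new(big.Rat).Quo(big.NewRat(a, 1), inner)
	return result.RatString()
}

func main() {
	fmt.Println(intEval(10, 5, 2))   // 5, because 5 / 2 truncates to 2
	fmt.Println(exactEval(10, 5, 2)) // 4, because 5 / 2 stays 5/2
}
```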

Polymorphic constants seem really nice, as they promise to let the user ignore numeric precision for the most part. But if they come at the cost of extreme complexity in implementation, or of very confusing mathematical inaccuracies because the inferred type is not always clear, then they are not worth it. At that point, using typed numerical operators like / vs //, or OCaml's +. vs +, could be a good solution.

Thoughts?

aanthony1243 commented 5 years ago

I think a lot of this is about perspective. What you are proposing is not totally different from what Python 3 does, in my mind. The premise, if I'm being brief, is that there is no float or integer, only numbers. It's not really that everything is a float, per se; it just happens to be convenient to represent most numbers with a double-precision floating-point value.

One shouldn't have to think "is this a float or an int?" unless it's totally critical to do so (e.g. when we want to compute a quotient and a remainder).

The place where we differ from Python or OCaml is that we are a data-first language, and there are many good reasons to distinguish between a float and an int. Specifically, we can use different computational tricks depending on the type (e.g. bit shifts to do magic stuff with integers), and different compression algorithms may be used. So I guess our language is unique in the sense that we need to say "numbers' types don't matter....until they do"

nathanielc commented 5 years ago

Agreed, this is about perspective.

"numbers' types don't matter....until they do"

My hope with Flux is that since it's strongly typed, the type is always known by the compiler. The user only needs to know the type if they care, and the compiler will not automatically use a numeric type that will lose precision.

russorat commented 4 years ago

this is a huge source of frustration for me when using Flux.

nathanielc commented 4 years ago

Here is how we are thinking we can implement this:

  1. Introduce a new type constraint called NumericDefaultInteger (or something like that)
  2. Introduce new NumericLiteral AST and Semantic Graph Nodes
  3. Update parser to produce NumericLiteral AST nodes instead of IntegerLiteral nodes
  4. Update type inference to handle the new NumericLiteral semantic node to report a polymorphic free type variable that is constrained by the new NumericDefaultInteger type constraint.
  5. Add a final pass over the semantic graph during semantic analysis, that replaces any type variables that are constrained with NumericDefaultInteger with the integer basic type.

This means that literals will be let-polymorphic, which means this should work:

add  = (a,b) => a + b

x = 1

add(a:x, b: int(x: 1)) // x is used as integer
add(a:x, b: 0.5)       // x is used as float

A benefit of this design is that the polymorphic nature of literals does not leak past the type inference step. Meaning the compiler and the interpreter do not need any special handling.
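The final defaulting pass from step 5 could look roughly like the Go sketch below. Everything here is a simplified assumption for illustration: `Kind`, `Node`, and `defaultNumerics` are hypothetical names, not the actual Flux semantic graph API, and a real implementation would work on type variables and constraints rather than a string tag.

```go
package main

import "fmt"

// Kind is a hypothetical stand-in for a node's inferred type.
type Kind string

const (
	Int   Kind = "int"
	Float Kind = "float"
	// NumericVar stands for a type variable that inference left
	// unresolved and that carries the NumericDefaultInteger constraint.
	NumericVar Kind = "numeric type variable"
)

// Node is a hypothetical, heavily simplified semantic graph node.
type Node struct {
	Name     string
	Type     Kind
	Children []*Node
}

// defaultNumerics walks the graph after type inference and replaces
// every still-unresolved numeric type variable with the integer type.
// Nodes that inference already resolved (e.g. to float) are untouched.
func defaultNumerics(n *Node) {
	if n == nil {
		return
	}
	if n.Type == NumericVar {
		n.Type = Int
	}
	for _, c := range n.Children {
		defaultNumerics(c)
	}
}

func main() {
	// "1 + 2" was never used in a context that forces a concrete type.
	expr := &Node{Name: "1 + 2", Type: NumericVar, Children: []*Node{
		{Name: "1", Type: NumericVar},
		{Name: "2", Type: NumericVar},
	}}
	defaultNumerics(expr)
	fmt.Println(expr.Type, expr.Children[0].Type, expr.Children[1].Type)
}
```

After this pass no node carries a numeric type variable, which is why later stages would not need special handling.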

Bolladeen commented 4 years ago

DOD:

  1. Add NumericDefaultInt type constraint
  2. Update the IntegerLit semantic node to hold a monotype instead of an integer
  3. Add a final pass of the semantic graph and replace unresolved type variables constrained by NumericDefaultInt with integers
  4. Make sure all test cases pass with original behavior