Open michaeldickens opened 6 days ago
Latest commit: e836e4d721a6757809cd9b339a319498bb30041a
Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.
Click here to learn what changesets are, and how to add one.
Click here if you're a maintainer who wants to add a changeset to this PR
The latest updates on your projects. Learn more about Vercel for Git ↗︎
Name | Status | Preview | Updated (UTC) |
---|---|---|---|
quri-ui | ✅ Ready (Inspect) | Visit Preview | Jun 24, 2024 8:04pm |
squiggle-components | ✅ Ready (Inspect) | Visit Preview | Jun 24, 2024 8:04pm |
squiggle-website | ✅ Ready (Inspect) | Visit Preview | Jun 24, 2024 8:04pm |
@michaeldickens is attempting to deploy a commit to the Quantified Uncertainty Team on Vercel.
A member of the Team first needs to authorize it.
Quick comment based on the overall idea, I haven't looked at the implementation carefully yet.
Thanks for doing this! This is definitely something we discussed and want to support.
My main problem with the syntax that you picked is that besides unit type checking, we'll also want to implement other type checks in the future.
For this reason I'd strongly prefer if we used something else than :
to provide the unit portion of the type.
This is easy to fix if we go with ::
instead of :
, for now:
speed :: m/s = 1
The design for actual types (that would allow to express other value types, e.g. functions or lists or whatever) is still not finalized and will probably take a while, but note that we already have annotations in function signatures.
Units are obviously not types, because:
Btw, this is tangential, but my current plan is to eventually support domains everywhere:
[5, 10]
to Number.rangeDomain(5, 10)
; units could also be extended in the similar way, x :: m/s
-> x: Number.unit("m") / Number.unit("s")
With domains, we could start with completely runtime type-checks and gradually push the boundary towards compile-time checks where it's possible; but it won't be possible in all cases.
This also means that for now I don't really care if unit type checking or any other type checking is done in the compile-time. It's fine to prototype type checking in runtime, and then optimize later (if we pay attention and avoid features that are too powerful and can't be optimized later).
Now on these questions:
- Should the type checker do type inference, or should it simply ignore any variable without a type (like TypeScript's any)? Type inference is more work but it looks doable to me. But it's not obvious that type inference is actually desirable from a design standpoint. (I lean toward yes.)
Type inference sounds very useful, and as I said above, its lack is dangerous because of backward compatibility problems.
I'm leaning towards "let's just infer in runtime", though.
How to treat numeric literals? I see two reasonable possibilities: (1) let them be any type, or (2) treat them as unitless unless directly assigned to a variable (e.g. in x: kg = 1, the 1 should be treated as having type kg, not unitless). Treating them as unitless would help catch more type errors, but it also means you'd need a syntax for type conversion. For example, if you have a distance literal and a time variable, converting them to velocity requires declaring the type of the literal, such as usainBoltSpeed: m/s = (100 as m) / raceTime.
Is my proposed syntax ok? There are ways it might be improved, for example allowing parentheses and/or exponents:
work: (kg*m^2)/(s^2) = 1
It's fine if we don't support these for now.
- Do probability distributions require any special treatment? I believe they can be treated just like numeric values.
I'm not sure. I think units syntax is fine for dists if we store types separately (x : type :: units = ...
, as I said in my previous comment), but I haven't thought about this too much.
- What syntax should function definitions use? I'm thinking it could use basically the same syntax as TypeScript: func(x: type): type = ...
- How to type arrays and dicts? Should Squiggle require that the values in an array/dict all have the same type? What should the type signature syntax be? Again, it can probably be the same as TypeScript.
I don't consider units to be full types, and doing both at the time would be too complicated; see my comment on domains above.
- It would be good for SquiggleHub to have a good way of showing types. How should squiggle-lang supply type info make that work? (SquiggleHub does some stuff with GraphQL that I don't understand but I haven't looked into it much.)
GraphQL doesn't seem relevant here; runtime inference based on tags would be very helpful.
- How to handle type errors? Should they be a subclass of ICompileError, or should they be handled separately? Does the frontend need some special way to consume them?
ICompileError
seems fine for this, can be improved later. But, again, if we do runtime inference, then it's going to be just a normal runtime error, for now.
@berekuk Based on your comments, here's my to-do list. Let me know if I should add anything.
Also: Can you explain what you mean by "domain"? And relatedly, why do you think it's too complicated to do unit types for function definitions and arrays/dicts?
Update unit type syntax to not conflict with (future) normal type syntax. Implement type inference.
I think that tracking unit info in runtime, through tags, would be easier for now, and would give us same benefits as full type inference at the cost of a slight performance overhead. But either approach is fine, and we'll want compile-time type checks and inference eventually. I just think it's more work for you.
Full type inference is especially tough, so I'll explain my priorities for it:
For clarity, in (1), I mean this:
x = {
y :: kg = 1
z :: m = y
y + z
}
This would be quite unexpected for users, the feature looks like it should work but it doesn't do anything, and it's a footgun. So I'd like this to be supported before we merge.
In (2) I mean this:
x :: kg = 1
y = x
And this:
x :: m = 1
y :: s = 2
z = x / y /* m/s? */
And in (3) I mean this:
f(a) = a
x :: kg = 1
y = f(x)
This one is the most difficult, and you definitely shouldn't attempt it now; some languages (Haskell and OCaml) are smart enough to infer that f
maps any generic 'a
to 'a
, so it would infer that y
has the same type as x
. But this would require something like full Hindley–Milner.
Or, you could get all of (1), (2) and (3) for free, and also easier access to the unit information during render, if we do runtime tracking with tags.
Also: Can you explain what you mean by "domain"?
The term "domain" is briefly mentioned in the parameter annotation documentation, but not very well explained.
The idea is that any check ("is this a valid value to use here?") can include runtime checks ("does the value belong to the [10, 20] range?") and compile-time checks ("is this a number?"). We plan to bundle both of these checks in a single "domain": a meta-value that describes both components.
Unlike in Typescript, domains are values, not a separate namespace of "type names". They can be constructed in runtime, e.g. with functions, passed around as parameters, and so on.
xDomain = Number.rangeDomain({ min: 5, max: 10 })
x: xDomain = 7
Here, xDomain
type would be, let's say, Domain<Number>
, and when you annotate x: xDomain
, Squiggle would infer that x
type would be Number
. So it would do the compile-time check that x
is a number, by checking that 7
is also a number.
In addition to that, Squiggle would also do the runtime check to make sure that 7
is in [5, 10]
range.
The currently supported f(x: [5, 10]) = ...
syntax for parameters annotation is a syntax sugar on top of domains: [5, 10]
annotation used after :
gets upgraded to the domain. This would work in the same way for variable declarations, x: [5, 10] = 7
.
We will support other annotation shortcuts in the future, because full domain constructors are too verbose. Also, common types will also become domain constructors, e.g. x: Number = 7
. Number
is a function, but the "annotation -> domain" syntax sugar layer will upgrade it to the domain, so it's the same as, let's say, Number.anyValueDomain()
.
This is the very preliminary plan for the future type system in Squiggle. It's going to be somewhat complicated to implement, because it's heavy on generics, but it seems doable. I'm not sure when we'll get around to implementing all this, tbh.
(DO NOT MERGE)
This is a demo PR to illustrate how unit type checking might work (see #3309). If Squiggle maintainers like my proposal, I can finish implementing it. I'm also fine with tabling this PR if maintainers think it would take too long to agree on a good implementation. I do think it would be really nice to have unit typing, it could prevent a lot of bugs.
This PR is only a demo, it has various missing parts and the code quality isn't consistent. See "Open questions" below.
Syntax
Squiggle unit types are declared using approximately the same syntax as TypeScript. Example:
A type signature consists of one or more identifiers separated by zero or more
*
symbols, followed by zero or more identifiers separated by zero or more/
symbols. In other words, type signatures must match the pseudo-regex\identifier(*\identifier)*(/\identifier)*
.Some more examples of type signatures:
Semantics
The parser type-checks variable definitions and expressions.
For this demo, the type checking rules are fairly simple:
x
has typekg
andy
has typeliter
,x/y
has typekg/liter
.*
or/
, the type checker enforces that the two values have the same type. For example, you can only add/subtract two variables if they have the same type (or if at least one of them does not have a declared type).Implementation details
I implemented type checking in the way that looked best to me, but Squiggle maintainers may have a better idea of how to do it.
In parse.ts:parse(), after generating the AST, I added a call to the
typeCheck()
function, which is defined in a new file typeChecker.ts.Open questions
There's definitely more that needs to be implemented to make this PR viable, but I didn't want to make too many design decisions without consulting the Squiggle team.
any
)? Type inference is more work but it looks doable to me. But it's not obvious that type inference is actually desirable from a design standpoint. (I lean toward yes.)x: kg = 1
, the1
should be treated as having typekg
, not unitless). Treating them as unitless would help catch more type errors, but it also means you'd need a syntax for type conversion. For example, if you have a distance literal and a time variable, converting them to velocity requires declaring the type of the literal, such asusainBoltSpeed: m/s = (100 as m) / raceTime
.squiggle-lang
supply type info make that work? (SquiggleHub does some stuff with GraphQL that I don't understand but I haven't looked into it much.)ICompileError
, or should they be handled separately? Does the frontend need some special way to consume them?