morganstanley / hobbes

A language and an embedded JIT compiler
http://hobbes.readthedocs.io/
Apache License 2.0
1.16k stars 105 forks source link

add language extensions: SAFE and UNSAFE #372

Closed smunix closed 3 years ago

smunix commented 3 years ago

Hobbes functions such as unsafeCast are, as labelled, deeply unsafe. They escape the type-checker safety guarantees and can crash hi server if applied from a remote client to the wrong parameter type.

This change introduces support for language extensions, among which the UNSAFE pragma that can be used to flag expression variables that won't escape the scope of the server they are being run into. By doing this, they prevent remote client from invoking them, unless otherwise stated.

Also, the current fix introduces a new mode, Safe, under which the UNSAFE extension is effective. SAFE is meant to override this feature, and can only be run from the server side.

kthielen commented 3 years ago

This looks like a massive change with little added value. We're really going to start annotating every single function as "SAFE" versus "UNSAFE"? Where did this request come from? Does this serve a real need?

kthielen commented 3 years ago

Availability and robustness are required for our applications.

That's fine, if not a little vague, but why would that mean adding annotation and ceremony everywhere, rather than fixing an indexing bug or changing a type signature?

Ultimately, the simpler this thing is, the easier it is to verify its safety and the easier it is to maintain. I would be on board with personally helping to make it simpler also toward that end of making it more robust. But if we're going to go the way of ever increasing entropy, then we should just drop it now.

smunix commented 3 years ago

The alternative, changing type signatures, was found to be highly intrusive and would push costly breaking changes down to existing users. This is still under evaluation though.

Some expressions (e.g. the family of unsafeCast, newPrim functions, ...) can't be made safe due to their very "unsafe" nature, or because they're C-defined functions that are exported by Hobbes.

Also, some users write "unsafe" helper functions that are more performant than the similar ones that implement some heavy validation. These users are required to either expose the "unsafe" versions or not, or do provide both validating and not validating versions of expressions, depending on their configuration.

These are some of the cases this PR is addressing. The case of existing bugs is orthogonal to the issues above, and is being addressed separately.

kthielen commented 3 years ago

I guess these findings and user expectations aren't part of the public record, so we can't evaluate alternative approaches or consider whether new requirements are consistent with old ones. It's hard to imagine a process like that working in the long run, for an open source project at least.

I guess at some point there will be concerns beyond memory safety, and another class of "SAFE" annotations will have to be added for access to the file system, or other OS resources that might be sensitive. At some point it'll be another type system, but distinct from the regular type system, and simplifying it will be found to be highly intrusive, so it'll keep growing that way.

I guess this is the sort of situation that Guy Steele had in mind with his "growing a language" essay, and already it's too late to reverse other expedient decisions.

kthielen commented 3 years ago

I'm sorry that it's come to this, and I know that the lack of support for this project over the years has forced expedient changes to get rammed through like this, I don't blame you guys.

It's always been a fight against inertia and skepticism, even when the project had significant unambiguous improvements over existing infrastructure. I will always say that I was able to prove with hobbes that a generic structurally typed variant of Haskell can be effectively used in latency-sensitive systems as well as for storing and querying large and complex datasets in ways that traditional tick databases can't. There have been some bumps in the road, but this project has also done new things that haven't been done before. Despite the mean reversion here, eventually this model is going to win out.

Good luck. I will work on a fork or new project, and will keep folks I've met who've been following this project in the loop.