unisonweb / unison

A friendly programming language from the future
https://unison-lang.org
Other
5.65k stars 266 forks source link

Value.deserialize isn't type-safe #5053

Open setupminimal opened 1 month ago

setupminimal commented 1 month ago

As noted in #1728, Value.deserialize can be used to implement cast. Which might not seem so bad, except you can use it to completely subvert the type system. Here is a program which defines a term runner that doesn't have the Exception ability, but that throws an exception at runtime:

ioFun : '{IO,Exception} ()
ioFun = do
  printLine "Doing evil things ..."
  raise "And failing!" ()

asVal = value ioFun
bytes = serialize asVal

deser = Value.deserialize bytes
final : '{IO} ('())
final = do match deser with
  (Left e) -> do ()
  (Right d) -> match load d with
    (Left e) -> do ()
    (Right f) -> f

runner : '{IO} ()
runner = do !!final

I am pretty sure this applies to arbitrary other abilities as well. I'm not sure whether doing that lets you get access to caller's ability handlers that you shouldn't be able to access, or whether it causes a runtime error, but either way it's not great. The only ability that you can't 'fake' in this way is IO, because Value.deserialize requires IO.

I think Unison has a unique opportunity to actually make this safer: there's already a typechecker that can run on terms in the codebase; Value.deserialize could type-check the resulting value. Or, if that would be too expensive performance-wise, it could provide the current version as Value.deserializeUnsafe or something like that and at least let people who care about safety opt in.

For context, I am exploring writing a local persistent tasks library, and I was trying to figure out what would happen if I had a task written to require some ability, and then updated the definition of the ability and tried to deserialize and run it. And I think my answer is: probably a runtime exception, possibly even from code that doesn't have the Exception ability, with no way to programmatically guard against that. Which is ... not great.

(validateSandboxed does work to detect that the value returned from final performs IO, but if you take out the print statement it doesn't catch that it can still generate an uncaught runtime exception)

ceedubs commented 1 month ago

You are right that the current functionality is that the type is not checked upon deserialization. This can lead to runtime failures for unhandled abilities, or if the calling code runs a handler for that ability, the serialized ability gets injected into the runtime handler.

I agree that it would be nice to have at least some amount of control over this. I'm not sure how practical it would be to do full typechecking and whether there is runtime type erasure that would prevent fully doing this. @dolio would probably know best about that.

But there is a more lightweight approach that I have pondered. Like you mentioned you can use validateSandboxed [termLink toDebugText.impl] thunk to enforce that toDebugText.impl is the only tracked builtin that is allowed in thunk. It might be nice to have a similar function that is something like validateAbilities [typeLink Remote] thunk that enforces that Remote is the only unhandled ability in the thunk. But I imagine that this would be more complicated than validateSandboxed, because I don't think that you'd want to just search the AST for references to an ability; I think that you'd only want to make sure that the ability doesn't propagate to the top-level, where it might be handled by the calling code. I imagine that's tricky, but again Dan is the one who would know.

dolio commented 1 month ago

There is essentially no infrastructure for the original request.

  1. The runtime doesn't know what type the result it supposed to be.
  2. The runtime representation is different from the representation that actually gets type checked.
  3. The runtime doesn't have access to a codebase (either direct, or possibly at all).

Cody's more lightweight idea might be possible. It's not quite as easy as sandboxing, though, because the control flow of abilities makes the check more complicated. I think there is probably still enough information to check what the deserialized code itself will do, though (not necessarily what would happen based on values that gets passed in).

pchiusano commented 1 month ago

Yeah, Value.deserialize is a low-level unsafe operation and should be kept as is. If you would like to construct a type witness of some sort and serialize that along with the value, and check it on deserialization, you could do that today with pure Unison code.

If Unison had typeclasses then constructing these witnesses could be made more convenient, similar to Typeable in Haskell. But it isn't necessary if you just want to encode and check the type info at runtime.

@dolio for the ability sandboxing, could you get away with a builtin

disallowEffects : '{} a -> Either Link.Type a

This would behave like disallowEffects a = Right !a but it "catches" all unhandled ability requests and converts them to Left. I'd guess that the primary usage is to make sure all effects have been handled and the remaining computation is either pure or uses only sandbox-allowed IO operations.

ceedubs commented 1 month ago

@pchiusano I think it would be much more useful if you could provide an allow list of abilities. Like in Unison Cloud we want functions to be able to have unhandled Remote but we don't want them to end up using our Postgres handler or whatever.