keean / zenscript

A trait based language that compiles to JavaScript
MIT License

Modularity #39

Open shelby3 opened 6 years ago

shelby3 commented 6 years ago

Prior discussion:

https://github.com/keean/zenscript/issues/14#issuecomment-308722477

https://github.com/keean/zenscript/issues/33#issuecomment-362720879

https://github.com/keean/zenscript/issues/8#issuecomment-362778102

Please kindly move modularity discussion here rather than polluting issue threads which are specialized on their thread titles (speaking to myself also, not to any specific person), so that we can have the modularity discussion organized in a thread of its own.

sighoya commented 6 years ago

@shelby3 Associated types are fine and they constrain types on type level not on runtime level, right? (example: type family in typeclasses, pokemon example)

@keean I will try to reconstruct your problem with a code example (though I'm not really familiar with OCaml); you can correct me if I'm wrong:

signature EQ = sig
type t
val eq : t * t -> bool
end

structure EqInt = struct
type t = int
val eq = Int.eq
end

fun f (X : EQ) = struct
type t = X.t
val eq = X.eq
end

Function f should have a signature of the form EQ where type t -> EQ where type t, which can be erased even if the type t from X is not known explicitly. Why? Every time you construct a signature instance EQ with a certain type t, the compiler will/could create a monomorphized function for it. When a module of type EQ is applied to f, the compiler should be able to pick the right function at compile time, which should be possible because of type inference. Even if you change X.t from int to float (I don't know if that is possible), you state the type change of X.t explicitly in your code, which should be recognized by the compiler. The only problem arises if you eval a type out of a string which is created from "outside":

string<-readline...
X.t=eval_to_type(string)

Because of this, most statically typed languages disallow such things.
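
A rough OCaml rendering of the static case sketched above (OCaml functor syntax instead of the SML-style pseudocode; module names are illustrative): when the functor argument is known at compile time, the abstract type of the result is known too, so nothing value-dependent is involved.

module type EQ = sig
  type t
  val eq : t -> t -> bool
end

module EqInt : EQ with type t = int = struct
  type t = int
  let eq = ( = )
end

(* The SML-style `fun f (X : EQ) = struct ... end` as an ordinary functor. *)
module F (X : EQ) = struct
  type t = X.t
  let eq = X.eq
end

module FInt = F (EqInt)      (* FInt.t is statically known to be int *)
let _ : bool = FInt.eq 1 2   (* type checks, and could be monomorphised *)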

keean commented 6 years ago

@sighoya it would help if you referenced the problem you are questioning, I am not sure what you are trying to show or disprove?

sighoya commented 6 years ago

I want to prove that dependent types are not needed for modules.

Edit: reference

keean commented 6 years ago

Your example has no runtime polymorphism. You need to introduce an existential type, so that you want to run Eq over a list of existential values.

sighoya commented 6 years ago

You mean something like this:

signature EQ = forall x sig
type t=[x]
val eq : t * t -> bool
end

or

signature EQ = sig
type t=[forall x]
val eq : t * t -> bool
end

I don't know if the latter is valid syntax, but I mean a heterogeneous list, i.e. one that contains values of different types. In the case of the latter example, fun f with:

fun f (X : EQ) = struct
type t = X.t
val eq = X.eq
end

should have signature of EQ where t=[forall x] ->EQ where t=[forall x],

meaning that t is a list where the elements are of a Variant Type (runtime Union Type), for which no type erasure and no dependent type is needed.

keean commented 6 years ago

@sighoya

meaning that t is a list where the elements are of a Variant Type (runtime Union Type), for which no type erasure and no dependent type is needed.

That is different because with a union type you are limited in which types you can put in. Given a "String|Int" you can only put a string or integer into the heterogeneous collection.

With an existential type we can specify the interface, say forall a . Eq a => a which allows any type that implements Eq to be put in the collection.
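
In OCaml terms such an existential can be sketched as a value packed together with its own EQ dictionary (illustrative code, not anyone's proposed syntax), which is exactly what makes the collection open to any type implementing the interface:

module type EQ = sig
  type t
  val eq : t -> t -> bool
end

(* Roughly "exists a. Eq a": each element carries two values of some hidden
   type together with the eq dictionary for that type. *)
type any_eq = AnyEq : (module EQ with type t = 'a) * 'a * 'a -> any_eq

let check (e : any_eq) : bool =
  match e with
  | AnyEq (eqm, x, y) ->
    let module E = (val eqm) in
    E.eq x y

let _ =
  List.map check
    [ AnyEq ((module struct type t = int let eq = ( = ) end : EQ with type t = int), 1, 1);
      AnyEq ((module struct type t = string let eq = String.equal end : EQ with type t = string), "a", "b") ]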

I can actually demonstrate the unsoundness without using an existential type, just think about polymorphic recursion. You cannot monomorphise a recursive polymorphic function.
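
For reference, a minimal OCaml sketch of the polymorphic-recursion point: depth calls itself at a different type instance on every step, so no finite set of monomorphic copies covers it (OCaml only accepts it with the explicit 'a. annotation).

(* A "nested" (non-regular) datatype: each level doubles the element type. *)
type 'a nested =
  | Leaf of 'a
  | Nest of ('a * 'a) nested

(* Polymorphic recursion: the recursive call is at type ('a * 'a) nested. *)
let rec depth : 'a. 'a nested -> int = function
  | Leaf _ -> 0
  | Nest n -> 1 + depth n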

keean commented 6 years ago

@sighoya

should have signature of EQ where t=[forall x] ->EQ where t=[forall x],

This type is impossible because the type escapes the existential.

sighoya commented 6 years ago

@keean wrote

That is different because with a union type you are limited in which types you can put in. Given a "String|Int" you can only put a string or integer into the heterogeneous collection.

That's right; therefore Variant Type != Union Type.

@keean wrote

With an existential type we can specify the interface, say forall a . Eq a => a which allows any type that implements Eq to be put in the collection.

Aha, here an existential takes the role of a trait/typeclass object.

@keean wrote

I can actually demonstrate the unsoundness without using an existential type, just think about polymorphic recursion. You cannot monomorphise a recursive polymorphic function.

Corner cases/undecidable issues. Do you think the following is sound?

(+) :: Int->Int->Int
Int.MAX+Int.MAX=Error?

@keean wrote

@sighoya

should have signature of EQ where t=[forall x] ->EQ where t=[forall x],

This type is impossible because the type escapes the existential.

Yes, it breaks the rules of existentials. Maybe f::EQ where t=Variant -> EQ where t=Variant is a better fit, or leave out the associated type with f::EQ->EQ. Why is it so important to include the associated type in the type signature?

Further, I don't see the need for dependent types for module purposes but maybe you mean runtime polymorphism is inevitable, then yes. Runtime polymorphism is indeed very useful.

shelby3 commented 6 years ago

@keean

If a module has an associated type, and a module is first class (can be passed to a function as a value) then the associated type must depend on the value passed to the function. This is clearly the definition of a dependent type.

But if the instance of the module’s type is statically known at the time when the value is applied at the call-site then it’s no different than applying typeclasses at the call-site. So are typeclass bounds dependent typing?

I guess it’s because we erase the instance’s type from the static knowledge when we assign it to a module signature (but have to reify it with runtime dispatch), thus we don’t track it statically? Isn’t that analogous to assigning an object to an existential type thus erasing it from static tracking?

The key difference is when the type of the value can’t be known statically, then we need to track the type of a value at runtime with some form of runtime dispatch. Which is analogous to existential types with typeclasses.

What am I missing? Trying to understand why you see a problem with this form of dependent typing that you don’t see with typeclasses?

I can actually demonstrate the unsoundness without using an existential type, just think about polymorphic recursion. You cannot monomorphise a recursive polymorphic function.

Ditto cannot monomorphise typeclass bounds with polymorphic recursion. Have to use runtime dispatch. We had a discussion about that in the past on these issue threads.


@sighoya wrote:

Further, I don't see the need for dependent types for module purposes but maybe you mean runtime polymorphism is inevitable, then yes.

Well I think he may be defining dependent type correctly, yet I agree that seems like it’s just runtime polymorphism. And I am also wondering why he dislikes it given it seems to occur because of the inevitable need for runtime polymorphism.

I thought dependent typing was when we can check the values of a value statically?? With runtime polymorphism we’re not enforcing the type of the value statically. There’s no monomorphisation going on.

keean commented 6 years ago

Ditto typeclass bounds with polymorphic recursion.

No, typeclasses only depend on the type of the thing, not the value of the thing.

Now you would have the same problem if you had type-classes with associated types, but in Haskell they get around it by saying the associated types must be globally coherent with the type parameters of the type class instances. This makes the type-class type-parameters, and the associated types of that type-class form a type-family.
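
A rough module-level analogue of that point (illustrative names, not a concrete proposal): if the associated type is forced to be a function of the parameter type alone, the signature can say so with a sharing constraint, and the dependence on the module value disappears.

module type COLLECTION = sig
  type elem
  type t                        (* the "associated" type *)
  val singleton : elem -> t
end

(* One canonical collection per element type, fixed by a functor over the
   element type: the module-system counterpart of a type family. *)
module ListColl (E : sig type t end)
  : COLLECTION with type elem = E.t and type t = E.t list =
struct
  type elem = E.t
  type t = E.t list
  let singleton x = [ x ]
end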

keean commented 6 years ago

@sighoya a variant type is also limited in the types you can put into it (at least it is in OCaml). Without existentials (runtime polymorphism) polymorphic recursion can still cause problems with modules that require dependent types to support.

Now you can have modules without dependent types, if you limit the language to having everything monomorphisable. This means no runtime polymorphism, no polymorphic recursion etc.

sighoya commented 6 years ago

@keean wrote

@sighoya a variant type is also limited in the types you can put into it (at least it is in OCaml).

My thoughts were on dlang's Variant Type. You can also further restrict dlang's Variant Type to allow assignments of values of only specific types (they call them Algebraic Type, in the same URL I presented). A variant type is an any type without the sh?t of inheritance.

Edit: A typeclass/trait object can also be seen as a Variant Type which is bound to the types implementing the trait/typeclass. In some situations, a trait object can also be a Union Type (compile time Variant Type), in which case monomorphization can apply.

@shelby3

I thought dependent typing was when we can check the values of a value statically?? With runtime polymorphism we’re not enforcing the type of the value statically. There’s no monomorphisation going on.

I thought dependent typing was when we cannot check the type of a variable at compile time (statically), because the variable's type depends on the variable's value and maybe on other variables' values. However we can pattern match over it and can follow different branches in which the compiler knows the type for each branch (when done right), though it can become tedious.

Dependent types are indeed very useful. Imagine a type called DeterminantNotZeroMatrix, which is defined as follows:

DeterminantNotZeroMatrix<Type eltype,Int n,Int m>={Matrix<eltype,m,n> mat |  det(mat)!=0}

Edit: ambiguous use of m, changed it to mat.

Once we have constructed this kind of dependent type, we can invert this matrix without checking every time whether the matrix is invertible.

But @keean, I think you need a runtime polymorphic time rather than dependent typing

sighoya commented 6 years ago

There is a proposal to add type classes to C#. The funny thing is that the proposed implementation of typeclasses looks a lot like a light form of Modular Type Classes

keean commented 6 years ago

@sighoya

My thoughts were on dlang's Variant Type. You can also further restrict dlang's Variant Type to allow assignments of values of only specific types (they call them Algebraic Type, in the same URL I presented). A variant type is an any type without the sh?t of inheritance.

This kind of variant is not modular, as there is no way to represent a universal type tag. This kind of variant requires Run Time Type Information, which is a big problem for me, as you lose type erasure (and theorems for free) as well. I can give more details why this is bad (or perhaps why I don't like it if I am being generous).

But @keean, I think you need a runtime polymorphic time rather than dependent typing

Runtime type information is worse (see above) with dependent typing we can still resolve issues at compile time. For example if we try and print a value of an associated type, we can add the "Print" constraint to that module. This can be propagated up to the interface specification for the current module such that we say the argument for this function must be an iterator where the ValueType of the iterator is printable. These are dependent types (because the type passed to print depends on the value of the module passed to the function) but we solve by placing constraints on the types which we propagate at compile time. The latest we would need to check these constraints is when linking a dynamically loaded module, we still retain type erasure and don't have to find a way to store types in memory at runtime.
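
A sketch of that constraint-propagation idea in plain ML-module terms (the names are made up): the requirement that the iterator's value type be printable is written into the argument signature, so it is checked where the functor is applied, with no runtime type information.

(* The argument signature itself demands a printable value type. *)
module type PRINTABLE_ITER = sig
  type value
  type t
  val next : t -> (value * t) option
  val print_value : value -> string   (* the propagated "Print" constraint *)
end

module PrintAll (I : PRINTABLE_ITER) = struct
  let rec run it =
    match I.next it with
    | None -> ()
    | Some (v, rest) ->
      print_endline (I.print_value v);
      run rest
end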

I think it's bad to store types in memory at runtime due to the Russell paradox. If every memory location has a type, and a memory location contains a type, then the type of a type is a type and we have created the conditions for a paradox. Everything in memory at runtime is a value; now we can create a mapping from values to types, but we cannot do this globally, because with separate compilation we don't know what values other compilers assign to which types. So we have to locally map values to types, which is what an OCaml variant does (a disjoint sum type).

sighoya commented 6 years ago

@keean wrote

Runtime type information is worse (see above) with dependent typing we can still resolve issues at compile time. For example if we try and print a value of an associated type, we can add the "Print" constraint to that module.

Looks to me like you need a trait/typeclass object

@keean wrote

I think it's bad to store types in memory at runtime due to the Russell paradox. If every memory location has a type, and a memory location contains a type, then the type of a type is a type and we have created the conditions for a paradox.

The type Type is a meta type which is a non well founded set including itself

keean commented 6 years ago

@sighoya it seems you agree with my reasoning, but trait objects are Rust's name for existential types, and you are back to needing dependent types if you use modules.

It all comes down to the difference between a type class and a module. In a type class the implementation is chosen only using the type information of the type class parameters. In the lambda cube this makes an associated type a type that depends on another type. Because a module is first-class the associated type depends on the actual value of the module passed, not just its type, hence an associated type is a type that depends on a value (otherwise known as a dependent type).

sighoya commented 6 years ago

@keean Let me try to reflect. You describe a case where a functor takes a module and returns a module where the associated type of the returned module depends on the input module. You don't want to solve it with Variant Types because you don't like them, but you want to solve it with Existentials, which have a constraint which depends, however, on the value of the input module. Your solution is to constrain the input module by a dependent type, i.e. a dependent signature in which only the desired modules are accepted by the functor.

In my eyes, it is one solution. But you can also force the functor to return a Maybe structure in which its contained/wrapped structure depends on the value condition.

keean commented 6 years ago

@sighoya not quite. With first class modules (and Functor) you can pass modules/functors as arguments to functions. Because associated types with modules depend on the value of a module and not just its type, you need dependent types even without variants, existentials etc. You just need first-class modules and associated types to have the problem.

keean commented 6 years ago

@sighoya Let me have another go, as I don't think I have done a very good job of explaining, and may have got a bit side-tracked by other issues. The type system only tracks the type of modules or functors. It is possible for two modules which conform to the same signature (type) to have different associated types. The associated types are not part of the signature.

signature EQ = sig
type t
val eq : t * t -> bool
end

structure EqInt = struct
type t = int
val eq = Int.eq
end

structure EqFloat = struct
type t = float
val eq = Float.eq
end

fun f (X : EQ) = struct
type t = X.t
val eq = X.eq
end

f(EqInt).eq(1, 2)
f(EqFloat).eq(1.2, 1.1)

Because EqInt and EqFloat are values of the type EQ, the type system does not track the difference between them. All the type system sees is the type EQ. In order to see what value is passed to f we have to use partial-evaluation or abstract-interpretation. However neither of these methods can solve general computational problems, and you very quickly end up running the whole program. This is why compile-time evaluation is restricted to constant-expressions in C++ and other languages. In other words we could resolve this for limited cases, but most compilers don't even try.
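
OCaml's first-class modules make the same point concrete (a sketch): once the module is an ordinary value, its type component is hidden behind the signature, and choosing the module at runtime leaves the compiler with nothing but EQ.

module type EQ = sig
  type t
  val eq : t -> t -> bool
end

module EqInt   = struct type t = int   let eq = ( = ) end
module EqFloat = struct type t = float let eq = ( = ) end

(* The module chosen depends on a runtime value... *)
let pick b : (module EQ) =
  if b then (module EqInt) else (module EqFloat)

(* ...so the compiler only ever sees the signature EQ, and something like
     let f b = let module M = (val pick b) in M.eq 1 2
   is rejected: M.t is abstract and cannot be unified with int. Only when
   the type is kept in the signature, e.g. (module EqInt : EQ with type t = int),
   can such a call be type checked. *)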

sighoya commented 6 years ago

@keean wrote:

The type system only tracks the type of modules or functors. It is possible for two modules which conform to the same signature (type) to have different associated types. The associated types are not part of the signature.

Right. This is the intention of associated types: not to include them in the type signature; otherwise, they are parameters.

@keean wrote:

Because EqInt and EqFloat are values of the type EQ, the type system does not track the difference between them. All the type system sees is the type EQ. In order to see what value is passed to f we have to use partial-evaluation or abstract-interpretation.

Ah, I see your problem. The problem is that the signature of eq changes in dependence to the input module. And this is unmodular, right? You can pattern match over it (= partial evaluation?), but cannot grasp all possible values with it (at least not in all cases). Edit: Maybe the type system does not know the difference between EqInt and EqFloat, but the compiler knows it because you hard coded the type int to EqInt and the type float to EqFloat, and the compiler can therefore typecheck the class of eq at compile time. (For hidden/already compiled modules, there must be a way for the compiler to access the associated type; at least I would implement it to do so.)

I don't think the problem is that modules are values. You even have the problem if the modules are defined statically but the definitions of the modules are hidden from the user.

For example, you specialize a function f with some unknown static module EqA and call f with values depending on EqA.

f<EqA>(1,2) or f<EqA>("1","2") ?

I think it is not a problem that modules are first class values, the problem is that associated types are unmodular.

sighoya commented 6 years ago

Ah... I must correct myself, you are right:

a=io.read().asInt()
module m::EQ
if a=2 m =EqInt else m = EqFloat
f(m)(1,2) #allowable?

sighoya commented 6 years ago

I could imagine to disallow:

a=io.read().asInt()
module m::EQ
if a=2 m =EqInt else m = EqFloat
f(m)(1,2) #allowable?

by the compiler and to pattern match over all possible outcomes, but the user cannot reason why the above code is not valid (because it is not inferable from the type signature), damnit!

keean commented 6 years ago

@sighoya We can only reason about code using the types, otherwise we are executing the code. If we execute the code we have an interpreter not a compiler. The whole point of a compiler is to translate code from one language (the source language) to another (the target language). This gets fuzzy when we can partially evaluate code where the inputs are static, but as soon as the input depends on IO we cannot even do that. So when you write short code examples it's easy to say "well the compiler knows this value", but in real code it is exceptionally rare, except for constant values like PI. Even with constant values you have the catch-22 of not wanting to execute code until it has been type checked to make sure it does not crash the compiler, but your type checking depends on the values which require you to run the code. So in reality partial evaluation does not work for this kind of problem. That leaves abstract interpretation as the only method, and this is not a well known technique even amongst compiler writers. Dependent types are by far the best solution, or avoid the problem by forcing global coherence on module instances by making the associated types a type-family over the module type parameters (what Haskell does).

sighoya commented 6 years ago

Some Questions:

1.) How does OCaml handle these things? Does it crash?

2.) What is if we force the user to pattern match over the associated type in a module of type EQ:

module m::EQ
m = EqInt if a=2 else EqFloat # now the compiler knows that m.t could be different
f(m).eq(2,2) # here the compiler knows eq must be of type Int->Int->Bool, so the user has to check whether m.t is Int

The compiler will throw an error and forces the user to rewrite the code above to:

module m::EQ
m= EqInt if a=2 else EqFloat
match m.t=
Int -> f(m).eq(2,2)
_ -> Error/Nothing

What am I missing?

3.) What is your plan for this problem?

shelby3 commented 6 years ago

@keean,

Ditto typeclass bounds with polymorphic recursion.

No typeclasses only depend on the type of the thing, not the value of the thing.

We discussed this already in the past in our discussion of rankN HRT. When we store callbacks that have a typeclass bound as values, we lose the definition of the function body. Thus there’s no way to monomorphise the body because we don’t know the body of the function that is the callback because more than one function definition can match the same set of typeclass bounds or alternatively the caller of the callback may be supplying existential types instead of statically known instances. We have to resort to runtime dispatch. So indeed typeclasses do depend on the value of the thing when we have to throw away static knowledge of which values correspond to which types in our imperative soup (also for example existential types aka trait objects). If we can eliminate the need for imperative soup…

Now you can have modules without dependent types, if you limit the language to having everything monomorphisable. This means no runtime polymorphism, no polymorphic recursion etc.

…if our modules are applicative/pure and all imperativity occurs top-down outside any modules, then in theory we can have polymorphic recursion without requiring runtime polymorphism in the modules, although at the junction between the imperative layer and the modular layer there must be runtime dispatch to monomorphised branches of the modular code.

Btw, the Pokemon example employing type family fails to solve the Expression Problem. New Pokemon types can’t be added independently. The code could be reorganized to not use associated types if runtime polymorphism is allowed and then it would also be more modular and extensible. So again I ask @keean what the objection to dependent typing is, if all it means is to enable runtime polymorphism? Runtime polymorphism is necessary for modularity, solving the Expression Problem, and because static typing is limited in expression?


@sighoya wrote:

Associated types are fine and they constrain types on type level not on runtime level, right? (example: type family in typeclasses, pokemon example)

Well even when simulated with modular abstract types, I don’t see why they couldn’t be monomorphized in the cases where there’s no runtime polymorphism?

I thought dependent typing was when we can check the values of a value statically?? With runtime polymorphism we’re not enforcing the type of the value statically. There’s no monomorphisation going on.

I thought dependent typing was when we cannot check the type of a variable at compile time (statically), because the variable's type depends on the variable's value and maybe on other variables' values.

So we agree that dependent typing is the problem where we need to know the type of values. I’m saying that I thought dependent typing was the ability to type those values statically with total order knowledge about all possible states of the program. You’re saying that actually it’s the ability to declare some invariants independent of the values, but isn’t that what a type is? An int is an invariant independent of the values it takes.

Dependent types are indeed very useful. Imagine a type called DeterminantNotZeroMatrix, which is defined at follows:

So what you’re saying is the user can create new types and enforce semantics on those types. But isn’t that what typing is for?

It seems to me that runtime polymorphism is somehow being conflated with dependent typing in our discussion?


@keean wrote:

This kind of variant requires Run Time Type Information, which is a big problem for me, as you lose type erasure (and theorems for free) as well. I can give more details why this is bad (or perhaps why I don't like it if I am being generous).

RTTI is bad because the programmer can access it and use it as a hack for polymorphism (c.f. the mess that is the Go libraries). Whereas what we really need is static/dynamic polymorphism and runtime only where it must be.

But are you conflating runtime polymorphism with RTTI? Existential types (aka Rust’s trait objects) don’t erase the runtime information about types, because it’s required for runtime dispatch. Yet the programmer has no access to the RTTI.

But @keean, I think you need a runtime polymorphic ~time~[type] rather than dependent typing

Runtime type information is worse (see above) with dependent typing we can still resolve issues at compile time. For example if we try and print a value of an associated type, we can add the "Print" constraint to that module. This can be propagated up to the interface specification for the current module such that we say the argument for this function must be an iterator where the ValueType of the iterator is printable. These are dependent types (because the type passed to print depends on the value of the module passed to the function) but we solve by placing constraints on the types which we propagate at compile time. The latest we would need to check these constraints is when linking a dynamically loaded module, we still retain type erasure and don't have to find a way to store types in memory at runtime.

That presumes that modules aren’t first-class values:

You just need first-class modules and associated types to have the problem.

If they are values then you can’t know until runtime what the types of the abstract types are. So then you must use runtime dispatch.

I think it's bad to store types in memory at runtime due to the Russell paradox. If every memory location has a type, and a memory location contains a type, then the type of a type is a type and we have created the conditions for a paradox. Everything in memory at runtime is a value; now we can create a mapping from values to types, but we cannot do this globally, because with separate compilation we don't know what values other compilers assign to which types. So we have to locally map values to types, which is what an OCaml variant does (a disjoint sum type).

We discussed this before in the past. Use a cryptographic hash to have decentralized creation of a universal tag.

Again I think you’re conflating access to RTTI as evil with type erasure as good. There’s a middle ground, which is RTTI for runtime dispatch is not evil (when the programmer can’t access that RTTI).

As for Russell’s paradox, it’s omnipresent in many different facets of programming and there’s absolutely nothing you can do to solve that. If you could solve it, then our universe would be provably bounded. Life is necessarily imperfect else we couldn’t exist.


@sighoya wrote:

The type Type is a meta type which is a non well founded set including itself

This is related to why we can’t have unbounded/open ADTs and use them to select implementation of typeclass instances, because then we can’t prove anything is ever disjoint. Monomorphism and the Expression Problem are in tension with each other, abstractly because of Russell’s paradox.


@keean wrote:

It is possible for two modules which conform to the same signature (type) to have different associated[abstract] types. The associated[instances of the abstract] types are not part of the signature.

ftfy. Associated types are a typeclass concept. Abstract types are a module concept. Afaik, the paper Modular Type Classes employs abstract types to emulate the associated types of typeclasses.

Because EqInt and EqFloat are values of the type EQ, the type system does not track the difference between them. All the type system sees is the type EQ. In order to see what value is passed to f we have to use partial-evaluation or abstract-interpretation.

It seems you argued above that the type system can track the differences and monomorphise (in some cases) and I agreed except I pointed out that when the modules are first-class values that get passed around, then the instances of the signature can’t be known statically.

However neither of these methods can solve general computational problems, and you very quickly end up running the whole program.

What does that sentence mean? Do you mean that not everything can be monomorphised?

This is why compile-time evaluation is restricted to constant-expressions in C++ and other languages. In other words we could resolve this for limited cases, but most compilers don't even try.

Typeclass bounds at call sites also throw away the information about the types at the call site, but the difference is the static information is thrown away at the call site and not at the construction of module instance with a functor. In the case of typeclass bounds of callback functions, the body of the function is statically unknowable.

Since we know we can emulate typeclasses with modules per Modular Type Classes, we know that not all uses of modules can’t be monomorphised. The key is really referential transparency. Because when we/the compiler knows the code isn’t overwriting (generic references to) values with different types (which causes the requirement for the value restriction!), then the compiler can statically trace the type of the instance of a module to its use in a function and thus monomorphise. That is why C++ and other non-pure languages can’t do it.

All of us discussing here know that PL design has different axes that are highly interrelated. We must analyse holistically over all axes simultaneously in order to explore the design space.

We can only reason about code using the types, otherwise we are executing the code. If we execute the code we have an interpreter not a compiler. The whole point of a compiler is to translate code from one language (the source language) to another (the target language). This gets fuzzy when we can partially evaluate code where the inputs are static, but as soon as the input depends on IO we cannot even do that. So when you write short code examples it's easy to say "well the compiler knows this value", but in real code it is exceptionally rare, except for constant values like PI.

The key point of your comment is that when the code is no longer pure, then neither the compiler nor the human can reason about it.

Dependent types are by far the best solution, or avoid the problem by forcing global coherence on module instances by making the associated types a type-family over the module type parameters (what Haskell does).

I wish we’d write abstract types instead of dependent types. Dependent types seem to be a broader term. Yeah modules need abstract types as opposed to type parameters (because per prior discussion they otherwise lose encapsulation and can’t be non-explicitly/boilerplate composed in partial orders, i.e. must always know all the instance types when composing), and if you don’t have referential transparency, then probably cannot monomorphise them.

But referential transparency really is about top-down dependency injection. So in some cases we can top-down inject the types and monomorphise but in other cases the top-down imperative layer will lose static tracking of the types and thus we must punt to runtime polymorphism.

Per the link above, what I’m not clear on now is whether we really need both modules/typeclasses and algebraic effects? Seems like maybe all modular genericity can be modeled as bind points on the free monad?

I would appreciate you guys checking my logic on this.


@sighoya wrote:

The problem is that the signature of eq changes in dependence to the input module. And this is unmodular, right?

No that is modularity. Multiple implementations of the same signature. (Also multiple signatures for the same implementation).

sighoya commented 6 years ago

@shelby3 wrote

You’re saying that actually it’s the ability to declare some invariants independent of the values, but isn’t that what a type is? An int is an invariant independent of the values it takes.

@shelby3 wrote

So what you’re saying is the user can create new types and enforce semantics on those types. But isn’t that what typing is for?

It seems to me that runtime polymorphism is somehow being conflated with dependent typing in our discussion?

Valid point, and I think you are right. What I presented was a refinement type, which correlates with dependent types but is not the same.

After further investigation, languages which support Variant Types already partially implement dependent types, because dependent types are function types where the Return Type is dependent on the values of the function parameters, i.e. the Return Type is a Variant Type in case of nondeterminism and a Union Type in case of determinism. The described dependent type above is a dependent product type, i.e. the free parameter value tuple determines the Return Type. However, you can also "chain" the dependent type generation, meaning the parameter value tuple of a function is not free in itself, i.e. some values depend on other values in the parameter value tuple. Such a function is called a dependent sum type, e.g. function ():a->a+1->(a,a+1).

The question is, why not simply use the Variant Type/Union Type or a Sum Type (Maybe, Either) to represent Varieties? With dependent types you can encode value semantics in the type signature, e.g.:

concat :: Vec a n -> Vec a m -> Vec a (m+n)

This is so expressive that such constructs are permitted:

fun::Vec a (x**2-y**4+z**4-z)-> (Vec a x, Vec a y, Vec a z)

Because the length of a Vector is integral, you can construct Diophantine problems which are undecidable (though, for a computer nothing is undecidable because it can only see a finite subset of infinities). Edit: (Corrected)
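
For comparison, the fragment of this that can already be expressed with OCaml GADTs (a sketch, with Peano-style type-level naturals standing in for the integer index): the length is tracked in the type, but the m+n in concat's signature has to be witnessed by a separate addition relation, which is exactly where full dependent types would take over.

type z = Z
type 'n s = S of 'n

(* A length-indexed vector: the second parameter is the length. *)
type ('a, 'n) vec =
  | Nil  : ('a, z) vec
  | Cons : 'a * ('a, 'n) vec -> ('a, 'n s) vec

(* Type-level addition as a witness, since m+n cannot be written directly. *)
type (_, _, _) plus =
  | PZ : (z, 'm, 'm) plus
  | PS : ('n, 'm, 'p) plus -> ('n s, 'm, 'p s) plus

let rec concat : type n m p a.
  (n, m, p) plus -> (a, n) vec -> (a, m) vec -> (a, p) vec =
  fun pl xs ys ->
    match pl, xs with
    | PZ, Nil -> ys
    | PS pl', Cons (x, xs') -> Cons (x, concat pl' xs' ys)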

@keean Again, I do not clearly see why you need dependent types instead of simple runtime polymorphism (Sum Type, Variant Type)

@shelby3 wrote

Btw, the Pokemon example employing type family fails to solve the Expression Problem. New Pokemon types can’t be added independently.

The problem with the pokemon example is that Haskell is missing a Union Type with the mechanism to typecheck the value and to monomorphize over it. They do this manually in this example, which is bad practice, especially when the associated type Winner is of kind *, meaning I could instantiate it not only with the pokemon or foe type, but also with any other type.

@shelby3 wrote

So again I ask @keean what the objection to dependent typing is, if all it means is to enable runtime polymorphism? Runtime polymorphism is necessary for modularity and solving the Expression Problem?

Totally Agree.

@sighoya wrote:

The problem is that the signature of eq changes in dependence to the input module. And this is unmodular, right?

@shelby3 wrote

No that is modularity. Multiple implementations of the same signature. (Also multiple signatures for the same implementation).

Hmm... Yes. I've formulated it badly. What I meant is that the user cannot see how the signature changes at design time and compile time (at least not always). Therefore I presented a possible solution in my previous post which should overcome this problem, at least in my mind.

keean commented 6 years ago

So again I ask @keean what the objection to dependent typing is, if all it means is to enable runtime polymorphism? Runtime polymorphism is necessary for modularity and solving the Expression Problem?

Dependent typing breaks some desirable type system properties like decidability. If the type system becomes a fully usable turing-complete programming language, then why have two different turing complete programming languages. However it might be less problematic than the alternatives.

keean commented 6 years ago

@sighoya

Again, I do not clearly see why you need dependent types instead of simple runtime polymorphism (Sum Type, Variant Type)

This is a complex question to answer, but it is along the lines of "I cannot clearly see why everyone cannot program in Lisp, as it is turing complete and you can express any program in it."

NodixBlockchain commented 6 years ago

There is an existential question I'm always wondering about along those lines.

Let's say a module implements a factory, which will create a complex object and sub-objects based on a declaration.

Let's say a factory for a UI dialog, where controls can have different implementations (e.g. text fields vs password fields), but will be manipulated via a statically defined interface by the client program.

If the UI definition passed to the factory is statically defined in the client program, does that make the program statically or dynamically typed?

The parameters to the factory in this case would not be exactly what can be called a type signature, so the compiler can't really figure out which class is being instantiated, even if it knows the interface / base class. Well, in the absolute it could figure it out, but most likely it won't, even if the types of the classes are still statically determined given a static input to the factory.

And it would be virtually impossible to exactly type the whole UI dialog class, given it can contain objects of many different types as members.

In the case where the module instances are clearly based on a dependent type system, it's even easier than this.

shelby3 commented 6 years ago

I wrote:

Since we know we can emulate typeclasses with modules per Modular Type Classes, we know that not all uses of modules can’t be monomorphised. The key is really referential transparency. Because when we/the compiler knows the code isn’t overwriting (generic references to) values with different types (which causes the requirement for the value restriction!), then the compiler can statically trace the type of the instance of a module to its use in a function and thus monomorphise. That is why C++ and other non-pure languages can’t do it.

[…] and if you don’t have referential transparency, then probably cannot monomorphise them.

But referential transparency really is about top-down dependency injection. So in some cases we can top-down inject the types and monomorphise but in other cases the top-down imperative layer will lose static tracking of the types and thus we must punt to runtime polymorphism.

Afaics (or I posit that), with pure composition we can unify a single instance type with a monomorphisation of each use of the polymorphic signature type, because there are no loops and other referential opaqueness which otherwise prevent reasoning about which instance type is in use at each use of the polymorphic signature type. IOW, referential transparency facilitates more monomorphisation.


@sighoya wrote:

[…] dependent types are function types where the Return Type is dependent on the values of the function parameters

That is one use case as I had shown in 2012 (where partial application returns a function with the dependent type).

Yet I also showed there that the distinction is really about the fact that an abstract type is an output type and it can otherwise be achieved with the family polymorphism of type parameters which are input types, as @keean has also pointed out.

Thus the distinction between dependent types and parametric polymorphism is that the former is a (bottom-up, coinductive) output type of values and the latter is a (top-down) input type of static inductive code structure. However, afaics with referential transparency they’re both monomorphisable, because the former can be unified as aforementioned. Without referential transparency, the first-order occurrences of the latter can still be monomorphised, because the type parameters are propagated statically through the inductive code structure regardless.

Again, I do not clearly see why you need dependent types instead of simple runtime polymorphism (Sum Type, Variant Type)

I think there’s a conflation of concepts in our discussion. So let’s try to clarify now.

An abstract type contained within a module/record is a means of polymorphism. Because the polymorphism applies to values, we classify it as dependent typing. Afaics (or let’s say I posit that), whether it’s less monomorphisable than parametric or adhoc polymorphism depends on whether the code is referentially transparent.

Polymorphic variants (aka row polymorphism ← click for our past discussions; and distinguish OCaml’s variants from polymorphic variants) also lose monomorphisation analogously to abstract types, but the underlying type that represents the ignored rows is not involved in the algorithm where the polymorphic variant is employed (because by definition the differing rows of the record for the variants are ignored). Thus unlike the arg1key.ValueType I showed in my 2012 example, there’s no dependence on knowing the actual types of the variants (except maybe for the extra generic row variable representing the rest of the rows?). With abstract types, the algorithm of the entire module depends on each instance type of the abstract type.
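
A small OCaml polymorphic-variant illustration of that point (toy constructors): the function names only the rows it handles and stays open over the rest, so the concrete types behind the ignored rows never enter its code.

(* to_int handles two constructors and accepts anything else through the
   catch-all, so its argument type stays open: [> `Bool of bool | `Int of int ]. *)
let to_int = function
  | `Bool b -> if b then 1 else 0
  | `Int n -> n
  | _ -> 0

(* A heterogeneous-looking list: `Float is never mentioned by to_int. *)
let xs = [ `Bool true; `Int 3; `Float 2.5 ]

let total = List.fold_left (fun acc v -> acc + to_int v) 0 xs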

Apparently row polymorphism (and structural typing) have a performance disadvantage compared to ML modules (presuming they can’t be monomorphised) in that they require a hash of the row methods instead of a vtable pointer indirection, because each instance can have the rows in different orders or slices of the object. Whereas, for modules, the entire module is a vtable pointer indirection for the dictionary of the module’s static interface, although that might not be true when the signature that the functor implements is a partial capture of the instance’s rows, although perhaps the functor can specialize to a unified vtable format at the time of the instance construction? Type inference apparently (c.f. also) fails generally for both (i.e. no principal types), but at least annotating ML functors grants HKT but the HKT are non-unified and a boilerplate mess (although I’m hoping to avoid the need for HKT entirely).

Here is a concise summary of different kinds of polymorphism. Afaics, modules, row polymorphism, and subtyping are subsuming: the generic type can have fewer members/rows than the instance. Parametric and adhoc polymorphism are not.


@keean wrote:

Dependent typing breaks some desirable type system properties like decidability. If the type system becomes a fully usable turing-complete programming language, then why have two different turing complete programming languages. However it might be less problematic than the alternatives.

Oic. Well apparently same issue for polymorphic variants aka row polymorphism.

What form of statically open (unbounded) runtime polymorphism doesn’t break decidability? Existential types (aka Rust’s trait objects)?

For statically bounded runtime polymorphism, note the new research on algebraic subtyping with all open possible unions in the ground types doesn’t break decidability and would probably work if we don’t need HKT.

shelby3 commented 6 years ago

Is “fine-grained” modularity an undesirable goal?

In general, attempting to modularize n deeply intertwined abstractions and data types will require 2^n - n additional libraries to cleanly represent instances and combinators that depend on overlapping functionality.

If we have 10 modules, for example, that means more than 1,000 libraries! Modern development stacks, all the way from Github to Scala to the JVM, were never designed for such fine-grained dependencies. Ignoring this fact imposes astronomical costs on development that far outweigh the perceived benefits.

Not all of these perceived benefits are actually benefits, either. For example, users never want a recent version of one part of the FP standard library (e.g. optics), but an older version of another part of the FP standard library (e.g. ct). Because of the interdependencies, such cherry-picking doesn’t make sense.

Modules can actually slow down development and releases, and make the creation of globally consistent version sets very difficult and rare. With 1,000 modules all independently versioned and released, how difficult would it be to find a set of versions that all work with each other? You probably know the answer to that one from experience!

sighoya commented 6 years ago

@shelby3 Modularity is not the only paradigm you should pursue. Note: modularity and locality are two paradigms which are mostly contrary to each other.

sighoya commented 6 years ago

@NodixBlockchain wrote

There is an existential question I'm always wondering about along those lines.

Let's say a module implements a factory, which will create a complex object and sub-objects based on a declaration.

Let's say a factory for a UI dialog, where controls can have different implementations (e.g. text fields vs password fields), but will be manipulated via a statically defined interface by the client program.

If the UI definition passed to the factory is statically defined in the client program, does that make the program statically or dynamically typed?

A statically (dynamically) typed language remains a statically (dynamically) typed language. But I would assume you mean if it becomes dependently typed? Well, it depends on the type signature of the factory interface (as far as I can tell). If you see an associated type as Return Type, then it depends on whether the associated type changes in dependence to the passed module value(s).

Maybe you can present a code example for better clarification.

NodixBlockchain commented 6 years ago

An example is like MFC.

There is a static resource embedded into the DLL describing the UI dialog.

It's possible to supply a custom class that the framework will use in the dialog.

So from the point of view of the client program, the type of the class is known at compile time.

But the whole code to create the instances is based on dynamic typing.

The type returned will be a base class / interface, not the "real type". But the class can be safely cast to the inheriting class, as its type is specified in the static resource file.

From the point of view of the app, it's exactly as if the dialog class were created using static types, but internally it uses a lot of dynamically typed code to create the right instances.


sighoya commented 6 years ago

@NodixBlockchain

You speak about dynamic loading, right? Afaict, this is done over a void pointer in C/C++, and void* is used in C for runtime coercion/runtime polymorphism. I believe java.Object is defined on top of it.

But that has not always been the case. Look at dlang, where you can read files and mixin the yielded string as code (never tried it). This requires that the compile-time functions aka templates are (semi) dynamically typed at compile time. Source 1 Source 2

NodixBlockchain commented 6 years ago

It's not necessarily with void pointers, even if MFC probably uses them.

It can be like

// common.h

#include <cstddef>
#include <cstdio>

class Interface
{
public:
    virtual void print() = 0;
    virtual ~Interface() {}
};

class dynenv
{
public:
    Interface **MyInstances;

    dynenv(size_t num)
    {
        MyInstances = new Interface *[num];
    }
};

class myfactory
{
public:
    static dynenv *build(int *types, size_t num);
};

// module.cpp

#include "common.h"

class class1 : public Interface
{
public:
    void print()
    {
        printf("class1");
    }
};

class class2 : public Interface
{
public:
    void print()
    {
        printf("class2");
    }
};

dynenv *myfactory::build(int *types, size_t num)
{
    dynenv *MyEnv = new dynenv(num);

    while (num--)
    {
        switch (types[num])
        {
            case 0: MyEnv->MyInstances[num] = new class1(); break;
            case 1: MyEnv->MyInstances[num] = new class2(); break;
        }
    }
    return MyEnv;
}

// main.cpp

#include "common.h"

int types[] = {0, 0, 1};
int main()
{
    dynenv *MyEnv;

    MyEnv = myfactory::build(types, 3);
}

Then MyInstances[0] can be safely cast to class1.

Like

class1 *MyClass1 = (class1 *)MyEnv->MyInstances[0];

Because "types" is defined statically.

It can be made with templates in the factory too, to build different objects with more complex code, or be more complex than that.

Such that you can have

class staticenv
{
public:
    class1 *Instance1;
    class1 *Instance2;
    class2 *Instance3;
};

staticenv *env=new staticenv();

env->Instance1=(class1 *)MyEnv->MyInstances[0];
env->Instance2=(class1 *)MyEnv->MyInstances[1];
env->Instance3=(class2 *)MyEnv->MyInstances[2];

And then use env as if it was statically typed.

MFC is more complex because many things go through the message proc and each instance has an ID mapped to a class and message handlers.

So there are more tricks than this, but the result is the same.

keean commented 6 years ago

@sighoya Personally I think reading files at compile time is a bad idea. Compiling should translate language A into language B without executing anything in language A.

By dynamic loading I mean loading code at runtime, like loading tool plugins for a painting program. We want to be able to distribute binary plugins, but the main program expects the plugin to conform to a certain type signature. We should be able to compile the main program and the plugin separately (on different computers at different times). We need to make sure both the main program and the plugin conform to the same version of the interface. We can do this by cryptographically signing the plugin with a hash of the binary and a hash of the interface after type checking. The main program can then check the signature when loading.
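
A toy sketch of that check (hypothetical helper names; OCaml's Digest is MD5 and only a stand-in for a real cryptographic hash): both sides hash the type-checked interface text and the plugin binary, and the host refuses to load on mismatch.

(* Hash of the interface as it was type checked, embedded in the host at
   compile time (however the build system chooses to do that). *)
let expected_interface_hash = Digest.string "sig val run : unit -> unit end"

(* The plugin ships with a signed hash of its binary; load only if both the
   interface and the binary match what was signed. *)
let ok_to_load ~interface_src ~binary ~signed_binary_hash =
  Digest.equal (Digest.string interface_src) expected_interface_hash
  && Digest.equal (Digest.string binary) signed_binary_hash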

shelby3 commented 6 years ago

@sighoya wrote:

Note: modularity and locality are two paradigms which are mostly contrary to each other.

I wanted to emphasize the burden of “fine-grained” (e.g. intertwined/tangled) modularity in my prior post. Thus I’m pondering whether modules should be first-class. I’m studying Backpack. Also I had posited that perhaps the functionality gained by polymorphic variants can be modeled with typeclasses (aka adhoc polymorphism)? (see link in what I had written before as quoted below)

Polymorphic variants (aka row polymorphism ← click for our past discussions

Adhoc polymorphism is not first-class. So not only do I think we need to minimize the interaction of concepts (especially first-class concepts), I think we shouldn't unnecessarily have more ways to do the same thing and exceed the Principle of Least Power.


@keean wrote:

Personally I think reading files at compile time is a bad idea.

Huh??? Do you mean compiling while the program is running and plugging in the code is bad?

By dynamic loading I mean loading code at runtime, like loading tool plugins for a painting program. We want to be able to distribute binary plugins, but the main program expects the plugin to conform to a certain type signature.

Agreed about typechecking. But why is it necessarily bad to compile JIT?

keean commented 6 years ago

@sighoya

Huh??? Do you mean compiling while the program is running and plugging in the code is bad?

Yes, I don't want to ship the compiler, type-checker and linker with production code. It takes a lot of memory, or means you have to install some environment onto the machine before you can use it (like the JVM).

I want a language that produces small binaries. If I write a program like this:

let a = readInt
let b = readInt
print string(a + b)

I would expect the machine code output to look something like this:

call readInt
move r0, r1
call readInt
add r0, r1, r2
push r2
call string
push r0
call print

plus the definitions of the functions actually used from the standard library.

It's also about predictability: not only is the JIT a big chunk of extra stuff to distribute, it's also another potential source of errors.

I think it is the CPUs job to do runtime optimisation by speculative execution, branch prediction, instruction re-ordering etc.

sighoya commented 6 years ago

@shelby3 wrote:

Also I had posited that perhaps the functionality gained by polymorphic variants can be modeled with typeclasses (aka adhoc polymorphism)?

Typeclasses provide adhoc polymorphism as method overloading does. The difference is that type classes give the overloading of some function a name/sense, which is important for those functional guys. Further, typeclasses provide grouping, i.e. types which implement a typeclass belong to some group and each type in this group can safely be passed to the same bundle of methods specified in this typeclass.

@shelby3 wrote:

Polymorphic variants (aka row polymorphism ← click for our past discussions

Polymorphic variants subdivide into open record types (row polymorphism for records) and open sum types (row polymorphism for sum types aka data in Haskell)

Adhoc polymorphism is not first-class.

Why? Method overloading can be seen as row polymorphism on the parameter-return tuple (the last element of this tuple is the return type) where each overloading represents a subtype, which could also be generic.

Further, you can treat typeclasses as first class in your language (abstracting over them and so on)

@shelby3 wrote:

So not only do I think we need to minimize the interaction of concepts (especially first-class concepts), I think we shouldn't unnecessarily have more ways to do the same thing and exceed the Principle of Least Power.

What are concepts as opposed to typeclasses?

Further, I thought @keean and you don't want typeclasses in favor of modules?

keean commented 6 years ago

@sighoya I like type-classes because they allow you to define an abstract algebra. I don't like them because they are anti-modular. Perhaps modules + implicits is the best way to get type-classes that are also modular?
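
For concreteness, the modules-as-type-classes encoding that alludes to, with the dictionary passed explicitly (a sketch with illustrative names; modular implicits would let the compiler pick and pass ShowInt for you):

(* The "class" is a signature... *)
module type SHOW = sig
  type t
  val show : t -> string
end

(* ...and an "instance" is just a structure. *)
module ShowInt : SHOW with type t = int = struct
  type t = int
  let show = string_of_int
end

(* Explicit dictionary passing; implicits would infer the (module S) argument. *)
let print_all (type a) (module S : SHOW with type t = a) (xs : a list) =
  List.iter (fun x -> print_endline (S.show x)) xs

let () = print_all (module ShowInt : SHOW with type t = int) [ 1; 2; 3 ]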

shelby3 commented 6 years ago

@keean wrote:

I think it is the CPUs job to do runtime optimisation by speculative execution, branch prediction, instruction re-ordering etc.

You recently wrote in these issue threads that maybe CPU will have to abandon speculative execution because of the recent SPECTRE vulnerabilities, which seem inherently impossible to guard against if the CPU is highly non-deterministic. Although I remarked that neither of us are experts in the specialized field (although I’m sure you know a significant amount about it given you’re an electrical engineer by formal training).

Huh??? Do you mean compiling while the program is running and plugging in the code is bad?

Yes, I don't want to ship the compiler, type-checker and linker with production code. It takes a lot of memory, or means you have to install some environment onto the machine before you can use it (like the JVM).

I disagree. Security requires compilation be done on the client. Even Ken Thompson’s Trusting Trust paper applies, but you and I had argued this point about capability security before where you argued there’s no security with capabilities, but I argued for application level sandboxes. Perhaps you argued (or would argue) for hardware sandboxes (virtual machines) but that is too coarse grained.

Embedded is a special case, but even this will require sandboxes for hot updates over the wire with IoT.

Sorry I think your view is that of a dinosaur. Afaics, that ship sailed already. There’s no going back.


@sighoya wrote:

Typeclasses provide adhoc polymorphism as method overloading does. The difference is that type classes give the overloading of some function a name/sense, which is important for those functional guys. Further, typeclasses provide grouping, i.e. types which implement a typeclass belong to some group and each type in this group can safely be passed to the same bundle of methods specified in this typeclass.

Typeclasses also enable late (i.e. modular) binding of interfaces to types. Afaics, they’re also a partial solution to Wadler’s Expression Problem, with unions and immutability completing the solution. Immutability is required because unions introduce subtyping/subsumption into the typing system, but the new algebraic subtyping makes this sound and decidable by moving all the unions (all possible types) into the ground types instead of bounding all types by the top and bottom types.
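To make the “late binding” point concrete, here is a small Haskell sketch (the `Shape`/`Area` names are hypothetical): the data type is defined without any knowledge of the interface, and the interface is bound to it later, in a separate module, without editing the original definition.

```haskell
-- Module A defines the data type with no knowledge of any interface.
data Shape = Circle Double | Square Double

-- Module B (written later) binds an interface to the existing type
-- without editing Module A: this is the late, modular binding that
-- typeclasses give you.
class Area a where
  area :: a -> Double

instance Area Shape where
  area (Circle r) = pi * r * r
  area (Square s) = s * s

main :: IO ()
main = print (map area [Circle 1.0, Square 2.0])
```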

As @keean points out, they also provide an abstract algebra because every type can only be implemented one way for each typeclass (although checking the implementations against axioms seems implausible, though Scalaz 8 claims to aim for this).

Polymorphic variants (aka row polymorphism) ← click for our past discussions

Polymorphic variants subdivide into open record types (row polymorphism for records) and open sum types (row polymorphism for sum types, aka data in Haskell).

@keean had proposed to unify records, sums, and product types. I think row polymorphism just means conceptually a structure that implements a signature which doesn’t have all the members of that structure. If sums and records were unified, then ML functors could create “row polymorphism”. Am I missing something?
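A rough Haskell sketch of that reading of row polymorphism (Haskell has no native row types, so this approximates the idea with a hypothetical `HasName` class): a function accepts any structure that carries at least the member named in the signature, regardless of what else it contains.

```haskell
-- Row polymorphism lets a function accept any record that has *at least*
-- the fields it mentions. This sketch approximates that with a class.
class HasName r where
  getName :: r -> String

data Person  = Person  { personName :: String, personAge :: Int }
data Company = Company { companyName :: String }

instance HasName Person  where getName = personName
instance HasName Company where getName = companyName

-- `greet` only cares that the "row" contains a name field,
-- whatever other fields the structure carries.
greet :: HasName r => r -> String
greet r = "Hello, " ++ getName r

main :: IO ()
main = do
  putStrLn (greet (Person "Ada" 36))
  putStrLn (greet (Company "Initech"))
```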

Ad hoc polymorphism is not first-class.

Why?

Well we might argue that existential types (aka Rust’s trait objects) with a typeclass bound are first-class because they can be passed around as values, but afaik, as @keean and @skaller have explained in the past, typeclass bounds aren’t types. They’re interfaces. They are orthogonal to the typing system. The unification of the types of the program can afaik be done orthogonally to the unification of [i.e. instance selection for] the typeclass bounds of the program.
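A small Haskell sketch of that distinction (the `Showable` wrapper is illustrative): the constraint itself is not a type, but an existential wrapper carrying the bound can be passed around as a first-class value.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

-- The constraint `Show a` is not itself a type, but an existential wrapper
-- with that bound can be passed around as a first-class value.
data Showable = forall a. Show a => MkShowable a

describe :: Showable -> String
describe (MkShowable x) = show x

-- A heterogeneous list: each element may be a different type,
-- as long as it implements Show.
items :: [Showable]
items = [MkShowable (1 :: Int), MkShowable True, MkShowable "hi"]

main :: IO ()
main = mapM_ (putStrLn . describe) items
```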

First-class features can wreak havoc on type inference, soundness, and decidability of the type system. Typing systems become much more complex/problematic moving away from the origin on the Lambda Cube.

@keean you’re encouraged to correct me if this is an incorrect response.

Further, you can treat typeclasses as first-class in your language (abstracting over them and so on).

No, afaics there’s no way to make a type which is a typeclass. The typeclass bounds always go in the requires (or in Haskell before the =>) clause and never where the types appear.

Further, I thought @keean and you don't want typeclasses in favor of modules?

Afaik, we don’t have consensus on that yet. We’re analyzing tradeoffs and requirements.


@keean wrote:

@sighoya I like type-classes because they allow you to define an abstract algebra. I don't like them because they are anti-modular. Perhaps modules + implicits is the best way to get type-classes that are also modular?

Can we enumerate the ways they aren’t modular so we have coherent summary here?

Afaik, the issues are:

keean commented 6 years ago

@shelby3

You recently wrote in these issue threads that maybe CPU will have to abandon speculative execution because of the recent SPECTRE vulnerabilities, which seem inherently impossible to guard against if the CPU is highly non-deterministic.

These kinds of optimisations have to be done in the CPU. I think by redesigning the cache to separate data by security ring and process ID this can be fixed.

I disagree. Security requires compilation be done on the client. Even Ken Thompson’s Trusting Trust paper applies

Of course it doesn’t. Java shows how this is a broken model.

Security needs to be a runtime capability based process enforced by hardware. The hardware is much more sophisticated these days, for example lightweight containers like docker.

Sorry I think your view is that of a dinosaur. Afaics, that ship sailed already. There’s no going back.

This is nonsensical, how has the ship sailed? What language is compiled on the client? There are plenty of interpreted languages like python and JavaScript, but unless you build Linux distributions from source, you are always dealing with pre-compiled binaries.

Can we enumerate the ways they aren’t modular so we have coherent summary here?

The main way is that they are global, which is true for an abstract algebra, because we want operators to mean the same thing wherever they are called from.
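A small Haskell illustration of why that global property matters (a hedged sketch using the standard `Data.Set` container): the abstract algebra breaks if the same operator could mean different things at different call sites.

```haskell
import qualified Data.Set as Set

-- Data.Set relies on there being exactly one Ord instance per element type:
-- the set built with `compare` must be searched with the same `compare`.
-- If a second, different ordering for Int could be supplied at another call
-- site (as local, module-style instances would allow), `member` could miss
-- elements that are actually present, silently breaking the abstraction.
evens :: Set.Set Int
evens = Set.fromList [2, 4, 6, 8]

main :: IO ()
main = print (Set.member 6 evens)  -- relies on the single global Ord Int
```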

NodixBlockchain commented 6 years ago

For me the language that got modules totally right is Emerald, because it makes a separation of concerns between references to instances with an abstract type, type definitions, and interface implementations (modules).

I find this approach very powerful for dealing with dynamic plugins and distributed systems.

It's often convenient for some parts of the code to hold an instance of a type they know nothing about, while still being able to use interfaces that take this object as a parameter via the abstract type.

Or some parts can just instantiate new objects without implementing any interface for them.

Most languages conflate all those aspects, and it's not always very practical, especially for highly dynamic or distributed systems.

It's possible to have fine-grained modules, but that needs an inlining feature. With inlined module implementations, it's possible to generate very efficient assembly code for value operations.

But you need the source of the implementation to compile a program using it.

shelby3 commented 6 years ago

Can we enumerate the ways they aren’t modular so we have coherent summary here?

The main way is that they are global, which is true for an abstract algebra, because we want operators to mean the same thing wherever they are called from.

That doesn’t speak to which specific issues prevent them from being modularized. I was enumerating the specific issues.

I don’t understand what of value can be gained from your comment. What practical points am I supposed to infer from what you wrote?

You recently wrote in these issue threads that maybe CPU will have to abandon speculative execution because of the recent SPECTRE vulnerabilities, which seem inherently impossible to guard against if the CPU is highly non-deterministic.

These kinds of optimisations have to be done in the CPU. I think by redesigning the cache to separate data by security ring and process ID this can be fixed.

I repeat that neither of us is sufficiently expert to know. But I presume timing attacks can’t be entirely isolated without preventing the sharing of cache between processes, because side-effects can be transmitted in very subtle ways. I read/skimmed the white papers cited by the SPECTRE paper. You could refer to, for example, all the trouble Daniel Bernstein underwent just to remove timing attacks from one highly structured cryptographic algorithm, Curve25519. Now imagine an unbounded diversity of algorithms. I think you were more likely correct when you formerly surmised that they’ll have to rip out all of the non-deterministic speculative optimizations in CPUs. Or that processes will need to run completely isolated on their own core+cache and communicate over very high-latency message channels. Yet there are not enough cores to run all the processes. And dedicating an entire core to a process is also a performance hit. So maybe just ripping out the speculative execution will scale better.

Even Ken Thompson’s Trusting Trust paper applies

but unless you build Linux distributions from source, you are always dealing with pre-compiled binaries.

Did you even bother to understand what the Trusting Trust paper is about? You must rebuild from source, else you can’t trust it. And you can’t even trust the compiler unless you build it from source. Distributing binaries is very insecure. We can trust our own trust when we recompile everything ourselves. Binaries don’t ensure that the program was compiled faithfully.

I disagree. Security requires compilation be done on the client.

Of course it doesn’t. Java shows how this is a broken model.

We already had this argument last year. I cited Joe-E which was designed for maximum security, which Java was not.

There’s no such thing as 100% security. That is the first principle any security expert will tell you.

A sandbox is the first-line of defense and it’s better than Trusting Trust and no first-line of defense.

And formal methods and correctness checking, such as in Rust, may help improve the quality of sandboxes.

Security needs to be a runtime capability based process enforced by hardware. The hardware is much more sophisticated these days, for example lightweight containers like docker.

SPECTRE proved that such security is not 100%, even without factoring in human error and phishing.

Also as I wrote in my prior post, it’s too coarse-grained. Process-based isolation can’t do capability-based security.

Sorry I think your view is that of a dinosaur. Afaics, that ship sailed already. There’s no going back.

This is nonsensical, how has the ship sailed?

It sailed because of JavaScript and soon WASM. And because nobody wants to mess around with downloading and installing untrusted binaries, because that invokes too many opportunities to inject viruses and bad code. Managed sandboxes are not going away and will proliferate, swallowing all forms of code distribution eventually. WASM is also all about exposing the low-level while maintaining a security model. Running unmanaged and shipping different binaries for each hardware platform has sailed and is never coming back. You’re simply incorrect. Sorry.

Whether sandboxes can unify around a higher-level language with even more security guarantees than WASM is an open question. Maybe the greatest common denominator of PLs that is needed is at best something like WASM. I’m hoping something a bit more high-level would be plausible. I think experimentation on WASM will help the world figure that out.

What language is compiled on the client?

The entire point is that users want sandboxes with managed systems. Thus whether we compile on the client or not is irrelevant. The point is there’s nothing bad about compiling on the client. You were arguing about overhead of shipping compilers and managed systems, but that ship has sailed.

META: There you go again with personal insults. Is “nonsensical” a factual statement, or more an incitement to hatred while also being incorrect? You could leave off the word “nonsensical” and still make the same point. At least I wrote “sorry”, meaning I wasn’t trying to unnecessarily slight you. I do think your view is outdated.

NodixBlockchain commented 6 years ago

For apps where performance and resources matter, they still ship binaries, like video games, multimedia, even browsers and Bitcoin. They're not going to run on managed code soon. Most of those are still made in C++.

Even having a VM or sandbox doesn't change much for the security issue; it just moves it into the VM. For the cache and timing issue, I'm not sure sandboxes are going to solve it.

It can be more convenient for app developers, because responsibility is moved out of their code as much as possible, but for end-user security it doesn't change much, except that apps can run on a trusted, well-tested framework. But even on Android it's not that hard to root the device, and there are many exploits for breaking out of the sandboxes, same for web browsers.

And when there is a bug in the VM or sandbox, it means all devices using it are potentially vulnerable, and there is not much the app can do if bugs in the VM or sandbox can be exploited when triggered by its code.

Binaries can be signed by a trusted source, and even compiling the sources doesn't guarantee there is no bug or problem in the source code.

All attempts at doing optimisation and management of performance / execution ordering etc. from compilers instead of the CPU have been failures.

Optimisation at the CPU level at runtime works much better than trying to pre-optimise this from compilers.

And removing those would mean a massive drop in processing power. And it can't be replaced by precomputed compiler-side optimisations.

For security, I believe it would be great to have object-scale memory management on the CPU, or for the CPU to understand more complex design patterns in the MMU, but they still operate mostly at the process level.

From there, all a sandbox or managed code can do is provide well-tested primitives and a VM to app developers, but even those are not exempt from security problems.

I didn't read much in depth about SPECTRE etc., but I don't see how a sandbox or managed code can solve this.

From what I read, even managed languages are vulnerable to it. There are even apps on the web to test this in the browser, so I guess sandboxing doesn't really solve it.

shelby3 commented 6 years ago

I’m going to break your preconceptions and cherished falsehoods.

For apps where performance and resources matter, they still ship binaries, like video games, multimedia, even browsers and Bitcoin. They're not going to run on managed code soon. Most of those are still made in C++.

Not all games need performance. My gf plays crosswords and Candy Crush on her mobile phone, which aren’t performance critical. Twitter is a game. Steemit.com is a game. Etc.

Even for games that need, for example, 3D performance, they can increasingly offload performance needs onto libraries such as OpenGL or Unity3D. Many real-time C++ games have been compiled to asm.js (and soon WASM) to run in the browser. IOW, adding performance to games is more about parallelism than low-level semantics.

And deployment to managed systems will inexorably increase because, as the 150+ IQ literary genius Eric S. Raymond pointed out in his recent blog series analyzing Go vs. Rust, even a GC language such as Go is now swallowing a significant chunk of systems and server programming needs. Even Go doesn’t allow you to fiddle with the M:N green threads (goroutines) scheduler nor the GC, because its automation is sufficient for most/many use cases.

That’s why I’ve been telling you for the past year or more that your expertise in “close to the metal” knowledge is going to become less applicable to the mainstream as time goes on. Transition up the semantic layers or be a dinosaur.

Even having a VM or sandbox doesn't change much for the security issue; it just moves it into the VM.

This entirely misses the point I made in my prior post, which is that managed code has very high economies of scale and can be the best first-line defense possible. What else would you put as a first-line defense that can also do capability-grained security? Geez, you didn’t even comprehend what I wrote.

but even on Android it's not that hard to root the device, and there are many exploits for breaking out of the sandboxes, same for web browsers.

If it were so damn easy to break the security of web browsers, then why haven’t my Bitcoins been stolen already? I keep my wallet on my computer. And I load many, many websites and PDF files. But I am not easily fooled by phishing.

Android’s security model is designed to protect against unintentional errors in app programming, not intentional malware. It was not designed as a hardened system against an app that desires to do evil. The Playstore exists to provide trust reputations on vendors. So together with that trust system, we want the security model to protect against unintentional mistakes of the developers.

Whereas, the browser loads websites that have no reputation or trust model, so the security model must be hardened.

Binaries can be signed by a trusted source, and even compiling the sources doesn't guarantee there is no bug or problem in the source code.

Signing doesn’t guarantee that the developers didn’t accidentally introduce a bug in a new release that creates a security issue. So the Android security model provides some protection against such security breaches. JIT compiling on the client with V8, such as for JavaScript, coupled with a hardened security model, is applicable where reputation and trust can’t exist, because we visit new websites instantaneously.

All attempts at doing optimisation and management of performance / execution ordering etc. from compilers instead of the CPU have been failures.

Incorrect. Java Hotspot improves performance the longer the application runs. In fact, it can use higher-level knowledge that the CPU is oblivious to.

I didn't read much in depth about SPECTRE etc., but I don't see how a sandbox or managed code can solve this.

They can’t. That is why the only solution will likely be process level isolation or abandoning speculative execution, both of which afaics will be a reduction in performance and battery life.

But that’s irrelevant to the point that process level security is not capability-level security. And that process level security should not be the only first-line defense.

keean commented 6 years ago

@shelby3 Please note WASM is a move back to pre-compiled code from JavaScript.

You are right that sand-boxes on the client are important (see my comments about docker), but I think you are wrong about people wanting to compile source code in the client.

shelby3 commented 6 years ago

@keean, afaics the point of this discussion was whether users have any reason to object to automatic compilation of code on their client. Clearly the ubiquity of JavaScript has demonstrated they don’t. Afaik users hate dealing with binaries. I hate dealing with binaries on my Linux system because of dependencies (especially ones on Linux that need to be compiled); for example, versioning crap meant I was never able to figure out how to get GCC configured properly (Clang is working). I will probably have to build a new Ubuntu system to get it to work. So the point is that binaries also suck for users. Automated compilation with dependencies that always work is what users love.

The issue for users is convenience (and possibly compile times). One of the members of the Go team was talking about how the compiler is so fast that it blurs the distinction between interpreted and compiled. I’ve also read that the OCaml compiler is very fast. There’s also a discussion from the creator of D comparing how much slower and rigor-mortis-like C++ is (especially its build process). At 21:00 they summarize Rust, Go, C++, and D.

Now I realize that some developers will precompile JavaScript to WASM in order to make load times faster, but this is probably justified because, I presume, JavaScript offers no additional security that WASM doesn’t provide. And this won’t preclude compiling some JavaScript to WASM client-side.

I think WASM will facilitate compiling various programming languages client-side, because there won’t be any need to configure the compiler differently for different computers.

Flawless portability is a huge factor in automation and user convenience. WASM is more important than people realize. But so far WASM doesn’t support GC nor goroutines natively.

(note I am very sleepy and can not give this the best reply due to being too groggy, may edit after I sleep)

keean commented 6 years ago

I don't think you would compile JS to WASM, as WASM has unmanaged memory. One of the big things to come out of Mozilla's Rust browser project was that having two separate garbage collectors was inefficient. It turns out to be a lot better to pass all allocations through to the same GC. It may happen if WASM gets managed memory, allocated by the main JS GC. Otherwise it will be better for performance to just use normal JS, minimised and gzipped for fast download.