Closed by eiriktsarpalis 9 years ago
To give a more concrete example, the proposed change in the programming model would essentially contain two types, Local<'T> and Cloud<'T>, which would both be subtypes of the abstract type Workflow<'T> (names are indicative and subject to revision, of course). Each of the two subtypes would come with distinct expression builders:
let readFile (reader : Stream -> 'T) (f : CloudFile) : Local<'T> = local {
    use! stream = CloudFile.ReadAsStream f
    return reader stream
}
and
let readParallel (reader : Stream -> 'T) (files : CloudFile []) : Cloud<'T []> = cloud {
    return! files |> Array.map (readFile reader) |> Cloud.Parallel
}
The suggested bind operation in local workflows will have type signature Local<'T> -> ('T -> Local<'S>) -> Local<'S>, whereas in cloud workflows it will carry the signature Workflow<'T> -> ('T -> #Workflow<'S>) -> Cloud<'S>. In other words, local workflows freely compose with other local workflows, but cannot bind to cloud workflows. On the other hand, cloud workflows can freely bind to both local and cloud workflows.
The Cloud.Parallel keyword will have type signature seq<#Workflow<'T>> -> Cloud<'T []>; since the primitive introduces distribution by definition, it will force 'elevation' of any workflow into a cloud workflow. At the same time, any use of Cloud.Parallel inside a local workflow will introduce a type error. For thread pool based parallelism, a Local.Parallel primitive will be introduced, with type signature seq<Local<'T>> -> Local<'T []>. This will offer static assurance that the local parallelism combinator cannot compose with child computations that introduce distribution.
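Under these proposed signatures, thread pool parallelism would compose as follows. This is only a sketch against the proposed API (Local.Parallel and the readFile combinator defined above), not an existing MBrace combinator:

```fsharp
// Sketch only: assumes the proposed Local.Parallel signature
// seq<Local<'T>> -> Local<'T []> and the readFile combinator above.
let readAllLocally (reader : Stream -> 'T) (files : CloudFile []) : Local<'T []> = local {
    // fine: every child is Local<'T>, evaluated on the current machine's thread pool
    return! files |> Array.map (readFile reader) |> Local.Parallel
}

// let invalid = local { return! Cloud.Parallel [ cloud { return 1 } ] }
// ^ rejected at compile time: Cloud.Parallel yields a Cloud<_>,
//   which cannot be bound inside a local { } block
```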
In other words, the proposed amendment to the programming model aspires to constrain distribution effects at compile time.
Given the fact that Cloud<_> and Local<_> are defined as
type Local<'T> = Workflow<Local, 'T>
type Cloud<'T> = Workflow<Distributed, 'T>
I was thinking that CloudBuilder.Bind could be something like
member __.Bind<'Ctx, 'T, 'S> (f : Workflow<'Ctx, 'T>, g : 'T -> Workflow<Distributed, 'S>) : Workflow<Distributed, 'S>
Well, that's essentially identical to the current signature.
I'm referring to this (f : #Workflow<'T>): https://github.com/mbraceproject/MBrace.Core/blob/local-workflows/src/MBrace.Core/Continuation/CloudBuilder.fs#L240. I think it is better for all types of bind to be of the same form Workflow<_, _>.
I have a few updates on the proposed changes. I have pushed a new branch that proposes an alternative representation for local workflows.
The initial branch uses a phantom type approach
[<AbstractClass>] type Workflow<'T>
type Workflow<'Ctx, 'T when 'Ctx :> SchedulingContext> = inherit Workflow<'T>
type Local<'T> = Workflow<Local, 'T>
type Cloud<'T> = Workflow<Cloud, 'T>
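As a self-contained illustration of the encoding (the marker names here are invented for the sketch; the actual branch may differ), the phantom-type approach compiles to something like:

```fsharp
// Minimal, compiling sketch of the phantom-type approach.
[<AbstractClass>]
type Workflow<'T>() = class end

// Phantom markers: never instantiated, used only as type arguments.
type SchedulingContext = interface end
type DistributedCtx = interface inherit SchedulingContext end
type LocalCtx = interface inherit SchedulingContext end

// F# permits a second type of the same name with a different generic arity.
type Workflow<'Ctx, 'T when 'Ctx :> SchedulingContext>() =
    inherit Workflow<'T>()

// Type abbreviations: visible to F# callers only, not to C#.
type Local<'T> = Workflow<LocalCtx, 'T>
type Cloud<'T> = Workflow<DistributedCtx, 'T>
```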
This approach comes with the drawback that phantom types may be confusing to novice users. Type errors when doing incompatible compositions may also be difficult to read. What's more, the type alias declarations are only visible from F# code; a C# library would have to define separate types of wrappers which would possibly hurt interoperability.
The new branch uses the following, more straightforward approach:
[<AbstractClass>] type Workflow<'T>
type Local<'T> = inherit Workflow<'T>
type Cloud<'T> = inherit Workflow<'T>
This has better readability, but restricts scheduling context definition to the core library. It also does not allow context polymorphism. For instance, the combinator
Cloud.Catch : Workflow<'Ctx, 'T> -> Workflow<'Ctx, Choice<'T, exn>>
has to be rewritten to two concrete implementations:
Local.Catch : Local<'T> -> Local<Choice<'T,exn>>
Cloud.Catch : Cloud<'T> -> Cloud<Choice<'T,exn>>
It is not clear to me which of the two approaches is simplest.
I must admit, I had until now no clue what the term "phantom types" even referred to. Now I think I see what you're saying - basically, in the first implementation Cloud and Local are really the same type, but at compile time you have the notion of two distinct types. In this way you can write operations that apply across both simply by targeting the "real" type, or to just one of them by using one of the two phantom types. OK. So I would imagine this has lots of benefits for the implementation in terms of code reuse and simplification? So from an F# perspective it's only positives, I guess? Could you give examples of "bad" error messages caused by this?
The alternative is to create two concrete types with an inheritance chain - which means from a runtime perspective we get two types, but there's a duplication of code etc.
What I like is that both approaches tend to suggest that the beginner can simply use Cloud everywhere and not worry about locality, and as their understanding improves they can adopt Local for efficiency etc. The C# point is also valid and important. What's going to be the cost in terms of the wrapper API if you're using phantom types? If it's going to be massively more costly to implement the wrapper, then this is something you should think very carefully about. If it's only a little bit more costly, I would tend to go down the phantom types approach.
Well, it does result in some code duplication but this really only affects the internals of the core library.
In terms of error messages, given the following piece of code:
local {
    let! x = Cloud.Parallel [ cloud { return 42 } ]
    return x
}
The former approach would give:
Error 1 Type mismatch. Expecting a
Local<'a>
but given a
Cloud<int []>
The type 'Context.Local' does not match the type 'Context.Cloud'
whereas the latter would give:
Error 1 This expression was expected to have type
Local<'a>
but here has type
Cloud<int []>
The first case is not terrible, but the last line might be a bit confusing.
As regards the C# api, I don't think it's so much an issue of performance as one of composability. It should be possible to compose workflows defined in C# and F# libraries interchangeably, without having to wrap or unwrap types.
OK. The first error isn't actually that bad really, but yes, the second is more what I would be used to. Regarding the C# API, I was more meaning the issue of effort to write wrappers etc. rather than performance.
Yes, it's a nice goal to compose workflows in either language without wrapping but this might also limit your options in terms of what types you expose in F# because you're thinking about how they look to C# e.g. option types, discriminated unions etc.. Wrapping might be the only course to achieve an idiomatic C# library. I believe that Akka .NET have done the same thing albeit the other way around. Orleans on the other hand....
I've not had a look at the C# API yet though so I might be spectacularly off course here :-)
Yes, I think you're right. It still seems to me that phantom types would make a cool F# snippet, but feel a bit strange when used as the centerpiece of a large framework.
If the shoe fits... :-) Until we're at 1.0 there's nothing wrong with putting stuff in and removing it IMHO. Particularly if this is going to speed up development time - as we approach a stage when you feel that the core feature set is ready, that's when you might want to review this stuff.
Again, IMHO.
@eiriktsarpalis - Can Local
Yes. Only Cloud.Parallel/Choice/StartAsCloudTask require explicit use of cloud {}
So what happens if Local<T> and Cloud<T> are completely separate types and Workflow<T> is completely hidden? It sounds like there is very little code that operates over Workflow<T>, i.e. generic w.r.t. annotation?
Most combinators that return Cloud<> take Workflow<> as input. For instance
Cloud.Parallel : seq<#Workflow<'T>> -> Cloud<'T []>
and
cloud.Bind : Workflow<'T> -> ('T -> Cloud<'S>) -> Cloud<'S>
Having a base type essentially simplifies APIs that take any workflow as input.
OK, I see. Tricky choice!
Plus, it would be impossible to have this otherwise reasonable expression
Cloud.Parallel [ cloud { return 1 } ; local { return 2 } ]
@eiriktsarpalis Hmmm. Does that mean "run these two workflows in parallel across the cluster - first one is a cloud workflow, ok. second one is a local workflow, promote it to a cloud workflow"?
@isaacabraham Not exactly. It means that both types can be run remotely, however the first one may encapsulate further distribution, while the second comes with a static guarantee that it will be constrained to a single machine if run. For a more illustrative example:
Cloud.Parallel [ Cloud.Parallel ... ; local { return! CloudRef.New 42 }]
The overall expression and its left argument are of type cloud since their execution might span multiple machines, while the right argument will not. Attempting to invert things, for instance
Local.Parallel [ Cloud.Parallel ... ]
will result in a type error. Having type local is a guarantee that no distribution will occur, as far as MBrace is concerned, inside that particular workflow. However, having type cloud does not necessarily imply distribution; consider for instance cloud { return 1 }.
This addition is merely a type-level safety blanket. It does not affect the underlying implementation or execution model. It has actually helped uncover a few bugs, in some cases where distribution had been accidentally introduced in store APIs. Overall it has increased my confidence in working with the model, my only hope is that it has not been rendered incomprehensible to outsiders in the process.
OK, yes, got it. It's not incomprehensible IMHO, but it needs to be introduced carefully within tutorials etc. so that users understand how to reason about local { } versus cloud { } - it's definitely not a beginner "feature". The beauty is that because everything works under the Cloud expr., there's no need to introduce it immediately IMHO.
Based on my experience, it is very helpful to have the static guarantee, but also the visual cues (cloud {} vs local {}), which are necessary to understand the dynamics of your code.
Exactly! The novice need not even notice that there is a local expression builder. It could be introduced further on, in a 'constraining distribution' chapter.
@eiriktsarpalis the only thing I would suggest you think about is namespacing of this expr. - if cloud { } is at the top level of MBrace etc., I would suggest putting local { } in a sub-namespace. The novice should (ideally) be able to hit "." in IntelliSense and just see the constructs that would be useful for 80% of his or her needs etc.
Great! Let's see if we can settle the phantom type dispute then and merge the 'correct' branch to master. @dsyme ?
The final position seems reasonable. This is the sort of addition which you have to be prepared to unroll if usability evidence indicates that's necessary, or if it collides with some other concern, but what you've got is worth running with for now.
Some feedback on this design...
I was teaching fresh users for the first time today with this enabled (experienced F# programmers).
I used to be able to explain combinators like Cloud.Parallel simply by showing their type, but now I had to explain what Workflow was too. That definitely made things more complex to teach. People also asked why we don't just use Async<T> for everything. I explained that it's because Async<T> values can't be serialized or would have different semantics w.r.t. shared memory. I don't know if this is the best explanation or not, but it is an interesting question. I think it becomes particularly interesting now we have three levels of workflows (Async --> Local --> Cloud) - people will definitely ask "why so many" and I think it may impact adoption, especially in independent learning scenarios (in tutorials you can get people over this hump). I still think the additional safety is worth it, but do wonder why "Local" and "Async" aren't the same thing.
Good question. Let me first clarify that Async values are perfectly serializable and can be composed/executed in MBrace workflows with the expected shared memory semantics.
In fact, Async used to play the role of Local until very recently. However, Async lacks the dependency resolution mechanism used by MBrace workflows. For example:
type CloudFile = { Path : string } with
    member file.ReadAllLines() = local {
        // resolve file store implementation particular to current execution context;
        // in azure it would be a blob store client bound to the user's connection string
        let! fileStore = Workflow.GetResource<ICloudFileStore> ()
        return fileStore.ReadAllLines(file.Path)
    }
Dependencies are threaded through the workflow in a reader monad fashion. The CloudFile instance contains only its path and connects to it using dependencies from the execution context. It would be impossible to do the same with Async without somehow encapsulating store logic.
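As a toy illustration of that reader-monad threading (this is invented demonstration code, not MBrace's actual implementation):

```fsharp
// A 'local' computation modeled as a function from an execution
// context (a resource registry) to a result.
open System.Collections.Generic

type ResourceRegistry = IDictionary<string, obj>

type LocalWf<'T> = LocalWf of (ResourceRegistry -> 'T)

module LocalWf =
    // resolve a dependency from the ambient execution context
    let getResource<'T> (key : string) : LocalWf<'T> =
        LocalWf (fun ctx -> ctx.[key] :?> 'T)

    let ret (x : 'T) : LocalWf<'T> = LocalWf (fun _ -> x)

    // bind threads the same context through both computations
    let bind (LocalWf f) (g : 'T -> LocalWf<'S>) : LocalWf<'S> =
        LocalWf (fun ctx ->
            let (LocalWf h) = g (f ctx)
            h ctx)

    // supply the registry only at the very end, when running the workflow
    let run ctx (LocalWf f) = f ctx
```

Async lacks any such ambient registry, which is why a CloudFile carrying only its path cannot resolve its store client from within a plain Async computation.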
With the update to the programming model, there are many things that need to be brushed up. For example, the Cloud.OfAsync : Async<'T> -> Local<'T> primitive has a type signature that seems to contradict its name. We have kept it like this for backwards compatibility for now, but we would love more feedback on how to improve things.
I think that maybe the phantom type design is clearer for experienced users, and maybe for newbies as well. One possible simplification is to rename Workflow to Cloud<'Ctx, 'T>. Some examples:
Cloud.OfAsync : Async<'T> -> Cloud<Local, 'T>
Cloud.Parallel : seq<Cloud<'Ctx, 'T>> -> Cloud<Distrib, 'T []>
CloudBuilder.Bind : Cloud<'Ctx, 'T> -> ('T -> Cloud<Distrib, 'R>) -> Cloud<Distrib, 'R>
LocalBuilder.Bind : Cloud<Local, 'T> -> ('T -> Cloud<Local, 'R>) -> Cloud<Local, 'R>
But you would still use the Local<'T> alias for Cloud<Local, 'T>, right? If yes, then the confusion persists. If not, you confuse even more by forcing people to explicitly work with phantom types.
My idea is to remove the type aliases; I think that with the phantom types exposed it is clearer (to me at least).
One concept (Workflow) is dropped at the expense of the phantom type parameter:
Cloud.Parallel : seq<Cloud<'Ctx, 'T>> -> Cloud<Distrib, 'T []>
We would also need to have a Cloud<'T> base type (to replace Workflow<'T>). Thus:
Cloud.Parallel : seq<#Cloud<'T>> -> Cloud<Distrib, 'T []>
Not sure which is less confusing TBH.
The confusion arises because we hide the context. Maybe we can keep Cloud<'T> as a type alias of Cloud<Distrib, 'T> and delete Local<'T> completely.
For the scenario of a non-uniform Cloud.Parallel
Cloud.Parallel [ cloud { return 1 } ; local { return 2 } ]
we can provide two cast operators
Cloud.ToLocal : Cloud<Distrib, 'T> -> Cloud<Local, 'T>
Cloud.AsDistrib : Cloud<Local, 'T> -> Cloud<Distrib, 'T>
and we can keep one uniform Cloud.Parallel
Cloud.Parallel [ cloud { return 1 } ; Cloud.AsDistrib <| local { return 2 } ]
Cloud.Parallel : seq<Cloud<'Ctx, 'T>> -> Cloud<Distrib, 'T []>
Another possible idea is to have something like
type Context
type Distrib = inherit Context
type Local = inherit Context
type Cloud<'Ctx, 'T when 'Ctx :> Context>
Cloud.Parallel<'Ctx when 'Ctx :> Context> : seq<Cloud<'Ctx, 'T>> -> Cloud<Distrib, 'T []>
let first : Cloud<Context, int> = cloud { return 1 }
let second : Cloud<Context, int> = local { return 2 }
Cloud.Parallel [ first; second ]
FWIW I'm not at all sure the current situation is a critical problem as such - people did notice "Workflow" and "Local", and asked about "Async", but equally they understood my explanation and saw that the distinctions brought value.
The real test will come when
In both these situations you will notice that the complexity of what can be communicated is lower than you expect, and that simplicity is a total virtue. To be successful, MBrace must be virally communicable as a simple-but-powerful big data technology - it's not just whether we can teach it, but whether the people we teach can teach it.
Valid point. The teaching aspect is the critical one.
What if I made Cloud<> the base type and had Local<> be its subtype? We could probably get rid of Workflow<> without changing much else.
Simplification is always a good thing, one less thing to worry about.
But then a Local could be used where a Cloud is expected?
Yes, that is in fact expected behaviour, even now. It is the opposite situation that we are looking to avoid.
Actually, my original plan was to do it like this, but then I went for the phantom type approach out of convenience. I completely forgot about this afterwards. The more I think about it, the more it makes sense to do it this way.
Yes, sorry, brain fail :)
Ok, this refactoring worked out smoothly: https://github.com/eiriktsarpalis/MBrace.Core/commit/ddb77c29c65c055408b22291f560f269c42cb901
If everybody agrees, I'll proceed with a PR.
As an added bonus, this is now possible:
Cloud.Parallel [ cloud { return 1 } ; local { return 2 } :> _ ]
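For reference, the layout adopted in that commit can be summarized by this simplified sketch (illustrative only, not the actual source):

```fsharp
// Cloud<'T> is now the base type and Local<'T> its subtype,
// so every Local<'T> upcasts to Cloud<'T> but not vice versa.
[<AbstractClass>]
type Cloud<'T>() = class end

type Local<'T>() =
    inherit Cloud<'T>()

// Mixed inputs to Cloud.Parallel need only an upcast on the local side:
//     Cloud.Parallel [ cloud { return 1 } ; local { return 2 } :> _ ]
```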
Yep, I think it works elegantly!
Feedback time:
First the very good :) I just took a non-PL F# user through MBrace - we concluded the session by running the scale-out comparison of feature extraction sets for 700,000 images over 200 Azure machines (currently running). It was really fun to use the system in anger over data. We'll write up more details in due course.
However, I do have to mention that both Local<_> and the # in #Cloud<_> in signatures are definitely noticed; they add a non-trivial learning burden for users. It is really quite hard for people to grok why subtyping suddenly makes an appearance, especially since they are in the process of learning an already abstract notion like Cloud.
When someone knowledgeable is on hand to get them over this hurdle with confidence, it is ok. But if they are on their own, I think most entry-level F# users are going to struggle with Cloud v. Local v. Async v. CloudFlow. It's just hard for people to learn these things and understand why so many new control constructs are needed.
So the feedback is similar to before: Local<_> reduces bugs, and I like it. I'm not saying we should remove it. But Local<_> adds a non-trivial burden to the learning curve, even for F# programmers, and I'm concerned about that. People accept an explanation when it is given, but if left to their own devices they might not like it. It's quite possible people will reject MBrace as "too complicated" when they see this alone.
One option may be to keep pushing Local<_> and #Cloud<_> to the fringe so you don't really notice them until you need them. The specific place where #Cloud<_> gets noticed is in the signatures of Cloud.Parallel and Cloud.Choice. To be honest, we could just make these accept Cloud<_> and have a Local.Parallel and Local.Choice as well.
The specific place where Local<_> gets noticed is in the return type of CloudCell.New<_>. This is harder to hide.
Are there any details on CloudFlow anywhere?
CloudFlow is the new name for CloudStream
Perhaps the root of the problem lies in the naming of Local<_>. Just from the name, it would be impossible to guess that it could be a subtype of Cloud<_>. What if we called it LocalCloud<_> or Cloudlet<_> or Vapor<_>? We would probably have to rename local { }, which I have grown incredibly fond of.
That's certainly one issue, though the root of the problem is simply the existence of more concepts for people to learn rather than less. 2 concepts are just harder to learn than 1 (while equally, 2 concepts may catch more bugs than 1 concept).
I assume a C# version of all this would likely just recycle Task<T> = Local<T> = Cloud<T> and be called "task execution", no matter what the correctness problems with that. But people would probably flock to it - partly because it's C# and partly because it's perceived to be simpler. Then a few people would realize the problems with it and start to refine the types used, but most wouldn't....
Another POV is that the Local v. Cloud distinction may be of more use to us as implementors (and also to power users) than it is to 90% of potential users. If that were deemed to be the case then perhaps it's something that could be present in debug mode (when building the library), and absent in release mode, though that's somewhat tricky to arrange.
Still another POV is that you could opt in to a "stricter" API using "open MBrace.MoreExactTypes" or something, and without that you just get Cloud<T>.
But I'm really not sure if we need a change here - I'm really just giving feedback from 1:1 learning sessions. We'll learn more as we teach it more. However, I've definitely noticed that some F# frameworks suffer from "too much type-level complexity" and "too many things to learn", and I believe it can hinder their adoption.
For the names above - I think local is better than those - people learn "local" in juxtaposition to "cloud" so they get the difference.
Drawing on the discussion started in this issue, I would like to share a few thoughts on the programming model.
As you may know, cloud workflows are used in every aspect of the MBrace API, from parallel combinators to store operations. For instance, the
ICloudDisposable interface has the following signature:

An interesting question that arises here is: how can one know whether a dispose implementation introduces distribution? While it makes sense that all primitive store operations should not introduce distribution, this cannot be guaranteed by their type signature. A workflow of type Cloud<unit> could either signify an asynchronous store operation or contain a massively distributed computation. In other words, there is no way to statically detect whether a workflow carries the distribution effect. Currently, this is somewhat mitigated using the following primitive:
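Neither code sample survived in this transcript. As a hedged reconstruction from the surrounding discussion (both the interface shape and the primitive's name are assumptions), they presumably resembled:

```fsharp
// Hypothetical reconstruction; the actual definitions are not shown here.
// Disposal is itself a cloud workflow, hence the question about its effects:
type ICloudDisposable =
    abstract Dispose : unit -> Cloud<unit>

// Assumed shape of the mitigating primitive: force thread pool
// (single-machine) evaluation of an arbitrary workflow:
//     Cloud.ToLocal : Cloud<'T> -> Cloud<'T>
```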
This has the effect of evaluating the input workflow with thread pool parallelism semantics, thus giving a dynamic guarantee that the nested computation will never exit the current worker machine. It offers relative sanity, but is hard to reason about and does not work correctly in conjunction with forking operations like Cloud.StartChild.

My proposal for addressing this issue is to introduce two brands of computation expression for MBrace workflows, one for local and one for distributed computations. A distributed workflow can compose local workflows, but not the other way around. Store operations will be local workflows and the parallelism primitives will necessarily return distributed workflows. This would allow us to statically reason about the distribution effect, while potentially complicating the programming model.
I have created an experimental branch that attempts to develop these ideas: workflow definitions, builder declarations, store operations using local workflows, and the Cloud.Parallel primitive.
Thoughts?