Perhaps an alternative is this naming:

    type Cloud<'T>
    type Cloud<'T, 'Where> : Cloud<'T>
    type Local // a tag type

So what we call Local<'T> today is always seen as Cloud<'T, Local>. Then the average user only ever sees:

    Cloud<'T>
    Cloud.Parallel : seq<Cloud<'T>> -> Cloud<'T[]>

And the power user sees:

    cloudLocal { ... } (or local { ... } if you like)
    Cloud<'T, Local>
    CloudLocal.Parallel : seq<Cloud<'T, Local>> -> Cloud<'T[], Local>

For the power user the machinery is at least reusable if they choose to define their own tags, with the added bonus that combinators like CloudLocal.Parallel also preserve localness.
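For concreteness, a minimal runnable sketch of that tag idea; everything below is illustrative stand-in code, not the real MBrace types, and sequential evaluation stands in for actual parallelism:

```fsharp
/// Stand-in for the real Cloud<'T>: just a deferred computation.
type Cloud<'T>(run : unit -> 'T) =
    member _.Run () = run ()

/// The tagged form: a Cloud<'T> that additionally records where it may run.
type Cloud<'T, 'Where>(run : unit -> 'T) =
    inherit Cloud<'T>(run)

/// A tag type marking single-machine workflows.
type Local = interface end

module CloudLocal =
    /// Tag-preserving combinator: composing tagged workflows keeps the tag.
    let Parallel (xs : seq<Cloud<'T, 'Where>>) : Cloud<'T[], 'Where> =
        Cloud<'T[], 'Where>(fun () -> [| for x in xs -> x.Run () |])
```

Because the result carries the same 'Where tag as its inputs, CloudLocal.Parallel preserves localness exactly as described above.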
The structure as it is right now works beautifully; maybe we need to rename Local<'T> to CloudLocal<'T> in order to give it some context. But local {} is too beautiful to change.
@palladin @eiriktsarpalis: I personally really like the local { } abstraction. Once I "got" it, it allowed me to more easily reason about my code.
However, I'd also strongly recommend acting on this feedback, even if it's negative or not necessarily in line with what was hoped for. If people are struggling now - and I assume from Don that most of the individuals he's coaching are MSR people - then the average developer will probably experience the same. Most of those individuals won't have an F# expert sitting by their side either. Some of those might well just give up if they get stuck on something like this.
So whilst I think I'm in agreement with you regarding the effectiveness of local { }, I'm also worried about users struggling to get up to speed with the different abstractions - something I've seen as well.
@isaacabraham I agree that local {} is not for the average user. I think that the problem is that Local<'T> appears in many entry-level APIs. Maybe if we prefix the Local type with Cloud, i.e. CloudLocal<'T>, we can make it more regular as a member of the family of Cloudxxx types, and of course more digestible for the newcomer.
I'm not sure that renaming Local<'T> would somehow ease understanding for novices. Combinators that explicitly require local workflows as arguments would give a type error if supplied with cloud ones. So if the goal is to delay introduction of the concept, I find it unlikely that it will be achieved in this way.
Having played with local workflows quite a bit, I can say that the local/cloud duality is a central point of the programming model as it is. Perhaps it would make sense to promote this distinction from the very beginning in tutorials.
Here's an overview of the puppy-image real-world scenario from yesterday. Basically the work came down to transporting 2GB (N * M * Size) of (string * int[]) data to the cloud, running N * N * M Set.ofArray/Set.intersection operations, checking if the intersection size was greater than some threshold (indicating similar or duplicate images), and returning strings (the duplicate image names). It turned out the work could be decomposed into M independent jobs, each running at most 12 hours, so we used a 150-machine cluster to do it. There were probably a whole lot of optimizations we could have done (bit sets etc.), but we didn't need to bother.
In this scenario the final solution just ended up using nothing but cloud { ... }, CreateProcess, AwaitResult() and that's all. Any data transport to the cloud was implicit in the cloud { ... } blocks.
We experimented with Cloud.Parallel, but the overall size of the serialized job was too big, so we broke the work into the M independent CreateProcess calls (this is also why we were trying to parallelize calls to CreateProcess, hence our bug report about that). For a while we thought we might have to store the data in the cloud, so we started to do M jobs doing CloudCell.New, but then we realized that storage was temporary and could be fused out.
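For readers following the thread, a rough sketch of the shape that solution takes; cluster, partitions, findDuplicates and M are placeholders rather than the actual code, and the process API is paraphrased from memory:

```fsharp
// Placeholder names throughout: this shows the pattern, not the real code.
let processes =
    [ for m in 0 .. M - 1 ->
        // one independent cloud process per partition; the partition's data
        // is captured by the closure and shipped implicitly with the block
        cluster.CreateProcess (cloud { return findDuplicates partitions.[m] }) ]

// wait for all M processes and collect the duplicate image names
let duplicates = processes |> List.collect (fun p -> p.AwaitResult())
```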
Most of our actual work was preparing/shaping/trialing small trial jobs (e.g. M=3, N=3) to estimate how much compute and process-upload time we were going to need in total.
This scenario seems very typical of the "medium-data-plus-big-compute-in-the-cloud" scenario that MBrace will absolutely excel at. For this scenario, the magic of MBrace is in REPL scripting and seamless-data-plus-code-transport-to-the-cloud. We were pleased with the ease and simplicity of that - Vagrant + MBrace + Brisk is an amazing, simple, exploratory, playful cloud scale-out programming environment.
I'll experiment with a PR for this idea: https://github.com/mbraceproject/MBrace.Core/issues/9#issuecomment-88632280. I'm fairly positive about this.
@dsyme Great, we would love to have his testimonial on the website once he's done.
@eiriktsarpalis @palladin
Reopening this old chestnut again...
Unfortunately I've had the feedback that "Learning local ... is really confusing" once again.
I'm still not sure the local { ... } v. cloud { ... } distinction is hitting the sweet spot for users (as opposed to combinator implementors). That is, compared to alternatives like "everything is cloud { ... }".
First, the word "local" is still confusing people - is it "local to the worker", "local to the client", "local to a machine", or ...? People seem to be interpreting it as "local to the client" because of terminology like cluster.RunLocally.
Second, I'm just not sure that cloud v. local is a distinction that's so important to the majority of users. People seem to be really, really confused by it and don't understand what it's giving them. The majority of uses of MBrace involve nothing but either cloud flows or "start lots of jobs and wait for them".
Do you think it might be possible to somehow enable this distinction optionally, for authors of libraries like MBrace.Azure and MBrace.Flow, by opening extra namespaces? If that were possible it might deal with the problem.
I think this problem essentially boils down to the naming of the Local<_> type. The local expression builder as well as the Local.* methods can easily be hidden away in separate namespaces and never be noticed by novice users. The Local<_> type however is pervasive and cannot be ignored. Perhaps a good rename of this type would resolve this ambiguity. We could go for the phantom type approach, or just call it LocalCloud<_>.
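A minimal library-layout sketch of the namespace point; these are toy stand-ins for the real definitions, showing only how the builder can be kept out of novices' sight:

```fsharp
namespace MBrace.Core

/// Toy stand-in for the real Local<'T>.
type Local<'T> = L of (unit -> 'T)

// A separate namespace for the power-user machinery: the `local` builder
// only comes into scope after an explicit `open MBrace.Core.LocalWorkflows`.
namespace MBrace.Core.LocalWorkflows

open MBrace.Core

[<AutoOpen>]
module Builders =
    type LocalBuilder() =
        member _.Return x = L (fun () -> x)
        member _.Bind (L f, g) =
            L (fun () -> let (L h) = g (f ()) in h ())

    let local = LocalBuilder()
```

A novice who only opens MBrace.Core would still see Local<'T> in signatures, but would never be confronted with the local { ... } builder itself.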
What if we remove the static typing of local {} and bring back dynamic type checking! Example:

    let mapCloud (f : 'T -> Cloud<'R>) (x : 'T) = cloud {
        let! r = local (f x) // local : Cloud<'T> -> Cloud<'T>
        return r
    }

    mapCloud (fun x -> cloud {
        let! y = Cloud.Parallel [] // boom: exception, runs in local context
        return y
    })
@palladin I'm strongly opposed to such an approach. :)
I spiked two possible changes here: https://github.com/mbraceproject/MBrace.Core/compare/master...dsyme:fix-local

- Local<'T> = Cloud<'T> in release mode, with mapLocal --> mapCloud
- Local --> Cloud0 and local --> cloud0, and use the terminology "single-machine cloud workflow"
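A toy reading of the first option (my interpretation, not necessarily what the spike actually does): Local<'T> erases to a plain type abbreviation, so local and cloud workflows unify and mapLocal can simply forward to mapCloud:

```fsharp
/// Toy stand-in for the real Cloud<'T>.
type Cloud<'T> = C of (unit -> 'T)

/// The erased distinction: Local<'T> is just an abbreviation.
type Local<'T> = Cloud<'T>

let mapCloud (f : 'T -> 'R) (C g : Cloud<'T>) : Cloud<'R> =
    C (fun () -> f (g ()))

/// With the abbreviation in place, mapLocal is mapCloud.
let mapLocal (f : 'T -> 'R) (x : Local<'T>) : Local<'R> = mapCloud f x
```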
Wasn't there a discussion about replacing local with async at some point as the semantics are somewhat similar?
An alternative is to go back to cloud { } but with some way of indicating that some cloud workflows only operate locally. I don't like this idea though.
I'm of the opinion that local { } has real value in being able to reason about your code and where it executes - which is one of the harder bits of MBrace for beginners - and perhaps this problem can be solved by simply hiding local { } from higher namespaces and putting some XML comments on local to say "just treat this as cloud if you're a beginner" :)
I like the terminology "single-machine cloud workflow"
The way I look at it is like this:

| | CPU | Async I/O + cancellation + single thread | Async I/O + cancellation + multi-thread | Cloud I/O + cancellation + multi-thread + single machine | Cloud I/O + cancellation + multi-thread + multi machine |
|---|---|---|---|---|---|
| Notes | | | | Supported as work specs in MBrace APIs. No scheduling of nested cloud computations; safe to use shared memory and unserializable objects, up to multi-threaded concurrency safety. | Supported as work specs in MBrace APIs. Scheduling operations may serialize; dangerous to use shared mutable memory and unserializable objects unless the work is effectively cloud0. |
| normal F# code | x | | | | |
| async { ... } | x | x | x | | |
| cloud0 { ... } | x | x | x | x | |
| cloud { ... } | x | x | x | x | x |
The advantage of the "cloud0" name is that it implies "it's like cloud { ... }, and it's for cloud programming, but more restrictive than cloud { ... }".
[ As an aside, when looked at this way, you can also imagine there being an async0 that is single-threaded (which can't start new child tasks in the thread pool, and only supports StartImmediate). ]
[ As an aside I suppose the distinction between Async I/O and Cloud I/O isn't really very meaningful - indeed Async I/O for web requests is likely to be a stronger effect (= longer delays, more chance of failure) than stores to/from cloud storage in the same data center. ]
@eiriktsarpalis - FWIW I gather from #117 (and previous discussions) that the cloud0/local row in the table above should not have an x in "Serialized semantics".

This would mean that the only real difference between cloud0/local and async is that cloud0/local carries an extra computation-local data map - do I recall that correctly?

To put it another way, if you could edit the table above until it reflects what's accurate, that would be great :)
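If that recollection is right, local is essentially async plus an ambient dictionary. A toy model of that reading (all names hypothetical, not the actual implementation):

```fsharp
open System.Collections.Generic

/// Ambient, computation-local data carried alongside the workflow.
type LocalContext = { Data : Dictionary<string, obj> }

/// A local workflow: a reader over the ambient context yielding an Async.
type Local<'T> = Local of (LocalContext -> Async<'T>)

/// Run with a fresh, empty context.
let runLocal (Local f) =
    f { Data = Dictionary() } |> Async.RunSynchronously
```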
Copied from #117:
@dsyme says:
I see, so the entire continuation is serialized, so

    cloud {
        let v = ref 0 // or some non-serializable thing
        let! x = SomethingThatMayUseCloudParallel()
        ... reference to v ...
    }

results in the possibility of invalid serialization of the continuation.
In this setting I still like the cloud0 name - explained as "a cloud { ... } that starts precisely zero nested tasks, no part of which gets serialized, and which can consequently use shared memory and unserializable in-memory objects safely".
@palladin says:

I was wondering what that cloud0 suffix zero means... "a cloud { ... } that starts precisely zero nested tasks, and no part of which gets serialized, and which can consequently use shared memory and unserializable in-memory objects safely" - that is exactly the motivation behind local {}.
(I've edited the table above to reflect my understanding)
I think that "Cloud I/O + cancellation + multi machine" implies "Scheduling operations may serialize (dangerous to use shared mutable memory and unserializable objects)"
columns consolidated
Slightly related, the master branch of MBrace.Core includes support for local execution that emulates the effects of distribution. See this example.
I took another, less dramatic attempt to make progress on this issue here: https://github.com/mbraceproject/MBrace.Core/pull/119

This tries to replace the language "Remote" with "Cloud" and "Locally" with "Client". Minimally, I think we have to be really careful not to use "Local" to mean "Client".
Given today's discussion, can this issue be closed as well?
Yep, closing.
Drawing on the discussion started in this issue, I would like to share a few thoughts on the programming model.
As you may know, cloud workflows are used in every aspect of the MBrace API, from parallel combinators to store operations. For instance, the ICloudDisposable interface has the following signature:
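Roughly, it is an IDisposable whose Dispose returns a workflow (reconstructed from context; the exact declaration may differ):

```fsharp
type ICloudDisposable =
    /// Releases resources held by this object, as a cloud workflow.
    abstract Dispose : unit -> Cloud<unit>
```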
An interesting question that arises here is: how can one know whether a dispose implementation does not introduce distribution? While it makes sense that all primitive store operations should not introduce distribution, this cannot be guaranteed by their type signature. A workflow of type Cloud<unit> could either signify an asynchronous store operation or it could contain a massively distributed computation. In other words, there is no way to statically detect whether a workflow carries the distribution effect.

Currently, this is somewhat mitigated using the following primitive:
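As the next paragraph pins down, its shape is a workflow-to-workflow combinator; the signature below is reconstructed from that description and the name is a guess:

```fsharp
/// Evaluates the given workflow with thread-pool parallelism semantics,
/// so the computation never leaves the current worker machine.
/// (Name hypothetical; signature inferred from the description below.)
val ToLocal : workflow:Cloud<'T> -> Cloud<'T>
```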
This has the effect of evaluating the input workflow with thread-pool parallelism semantics, thus giving a dynamic guarantee that the nested computation will never exit the current worker machine. It offers relative sanity, but is hard to reason about and does not work correctly in conjunction with forking operations like Cloud.StartChild.

My proposal for amending this issue is to introduce two brands of computation expression for MBrace workflows: one for local and one for distributed computations. A distributed workflow can compose local workflows, but not the other way around. Store operations will be local workflows, and the parallelism primitives will necessarily return distributed workflows. This would allow us to statically reason about the distribution effect, while potentially complicating the programming model.
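As a toy illustration of the proposed one-way composability (stand-in definitions, not the experimental branch itself):

```fsharp
/// Toy stand-ins: two distinct workflow types.
type Local<'T> = Local of (unit -> 'T)
type Cloud<'T> = Cloud of (unit -> 'T)

module Cloud =
    /// Local workflows embed into distributed ones...
    let OfLocal (Local f) : Cloud<'T> = Cloud f
    // ...but there is deliberately no inverse coercion, so any workflow
    // that may introduce distribution is visibly a Cloud<'T> in the types.
```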
I have created an experimental branch that attempts to develop these ideas:

- Workflow definitions
- Builder declarations
- Store operations using local workflows
- Cloud.Parallel primitive
Thoughts?