haskell-distributed / distributed-process

Cloud Haskell core libraries
http://haskell-distributed.github.io

High Level Redesign - Discussion #336

Open hyperthunk opened 6 years ago

hyperthunk commented 6 years ago

Initial proposal is on the wiki here. I will be seeking feedback before getting into detailed design work.

hyperthunk commented 6 years ago

Folks - I've assigned haskell-distributed org members to this discussion. I know you're all rather busy people, so please excuse the interruption. I'd be delighted to hear your thoughts and feedback, if you've got time.

Thanks in advance.

hyperthunk commented 6 years ago

Yes, thank you @marcozocca-seal - I'm very keen indeed to hear from Adjoint and @tdietert !!

hyperthunk commented 5 years ago

I've posted some discussion to the wiki around separating actors and channels

qnikst commented 5 years ago

I really like the general concept and ideas. There are still many questions about how to implement concrete steps. I think it makes sense to select the very core of the system and discuss everything in detail; separating channels and actors seems like a great example.

facundominguez commented 5 years ago

Discussion on how to write distributed apps and how to improve CH is needed, indeed. It is great to have you driving this!

hyperthunk commented 5 years ago

Thank you both!

There are still many questions about how to implement concrete steps

Yeah, definitely. I'm literally dreaming out loud in the initial post!

I think it makes sense to select the very core of the system and discuss everything in detail

Totally on board with that plan. As you pointed out, I've made a start with separating channels and actors, and will continue to break things down.

hyperthunk commented 5 years ago

I've posted some slightly more detailed ideas to the wiki. This is still broadly at the API design level, although I do dip a little into questions of how we monitor channels outside the context of an actor system.

hyperthunk commented 5 years ago

@qnikst - you were asking about implementation options/details...

Here are some thoughts about the actor subsystem (as independent from everything else), and message passing on the local node. I also dip into some stuff from akka that might be nice to implement, though not necessarily in the core.

Would appreciate some thoughts about the obvious things I've missed - like how this is going to screw our ordering guarantees in ways I've not thought of! :)

hyperthunk commented 5 years ago

A high level structural outline for the proposals is now available on the wiki here. Please feel free to comment and edit!

graninas commented 5 years ago

Hi, you might want to know that we've built a framework that addresses many of the problems described in the proposal. I've posted a detailed article about it: Building network actors with Enecuum Node Framework

The framework has the potential to serve as a reference point or a source of inspiration for the architecture and design decisions you're looking for.

I probably need to re-read the proposal and highlight the most relevant parts to make this clearer.

hyperthunk commented 5 years ago

Brilliant, thank you for highlighting this!

hyperthunk commented 5 years ago

@graninas - this does look very interesting. At which points in the various layers do you observe formal semantics, and how are they verified and enforced?

hyperthunk commented 5 years ago

Also @graninas, it looks like you have a single pipe protected by an MVar for serialising network traffic. Doesn't this get in the way of non-blocking socket I/O?

@qnikst, @facundominguez, @edsko, @duncan - how does this server and connection design compare to our approach in network-transport? It doesn't seem to answer the issues around blocking behaviour that we're trying to address here...

note: @graninas - I'm asking the other team members here about this, since my background in building high performance distributed systems is in Erlang and OCaml, plus a large slice of java/scala. So, since my knowledge of Haskell networking code isn't 100%, I'm just deferring to more knowledgeable folks rather than aiming any criticism.

And specifically, one of my requirements is to handle network traffic at least as efficiently as I can using netty/akka/etc. Otherwise there's really no compelling reason to use Cloud Haskell over java/scala apart from "I really wanna use Haskell", which isn't going to dent the enterprise market all that heavily...

hyperthunk commented 5 years ago

Hmmm - this point above gives me a hint about where to start, I think... Addressing the need to be able to write a non-blocking network server application that is at least as fast as an equivalent using akka (or OCaml, which has showcased some incredibly fast network layer applications and frameworks).

I wonder whether we should touch base with the warp team and try to understand more about how they're able to squeeze out every last drop of non-blocking behaviour from their networking code.

We should also figure out whether the -threaded issue with regard to the network library has been resolved in GHC-8.x

hyperthunk commented 5 years ago

@graninas - I do like your structuring as eDSLs a lot. There's definitely a good deal of merit in using that as a blueprint for some of what we're doing.

Your framework does, however, suffer from a problem I'm trying to get Cloud Haskell out of, which is a very similar reason why transient hasn't been more successful imho. We are all trying to boil the ocean!

My proposal here - on the wiki pages I'm adding and so on - is to break all of these architectural layers up so they have absolute utility in and of themselves. If we start trying to re-design or re-build a layer and find that there's something better out there already, we should consider discarding it, or creating an API so that we can plug in whatever better thing we've found and let other people come along and implement their own if they disagree.

From an architectural perspective, whilst I fundamentally agree with the notion that we ought to be functional in our bent, I do not believe that throwing separation of concerns out of the window is a good idea. And ironically, there is a very clean separation of concerns in the Enecuum codebase - from the few minutes browsing I've done so far - but when I want to use it, I get everything + the kitchen sink.

A thing (be it a module, library, whatever), should do one thing and do it well. This gives each thing just one axis for change, which gives us focus areas for testing, and allows each thing to evolve to do its job as effectively as possible.

So the opacity you introduce using the free-monad-based approach is great, and hiding the fact that there are actors or whatever is indeed a useful design goal. But I'm not trying to write the killer app that brings everyone to Haskell. I'm trying to build the fundamental parts that can be used to demonstrate that we can build something as scalable and reliable as spark (or whatever distributed use case we decide to chew on), as proof that Haskell is a viable choice for the real world. And of course, in the process, help drive improvements upstream into the GHC runtime where feasible.
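To make the free-monad point concrete, here is a minimal sketch of the general eDSL pattern; it is not Enecuum's actual code. A tiny node language with hypothetical `Send` and `Receive` operations, a program written only against that language, and one pure interpreter. It hand-rolls `Free` so that it needs nothing beyond `base`; the `free` package offers an equivalent type.

```haskell
{-# LANGUAGE DeriveFunctor #-}

-- A minimal free monad, defined locally so the sketch only needs base.
data Free f a = Pure a | Free (f (Free f a))

instance Functor f => Functor (Free f) where
  fmap g (Pure a)  = Pure (g a)
  fmap g (Free fa) = Free (fmap (fmap g) fa)

instance Functor f => Applicative (Free f) where
  pure = Pure
  Pure g  <*> x = fmap g x
  Free fg <*> x = Free (fmap (<*> x) fg)

instance Functor f => Monad (Free f) where
  Pure a  >>= k = k a
  Free fa >>= k = Free (fmap (>>= k) fa)

liftF :: Functor f => f a -> Free f a
liftF = Free . fmap Pure

-- Hypothetical node language: send a payload to a named peer,
-- or receive the next payload from the inbox.
data NodeF next
  = Send String String next   -- peer name, payload, continuation
  | Receive (String -> next)  -- continuation receives the payload
  deriving Functor

type NodeL = Free NodeF

send :: String -> String -> NodeL ()
send peer msg = liftF (Send peer msg ())

receive :: NodeL String
receive = liftF (Receive id)

-- A program written only against the language: no actors or sockets in sight.
echoOnce :: NodeL ()
echoOnce = do
  msg <- receive
  send "client" ("echo: " ++ msg)

-- One possible interpreter: a pure harness that feeds scripted inbox
-- messages and records outgoing sends. Another interpreter could map
-- Send/Receive onto actors, channels, or sockets instead.
runPure :: [String] -> NodeL a -> (Maybe a, [(String, String)])
runPure _ (Pure a) = (Just a, [])
runPure inbox (Free (Send peer msg next)) =
  let (r, out) = runPure inbox next in (r, (peer, msg) : out)
runPure (m : ms) (Free (Receive k)) = runPure ms (k m)
runPure [] (Free (Receive _)) = (Nothing, [])  -- inbox empty: program would block
```

Swapping `runPure` for an interpreter backed by real actors would leave `echoOnce` untouched, which is the kind of opacity being discussed here.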

In that respect, I think we have rather different project goals, although our design goals do seem pretty closely aligned.

qnikst commented 5 years ago

@hyperthunk I'd rather not look at the network layer at all, as at this point in time the most interesting thing is to have nice abstractions that allow us to build maintainable systems and supervision trees. The exact network transport should be kept separate, because in many cases you may want an utterly different transport framework, potentially built on top of other standard libraries rather than a homegrown Haskell one. Such an approach would also make it easier to communicate with other languages.

It doesn't seem to answer the issues around blocking behaviour that we're trying to address here.

Can you explain how that happens? I'm failing to see it. I see a fairly basic TCP server there that can run workers on top of the heavyweight connections, but I may be missing something. We have problems with blocking behaviour when there are issues on the lightweight connections (which I don't see in the framework above). Also, I don't see how the blocking behaviour of the existing node control would be fixed even if we switched to another system.

hyperthunk commented 5 years ago

Can you explain how that happens? I'm failing to see it.

No, it doesn't. I was just misreading it - this code is essentially at the layer we have in network-transport-tcp, and of course it does need to prevent interleaved writes.
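For readers less familiar with the pattern, here is a minimal sketch of using an `MVar` as a write lock so that multi-part frames from concurrent threads never interleave on one connection. It is not code from Enecuum or network-transport-tcp. The `IORef` log standing in for a socket, and the names `LockedConn`, `sendFrame` and `demo`, are hypothetical.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar
import Control.Monad (forM, forM_)
import Data.IORef

-- Stand-in for a heavyweight connection: chunks are prepended to a log.
-- A real implementation would write to a socket instead.
newtype Conn = Conn (IORef [String])

-- A connection paired with an MVar used purely as a write lock.
data LockedConn = LockedConn (MVar ()) Conn

newLockedConn :: IO LockedConn
newLockedConn = LockedConn <$> newMVar () <*> (Conn <$> newIORef [])

rawWrite :: Conn -> String -> IO ()
rawWrite (Conn ref) chunk = atomicModifyIORef' ref (\cs -> (chunk : cs, ()))

-- Write a length header and a payload as one frame. withMVar holds the
-- lock for the whole frame, so frames from different threads cannot interleave.
sendFrame :: LockedConn -> String -> IO ()
sendFrame (LockedConn lock conn) payload =
  withMVar lock $ \_ -> do
    rawWrite conn ("len:" ++ show (length payload))
    rawWrite conn payload

-- Run several writer threads concurrently; return chunks in write order.
demo :: Int -> Int -> IO [String]
demo writers framesEach = do
  lc@(LockedConn _ (Conn ref)) <- newLockedConn
  dones <- forM [1 .. writers] $ \w -> do
    done <- newEmptyMVar
    _ <- forkIO $ do
      forM_ [1 .. framesEach] $ \i ->
        sendFrame lc ("w" ++ show w ++ "-" ++ show i)
      putMVar done ()
    pure done
  mapM_ takeMVar dones  -- wait for every writer to finish
  reverse <$> readIORef ref
```

Because the lock is held for a whole frame, every writer queues behind it; that is the trade-off behind the blocking concerns raised earlier in the thread.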

The exact network transport should be kept separate, because in many cases you may want an utterly different transport framework, potentially built on top of other standard libraries rather than a homegrown Haskell one.

Absolutely agree, and this is why Well-Typed factored it out the way they did. It would be possible to plug another implementation into Enecuum too, matching the type class, if you so desired.
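As an illustration of swapping implementations behind a shared interface, here is a hypothetical `Transport` type class with an in-memory backend. It does not mirror the network-transport API or Enecuum's; the class, method and type names are assumptions made for this sketch.

```haskell
import Data.IORef

-- Hypothetical interface: application code depends only on this class,
-- so a TCP, ZeroMQ or in-memory backend could be swapped in.
class Transport t where
  sendTo :: t -> String -> String -> IO ()  -- address, payload
  drain  :: t -> String -> IO [String]      -- take all payloads queued for an address

-- In-memory backend, handy for tests: one mailbox per address.
newtype InMemory = InMemory (IORef [(String, [String])])

newInMemory :: IO InMemory
newInMemory = InMemory <$> newIORef []

instance Transport InMemory where
  sendTo (InMemory ref) addr msg =
    atomicModifyIORef' ref (\boxes -> (append boxes, ()))
    where
      -- Append to the existing mailbox, or create one at the end.
      append [] = [(addr, [msg])]
      append ((a, ms) : rest)
        | a == addr = (a, ms ++ [msg]) : rest
        | otherwise = (a, ms) : append rest
  drain (InMemory ref) addr =
    atomicModifyIORef' ref $ \boxes ->
      ( filter ((/= addr) . fst) boxes               -- remove the mailbox
      , concat [ms | (a, ms) <- boxes, a == addr] )  -- return its contents

-- Application logic written against the class only.
broadcast :: Transport t => t -> [String] -> String -> IO ()
broadcast t addrs msg = mapM_ (\a -> sendTo t a msg) addrs
```

`broadcast` depends only on the class, so an instance backed by real sockets could replace `InMemory` without changing it.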

We have problems with blocking behaviour when there are issues on the lightweight connections (which I don't see in the framework above)

Yes, all its traffic runs over the heavyweight connection, which is a design decision all of its own, with its own implications on inter node traffic and node behaviour too.

hyperthunk commented 5 years ago

@graninas - I really like the way you've tied the linguistic aspects to IoC.

I do think I want much lower level things to be available to me, such as actors, when that's the level of abstraction I'm interested in programming in. But I think a really good point I'm taking on board here is that finding the right level of abstraction and creating what is, in effect, a ubiquitous language for it is a highly effective approach (and this shouldn't be a surprise to any of us, viz Evans et al 2003).

I am definitely finding useful and thought provoking ideas in Enecuum. Indeed, along with Transient, it is providing a good deal of design inspiration, even when the outcome for me is a clarification that I am not necessarily going in the same direction.

Very happy you have shared this with us!

graninas commented 5 years ago

@hyperthunk Thanks a lot for your thoughts about our code!

I'll answer your questions a bit later (probably tomorrow), if that's OK with you.

graninas commented 5 years ago

@hyperthunk Let me answer your questions.

At which points in the various layers do you observe formal semantics, and how are they verified and enforced?

I don't think we have any formal semantics in our project. To be honest, I'm not sure what formal semantics means in this context. Some of my colleagues talk about different approaches to the semantics of process definitions, and if I understand correctly, the pi-calculus and session types are that kind of semantics. However, I'm not an expert in this area at all. So in our project, we do not enforce much safety through any kind of semantics. We've put some effort into making things harder to break or misuse, but one can still do something wrong. In particular, it's possible to hang a node forever.

Also @graninas, it looks like you have a single pipe protected by an MVar for serialising network traffic

True story: we have some underlying logic that is not perfectly optimised at the moment. In fact, we haven't had a chance to do any optimisation yet. The goal is to create a working solution that allows for future improvements. There are at least several things to optimise: runtime structures, the low-level networking bits, and the Free monad itself. Regarding the network level, we're looking for better approaches and more correct solutions. We're still learning the networking domain, and until we do, we can't really make our networking better. Also, I've been investigating the performance of different Free monads in both Haskell and C++, and it's possible to gain additional performance just by switching the Free type to the Church-encoded free monad. There are other possibilities, but I need viable metrics for the framework first. I'm really interested in performance metrics. Hopefully I will be able to work in this direction soon.
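Here is a minimal sketch of the Church-encoded free monad mentioned above, written against `base` only. The `free` package ships an equivalent type in `Control.Monad.Free.Church`. The `CounterF` language, `ticks` and `countTicks` are hypothetical examples. The benefit shows up with left-nested binds. The ordinary `Free` type re-traverses the structure built so far on every `>>=`. Here, each `>>=` only composes continuations.

```haskell
{-# LANGUAGE RankNTypes, DeriveFunctor #-}

-- Church encoding: a computation is represented by what it does with two
-- continuations, one for the final value and one for a functor layer.
newtype F f a = F { runF :: forall r. (a -> r) -> (f r -> r) -> r }

instance Functor (F f) where
  fmap g (F m) = F (\kp kf -> m (kp . g) kf)

instance Applicative (F f) where
  pure a = F (\kp _ -> kp a)
  F mf <*> F ma = F (\kp kf -> mf (\g -> ma (kp . g) kf) kf)

instance Monad (F f) where
  -- Bind just plugs the continuation in; nothing is re-traversed.
  F m >>= k = F (\kp kf -> m (\a -> runF (k a) kp kf) kf)

liftF :: Functor f => f a -> F f a
liftF fa = F (\kp kf -> kf (fmap kp fa))

-- Hypothetical one-instruction language.
data CounterF next = Tick next deriving Functor

tick :: F CounterF ()
tick = liftF (Tick ())

-- Deliberately left-nested binds, the case where the plain Free type
-- degrades to quadratic time.
ticks :: Int -> F CounterF ()
ticks n = foldl (\acc _ -> acc >> tick) (pure ()) [1 .. n]

-- Interpreter: choose the two continuations so each Tick layer adds one.
countTicks :: F CounterF a -> Int
countTicks (F m) = m (const 0) (\(Tick k) -> k + 1)
```

Because the program type changes but the instructions do not, switching representations like this can be confined to the framework's internals, which fits the point above about optimising without touching business logic.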

@graninas - I do like your structuring as eDSLs a lot. There's definitely a good deal of merit in using that as a blueprint for some of what we're doing. Your framework does, however, suffer from a problem I'm trying to get Cloud Haskell out of, which is a very similar reason why transient hasn't been more successful imho

Thank you for your thoughts! I hadn't heard of transient before. I'm not sure what problem you're talking about here; could you please clarify, or point me to where I can read about it? I'm not sure why transient hasn't been adopted more widely. My first look at it didn't tell me much about its key properties, although there is a very detailed tutorial. Anyway, it would be nice to fix similar problems in our framework (unless we're talking about the adoption problem, which I don't know how to fix).

But I'm not trying to write the killer app that brings everyone to Haskell. I'm trying to build the fundamental parts that can be used to demonstrate that we can build something as scalable and reliable as spark. In that respect, I think we have rather different project goals, although our design goals do seem pretty closely aligned.

Sounds good! I do think the Haskell ecosystem needs killer apps of that kind, and I'm not sure whether our framework is a viable candidate (although it would be wonderful). I've been thinking a lot about what we Haskellers need to do here, and maybe creating spark-like applications is a good idea.

In our framework, we've made several decisions that allow us to develop our business logic faster, since we probably want that logic to be ready much sooner than the framework itself. It's possible to rewrite several things without affecting our business logic, so optimisation is just a matter of time.

it is providing a good deal of design inspiration, even when the outcome for me is a clarification that I am not necessarily going in the same direction.

That's good anyway, because we probably want a wider range of approaches to be explored. I'll be keeping my finger on the pulse of the Cloud Haskell redesign process.

Thank you!

hyperthunk commented 5 years ago

@graninas - thank you! That definitely is helpful background stuff.

Yes, I'm very interested in performance metrics too, though they are undeniably hard to nail down. As well as network layer optimizations, I'd be interested in looking at the idea of Church encoding.

And yes, the problem I was talking about is indeed the adoption problem. The really successful Haskell frameworks (wai, conduits/pipes, lens and all the awesome abstractions underneath it, such as profunctors) have two things in common: a tight formal or mathematical foundation, and a tight, focused scope. I'll write more later when I've had time to digest.

coolface88 commented 5 years ago

Just my two cents, and not really relevant, but I like the virtual actor concept from Microsoft's Orleans.