ray-project / ray

Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0
32.88k stars 5.57k forks source link

Resolving inconsistencies in the API. #525

Closed robertnishihara closed 7 years ago

robertnishihara commented 7 years ago

Right now, there are several inconsistencies in the API. Some of the following may be fine.

  1. We change the way that we call remote functions (versus regular functions), that is f.remote() versus f(). But we do not change the way we instantiate actors (versus regular classes), both are just Class().

  2. We change the way we call remote functions (versus regular functions), that is f.remote() versus f(), but we do not change the way we call actor methods (versus regular class methods). Both are c.method().

  3. The word remote is an adjective, but actor is a noun.

There are a handful of options.

A. (addresses 1 and 3) Replace @ray.remote and @ray.actor with @ray.task and @ray.actor respectively. Call remote functions with f.remote(). Create actors with Class.remote().

B. (addresses 1 and 3) Replace both @ray.remote and @ray.actor with @ray.remote. Call remote functions with f.remote(). Create actors with Class.remote().

C. (addresses 1 and 3) Replace both @ray.remote and @ray.actor with @ray.remote. Call remote functions with f.task(). Create actors with Class.actor().

Note that none of these address problem (2). The issue with (2) is that calling c.method.remote() or something like that is kind of verbose. There are probably good options that I'm not thinking of right now.

cc @jacobandreas @pcmoritz @istoica

istoica commented 7 years ago

How about asking people around about their preferences?

Any of the choices are far preferable to what we have today, though I slightly prefer A and B over C.

Also, I'd seriously consider adding "remote" to actor method invocation. Yes, it's a bit long but its consistent with what we are telling people about remote functions, i.e., you need to write f.remote() instead of f() to explicitly signal that a remote function call returns an object id, instead of just an object. BTW, that's one reason I prefer A and B over C:, it's easier to add a suffix (i.e., remote) to method invocations.

pcmoritz commented 7 years ago

I'm also in favor of A or B over C (f.task() feels unnatural). I like B because of its uniformity and A because it does emphasize that tasks and actors are pretty different concepts.

jacobandreas commented 7 years ago

(A) is my favorite for same reasons as Philipp. There's also another option:

D. Use @ray.task and @ray.actor. Call remote functions with f(). Create actors with Class().

(i.e. totally removing the special remote-call syntax). This is even more uniform, since remote task and actor calls look the same, but not sure how important you think it is to visually distinguish the call that starts the remote worker.

On Mon, May 8, 2017 at 12:16 AM Philipp Moritz notifications@github.com wrote:

I'm also in favor of A or B over C (f.task() feels unnatural). I like B because of its uniformity and A because it does emphasize that tasks and actors are pretty different concepts.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ray-project/ray/issues/525#issuecomment-299791335, or mute the thread https://github.com/notifications/unsubscribe-auth/AAOaT8eYcpA0orXj94cJuNpbc_P_pRWsks5r3sFfgaJpZM4NTVBS .

istoica commented 7 years ago

I'm fine with both A and D. Note that D will also address (2) implicitly. So why not do (D)?

robertnishihara commented 7 years ago

(D) might be good. It certainly resolves problem (2).

Possible disadvantages of (D):

  1. Suppose f is a remote function and we want to add in the ability to call f locally in a blocking manner. Then it isn't clear how to do this if we do (D). That said, there are easy workarounds, since the user can just define f without the decorator, and then define f_remote = ray.remote(f), so they could call f for the blocking version and f_remote for the non-blocking version.

  2. Currently, if a user is parallelizing an existing code base, and they decorate a function f with @ray.remote, then all existing invocations of f() will automatically raise exceptions, so the user will immediately be made aware of which parts of the code they need to change. This is pretty useful I think. I suppose this is also an argument for changing actor method invocations.

  3. Invoking functions with .remote potentially makes the code more readable.

Note that there is a somewhat fundamental asymmetry between remote functions and actor methods. That is, remote functions could in principle be executed remotely or locally in the same process. However, actor methods can only be executed remotely on the worker process that actually contains the actor state.

istoica commented 7 years ago

I chatted more with Philipp today, and here are my preferences:

(a) Either A or B + adding remote to methods, i.e., class.method.remote(). I know this is verbose but it is clear and consistent. If we tell people that we use f.remote() because we want to make them aware that a remote function returns an object_id, then we should do the same for methods. Otherwise, we might create confusion.

(b) D, again because it solves (2), though I agree with Robert's concerns, hence its my 2nd preference.

(c) Either A or B; this is last because it doesn't solve (2)

All of the above are much better than what we have today, so one thing I want to make sure is that we pick one and go for it before the release. We do not want to get into one of these cases where the "perfect is the enemy of good" (https://en.wikipedia.org/wiki/Perfect_is_the_enemy_of_good)

robertnishihara commented 7 years ago

Ok, maybe solution (A) + adding remote to actor method calls is the way to go. I'll try converting some of the applications and see how it feels.

Note, I like (A) more than (B) because we find it very useful when describing the system to use the word "actors", and (A) clearly links the concept of actors to the API.

istoica commented 7 years ago

A + a.method.remote, sounds good to me.

BTW, note that adding remote for methods has another advantage. It allows one to call local methods in the same actor for better code structuring (like private methods in classic OO languages). That is, if a method call doesn't have remote, the call is local, in the same actor instance. Philipp mentioned that this is one thing some people asked during the minicamp, which makes sense.

robertnishihara commented 7 years ago

That actually works already. If I understand you correctly, you can do the following.

@ray.actor
class Foo(object):
  def private_helper(self):
    pass

  def method(self):
    # Call helper method.
    self.private_helper()

The reason this makes sense is that there is never any ambiguity about where an actor method call runs. It always runs on the actor worker, so there is no option of running locally versus remotely.

robertnishihara commented 7 years ago

A + a.method.remote is implemented in #541.

robertnishihara commented 7 years ago

One last proposal. How about the following.

Remote functions

Actors

I really like the word "actor", so I'm not thrilled to get rid of it, but class is more analogous to function.

pcmoritz commented 7 years ago

+1

istoica commented 7 years ago

I'm fine with it.

I think this is probably the most consistent proposal since these are indeed functions and classes. The only thing is that it doesn't expose the key concepts of ray: tasks and actors, but probably it's ok. We can say:

The one thing to think about is whether this will generalize naturally to other bindings which we might have in the future, e.g., Java, Scala, C++, etc. I think it does, but haven't thought at all...

atumanov commented 7 years ago

I love the move to function and class. Getting rid of actor is actually a good thing, IMHO. The notion of a class was actually invented to encapsulate state -- it was the original OO motivation for it, so it's very fitting to call it a class semantically.

I also strongly support the unification of the calling interface for functions and methods with the remote suffix. The only tweak I would consider is getting rid of the remote suffix when instantiating the Ray Class. That way, all calls to remote can participate in the hybrid dataflow, return an object ID, and are easily identifiable in the code as places that trigger task submission to Ray. I think it's important that we have this futures composition marker well defined.

robertnishihara commented 7 years ago

Ok, it looks like @ray.class may not actually be implementable due to the fact that class is a special keyword.

robertnishihara commented 7 years ago

Still experimenting. I changed #541 to use @ray.remote everywhere (both for declaring remote functions and actor classes).

atumanov commented 7 years ago

Just to carry over the most important bit I care about from slack discussion, I think it's very important to preserve the .remote() <=> future invariant. It means that .remote() is exactly defined as returning a future, no more, no less. In other words, all futures that can be returned, should be returned through remote() and remote() can only return a future. If we fail to preserve this property, I foresee confusion developers might have. We don't want them to mentally differentiate between .remote() being used for functions and methods and .remote() being used for ray.class instantiation. It is very convenient to know that I can always call ray.get and ray.wait on the result of any call to .remote(). Failing that breaks the composability of our API. For those who are familiar with basic principles of PL, I'm talking about the orthogonality of our API here: https://en.wikipedia.org/wiki/Orthogonality_(programming)

istoica commented 7 years ago

Alexey's, I agree with your points, but let's see how people will respond to replacing everything with remote. I think that having remote everywhere will make a bit clearer to people that remote is not about the futures only. Also, if we stick with remote let's write everywhere in the documentation that remote means that the functions/actors are executed/instantiated on a different process which might be local or remote, and not necessary on a remote node, which I believe is the case now.