anfeng opened this issue 11 years ago
The issue with DRPC is that the clients and the topologies themselves need a way to reach the servers after the DRPC servers are launched on YARN. YARN does not have any service registration or virtual networking system yet. We could build a simple registration system using ZooKeeper, but that would require changes to the DRPC clients as well.
I don't understand why the DRPC client would need to change.
Could you clarify?
Yeah, that seems totally wrong. Is that on the wiki or somewhere, so I can correct it?
DRPC is not really a part of Storm on YARN yet because DRPC needs to be in a place that external services can easily get to. YARN does not have a service registry of any kind yet that would allow external DRPC clients to find the servers. So the correct way to use DRPC with Storm on YARN is to have a number of DRPC servers already running. When you launch a Storm cluster, include the addresses of these external DRPC servers in its config; you may also need to include them in the config when you launch a topology. The DRPC spouts reach out and connect to the servers to pull data down, so they need to know where to go to get it.
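A minimal sketch of what "include the addresses in the config" could look like from Java, assuming the topology config is built as a plain map. Storm's `Config` class is itself a `HashMap<String, Object>`, and `drpc.servers`, `drpc.port` and `drpc.invocations.port` are Storm's own config keys (3772 and 3773 are Storm's defaults). The helper name `withExternalDrpc` and the host names are illustrative assumptions, not part of Storm or storm-yarn.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: point a cluster or topology config at DRPC servers launched outside YARN.
// withExternalDrpc is a hypothetical helper; the config keys are Storm's.
final class ExternalDrpcConfig {
    static Map<String, Object> withExternalDrpc(Map<String, Object> base, List<String> drpcHosts) {
        Map<String, Object> conf = new HashMap<>(base);   // leave the caller's map untouched
        conf.put("drpc.servers", List.copyOf(drpcHosts)); // hosts that DRPC spouts connect to
        conf.put("drpc.port", 3772);                      // port external DRPC clients call
        conf.put("drpc.invocations.port", 3773);          // port DRPC spouts pull requests from
        return conf;
    }
}
```

The same three keys can instead be set in `storm.yaml` when the Storm cluster is launched, so individual topologies do not have to repeat them.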
The thing to be aware of here is that DRPC servers are not designed to be shared between several different Storm clusters. It should not be a problem, though, because they are essentially stateless. You just have to be careful that each topology has a unique function name. You already have to ensure that today, but only within a single cluster; if you are using shared DRPC servers, you have to ensure it across all clusters.
--Bobby
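One way to satisfy the cross-cluster unique-function-name requirement is a naming convention. The sketch below assumes a hypothetical `qualify` helper (not part of Storm) that prefixes each DRPC function name with the name of the Storm cluster serving it. Both the topology (when it creates its DRPC spout) and its external clients (when they call `DRPCClient.execute`) would have to use the same qualified name.

```java
// Hypothetical naming convention, not part of Storm: qualify each DRPC function
// name with its cluster's name so clusters sharing DRPC servers cannot clash.
final class DrpcFunctionNames {
    private static final String SEPARATOR = ".";

    static String qualify(String clusterName, String function) {
        // Reject names that would make "a.b" + "c" collide with "a" + "b.c".
        if (clusterName.isEmpty() || clusterName.contains(SEPARATOR)) {
            throw new IllegalArgumentException(
                "cluster name must be non-empty and must not contain '" + SEPARATOR + "'");
        }
        return clusterName + SEPARATOR + function;
    }
}
```

The trade-off is that uniqueness now depends on every team following the convention; nothing on the server enforces it.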
In reply to Sean Zhong, Monday, September 9, 2013, 9:06 AM (Re: [storm-yarn] DRPC servers on YARN (#8)).
> The thing to be aware of here is that DRPC servers are not designed to be shared between several different storm clusters.
The DRPC server is stateful: it keeps its requests in in-memory maps. Requiring unique function names across all clusters is too strong an assumption. So shouldn't each Storm cluster have its own DRPC server?
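To make the statefulness point concrete, here is a toy model that assumes the relevant state is a set of pending-request queues keyed by function name. It is a simplification for illustration, not Storm's actual DRPC server code. It shows why two clusters that share a server and reuse a function name would take each other's requests.

```java
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

// Toy model (not Storm's implementation): pending requests are queued per
// function name, and whichever spout polls that name receives them,
// regardless of which Storm cluster the spout runs in.
final class ToyDrpcServer {
    private final Map<String, Queue<String>> requestsByFunction = new ConcurrentHashMap<>();

    // Called on behalf of an external client invoking `function`.
    void submit(String function, String args) {
        requestsByFunction.computeIfAbsent(function, f -> new ConcurrentLinkedQueue<>()).add(args);
    }

    // Called by a DRPC spout polling for work; returns null when nothing is pending.
    String fetchRequest(String function) {
        Queue<String> q = requestsByFunction.get(function);
        return q == null ? null : q.poll();
    }
}
```

If clusters A and B both run a topology registered as `reach` against this server, a request meant for A can be consumed by B's spout, because the server only sees the function name.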
> DRPC is not really a part of storm on YARN yet because DRPC needs to be in a place that external services can easily get to
Maybe we could design another layer: a gateway server that forwards each request to the DRPC server and relays the response back? The gateway server would be visible from outside and could be shared by all clusters.
In this case:
Maybe this is a much cleaner approach.
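A minimal sketch of the proposed gateway, assuming it keeps a routing table from Storm cluster name to that cluster's private DRPC server. `DrpcGateway` and `DrpcBackend` are hypothetical names; `DrpcBackend` stands in for a real connection such as a `DRPCClient`, whose checked exceptions a real implementation would have to wrap.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical gateway: the only externally visible endpoint. Each Storm cluster
// keeps its own DRPC server; the gateway routes calls by cluster name.
final class DrpcGateway {
    // Assumed stand-in for a connection to one cluster's DRPC server.
    interface DrpcBackend {
        String execute(String function, String args);
    }

    private final Map<String, DrpcBackend> backendsByCluster = new ConcurrentHashMap<>();

    void register(String clusterName, DrpcBackend backend) {
        backendsByCluster.put(clusterName, backend);
    }

    void unregister(String clusterName) {
        backendsByCluster.remove(clusterName);
    }

    // Forward the request to the cluster's DRPC server and relay its response.
    String execute(String clusterName, String function, String args) {
        DrpcBackend backend = backendsByCluster.get(clusterName);
        if (backend == null) {
            throw new IllegalStateException("no DRPC server registered for cluster " + clusterName);
        }
        return backend.execute(function, args);
    }
}
```

Because each cluster keeps its own DRPC server behind the gateway, the same function name can be reused in different clusters without the collision described above.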
That does sound interesting, but I would like to see how things play out in YARN too. There has been some discussion about a service registry on the long-lived applications JIRA, YARN-896. It would not be too difficult to have a way to discover where the DRPC servers are located for a given Storm cluster. It is mostly a matter of getting something like that into YARN, and then updating the clients to use it to find the DRPC servers.
Alternatively we could update the storm on YARN App Master to play that role for the time being. It could provide an API that could be queried to see where the DRPC servers are located. Then the client could cache this information and refresh it periodically or when an error occurs. This might be a good intermediate step as the YARN work may take a while.
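The client-side behavior described here (cache the DRPC server locations, refresh them periodically or after an error) could look roughly like the sketch below. The `lookup` supplier stands in for a hypothetical query against the Storm-on-YARN App Master; no such API exists yet, and `DrpcServerCache` is an illustrative name.

```java
import java.util.List;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Sketch of a client-side cache of DRPC server addresses. `lookup` is assumed to
// query a hypothetical App Master endpoint; `clock` returns milliseconds and is
// injectable so the refresh logic can be tested.
final class DrpcServerCache {
    private final Supplier<List<String>> lookup;
    private final LongSupplier clock;
    private final long refreshIntervalMs;
    private List<String> cached;
    private long fetchedAtMs;

    DrpcServerCache(Supplier<List<String>> lookup, LongSupplier clock, long refreshIntervalMs) {
        this.lookup = lookup;
        this.clock = clock;
        this.refreshIntervalMs = refreshIntervalMs;
    }

    // Returns the cached addresses, re-querying when the cache is empty or stale.
    synchronized List<String> servers() {
        long now = clock.getAsLong();
        if (cached == null || now - fetchedAtMs >= refreshIntervalMs) {
            cached = List.copyOf(lookup.get());
            fetchedAtMs = now;
        }
        return cached;
    }

    // Call after a connection to a cached server fails; the next servers() call re-queries.
    synchronized void invalidate() {
        cached = null;
    }
}
```

Invalidating on error keeps the App Master out of the request path in the common case while still recovering quickly when a DRPC server moves.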
But DRPC servers are not supposed to be shared by different clusters, because a DRPC server holds state. How would you handle this with a single DRPC server? If there are multiple DRPC servers in the YARN cluster, then maybe another layer of broker is needed to manage them; otherwise the client code needs to connect to multiple DRPC servers directly.
Or you could modify the existing Storm DRPC server code so that a single DRPC server can manage multiple Storm clusters.
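A rough sketch of that option, assuming pending requests are keyed by (cluster, function) instead of by function name alone. This is hypothetical: Storm's DRPC server does not do this, and both the client and the DRPC spout would have to send a cluster name, which means protocol changes on both sides.

```java
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical multi-cluster DRPC request store: keying by (cluster, function)
// keeps each Storm cluster's requests apart on one shared server.
final class MultiClusterDrpcQueues {
    record Key(String cluster, String function) {}

    private final Map<Key, Queue<String>> pending = new ConcurrentHashMap<>();

    void submit(String cluster, String function, String args) {
        pending.computeIfAbsent(new Key(cluster, function), k -> new ConcurrentLinkedQueue<>()).add(args);
    }

    // Returns the next request for this cluster's function, or null if none is pending.
    String fetchRequest(String cluster, String function) {
        Queue<String> q = pending.get(new Key(cluster, function));
        return q == null ? null : q.poll();
    }
}
```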
Currently DRPC server works as follows:
In reply to Robert (Bobby) Evans, Wednesday, September 25, 2013, 4:25 AM.
Currently, Storm-YARN launches Nimbus, UI, and Supervisor servers. We should enable DRPC servers to be launched if the community requests it.