tensorflow / serving

A flexible, high-performance serving system for machine learning models
https://www.tensorflow.org/serving
Apache License 2.0

How about letting Serving support Thrift? #588

Closed weberxie closed 5 years ago

weberxie commented 7 years ago

Within our company, there is more and more demand for inference using TensorFlow Serving. We have a suite of service-management tools bound to Thrift, but TensorFlow Serving currently only supports gRPC. Even though gRPC is an excellent RPC framework, we have to try to make Serving support Thrift.

So, what is the community's attitude toward this? I'm very much looking forward to feedback.

mountaintom commented 7 years ago

Hi,

That’s an interesting thought.

Protocol Buffers and Thrift both contain a data serialization layer, so dropping in Thrift for gRPC would be more like dropping in Thrift for Protocol Buffers.
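
To make that concrete, here is a minimal sketch (assuming the `protobuf` and `thrift` pip packages are installed) showing that each library exposes a serialization layer with no RPC transport involved at all:

```python
from google.protobuf import wrappers_pb2
from thrift.protocol import TBinaryProtocol
from thrift.transport import TTransport

# Protocol Buffers: serialize a well-known wrapper message to bytes.
pb_bytes = wrappers_pb2.Int32Value(value=42).SerializeToString()

# Thrift: write the same value through its binary protocol into a
# memory buffer; no socket or RPC machinery is touched.
buf = TTransport.TMemoryBuffer()
TBinaryProtocol.TBinaryProtocol(buf).writeI32(42)
thrift_bytes = buf.getvalue()

print(len(pb_bytes), len(thrift_bytes))
```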

Supporting Thrift could be a significant effort, at the level of creating a new Protocol Buffers API for another language, plus the possible “impedance mismatch” between the protocols. It is just my personal speculation that the resources needed may make doing this less than a sure thing.

It looks like someone has started down that road, and created a proxy to translate between gRPC and Thrift. So it is possible. https://github.com/DecipherNow/proto-thrift

I’ve been considering creating something along the lines of an Apache “Mod TensorflowServing”, that could serve both JSON and Protocol Buffers native binary serialization over regular HTTP. Directly, not translating through a proxy. I’m curious how much interest there would be in such a project?
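
As a rough sketch of what I mean (written here as a standalone Python shim rather than an actual Apache module; the model name "my_model" and tensor names "x" and "y" are hypothetical, and the `tensorflow-serving-api` pip package is assumed):

```python
import grpc
import tensorflow as tf
from flask import Flask, jsonify, request
from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

app = Flask(__name__)
channel = grpc.insecure_channel("localhost:8500")
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

@app.route("/v1/predict", methods=["POST"])
def predict():
    # Accept a plain JSON array and convert it into a TensorProto.
    req = predict_pb2.PredictRequest()
    req.model_spec.name = "my_model"  # hypothetical model name
    req.inputs["x"].CopyFrom(tf.make_tensor_proto(request.json["x"]))
    resp = stub.Predict(req, timeout=10.0)
    # Convert the output TensorProto back into plain JSON.
    return jsonify({"y": tf.make_ndarray(resp.outputs["y"]).tolist()})

if __name__ == "__main__":
    app.run(port=8080)
```

A conventional load balancer could then sit in front of this shim just like any other HTTP service.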

weberxie commented 7 years ago

Hi @mountaintom, thanks for your reply.

The https://github.com/DecipherNow/proto-thrift project looks very useful; I can learn from it.

> I’ve been considering creating something along the lines of an Apache “Mod TensorflowServing”, that could serve both JSON and Protocol Buffers native binary serialization over regular HTTP. Directly, not translating through a proxy. I’m curious how much interest there would be in such a project?

Does this mean you want to maintain a new branch? I'm worried that this may cause maintenance problems, such as difficulty staying compatible with the original branch. Going back to the starting point, does your proposal really solve the problem of differing RPC protocols? And does it cause performance problems? Of course, if I can, I'm very interested in contributing to the community.

Is supporting Thrift directly a meaningful choice?

Hi @kirilg, I'm very much looking forward to your advice on this issue.

mountaintom commented 7 years ago

Hi,

Those are some very good questions.

This would be a separate project that includes TensorFlow Serving as a third party repository. Much like a project built around an Oracle database, without being part of the Oracle database itself.

Your original question came up on Github just as I was beginning my thought process on this, so it caused me to propose this at an early stage.

And just to be sure, I don’t speak for Google. I’m an independent developer, not working for them.

Here is my overview.

For the core of TensorFlow Serving applications, gRPC makes sense. It has scalability baked in, with built-in load balancing and various HTTP handshake improvements. Most early AI adopters have the resources to implement custom technology.

However, at the edge, where most companies don’t have the same infrastructure as Google, maintaining another protocol (gRPC) can be an issue. Those are the people I see here asking about alternatives to gRPC. At least that is my interpretation.

As I think about this, I see a related issue. As long as the data stays within the TensorFlow application’s walls, passing tensors around makes sense. However, a tensor interface can be viewed as a big, multidimensionally pronged power cord. When an enterprise wants to plug a TensorFlow AI machine into their circuits, they may just want a standard two- or three-phase cord to plug in.

So, a converter cable (sorry for the continued electrical-wiring analogy) that leaves the tensors behind and presents what is digestible at the endpoints (images, arrays of numbers, words, and such) seems wise.

The reason this overall plan seems like a valid approach is that JSON (especially sans tensors) is easily consumable by most companies’ infrastructure. At the endpoints, the increased overhead of a conventional server and protocol is not as big a hit, especially since many AI models are already high latency. Conventional load balancing would handle scaling.

Google uses Protocol Buffers as their de facto interface, not just for over-the-wire connections. So, Protocol Buffers are core to TensorFlow Serving. Protocol Buffers can produce both the native binary serialization and JSON, so using these is a natural fit.

Put simply, this much is already obvious: Protocol Buffers is not tied to any transport such as gRPC.
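
For example (a sketch assuming the `tensorflow-serving-api` package; the model name is hypothetical), the very same message can be rendered as wire bytes or as JSON, and the bytes can be carried over any transport you like:

```python
from google.protobuf import json_format
from tensorflow_serving.apis import predict_pb2

req = predict_pb2.PredictRequest()
req.model_spec.name = "my_model"  # hypothetical model name

wire_bytes = req.SerializeToString()      # native binary serialization
as_json = json_format.MessageToJson(req)  # the same message as JSON

# Round-trip the JSON form back into an identical message.
round_tripped = json_format.Parse(as_json, predict_pb2.PredictRequest())
assert round_tripped == req

# Nothing about wire_bytes is gRPC-specific; it could be the body of a
# plain HTTP POST, a Thrift binary field, or a file on disk.
```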

I’m seeing this as more of a packaging issue, and a change of viewpoint for DevOps and such: the server (Apache, nginx) is the center point, and TensorFlow Serving is an installable module that conforms to the server’s standards (something that group would be familiar with). It provides a transport that easily meshes with other systems, plus an additional plugin layer to normalize the data between the groups that know their tensors and those who want data in a format they can use directly.

Back to your question about directly supporting Thrift.

I would imagine Thrift would have better performance than what I’m trying to do.

Each of our plans involves modifying the model server, and each has a set of valid use cases.

Ideally, Thrift would be supported as part of the TensorFlow Serving project, for the maintainability reasons you point out.

In situations where Thrift is already established in the infrastructure, especially where there is a need for external systems to touch the data in more places as it flows from model to model, the Thrift interface is the best fit.

For easier integration at the endpoints, the HTTP server plugin is the best fit.

It would be nice to collect some input about how people might find each of these approaches useful.

Maybe the good folks at Google will consider implementing a Thrift interface.

weberxie commented 7 years ago

Hi @mountaintom ,

I think I understand what you mean; let me make a short summary:

  1. gRPC can't adapt to every company's infrastructure.
  2. Serving is not an isolated service; it should communicate with existing systems.
  3. Thrift is necessary, but HTTP is easier.

From my standpoint, supporting an HTTP server plugin is a really good choice, but there is one serious problem: compatibility. Users would have to change their code from RPC calls to HTTP calls. So if the community would consider supporting Thrift, that would be a great thing; if not, supporting HTTP is a compromise.

So, do you have any plans for the HTTP server plugin? And should we ask for Google's advice on this project?

Another question: it's said that over 800 projects within Google use TensorFlow Serving. I'm curious how those servers are managed and how hardware resources are isolated.

mountaintom commented 7 years ago

Hi @weberxie,

Again, these are interesting questions you ask, and an interesting challenge, as they are forcing me to think further into this subject at a faster rate than I was planning to.

When I start building my project, solving some actual coding issues and having done a deeper analysis of the parts of TensorFlow Serving I’ll be working with, I will have more hard, evidence-based reasoning to work from.

This thread can be treated as a brainstorming session. It is not wild speculation, and some projection is reasonable, yet I want to be careful with my answers. There are many nuances around these issues. To avoid becoming lost in them, I’ll try to pick good central cores to start from; the nuances can then branch out from those cores.

With my proposed project, for example, the main theme is bridging the Google/non-Google infrastructure barrier. I may change my implementation once I have developed more hard facts, but it will most likely keep the same basic shape from the outside.

Enough of the caveats and back to the subject.

1) gRPC can't adapt to every company's infrastructure.

gRPC is a transport, much like HTTP. Thrift is more like Protocol Buffers in that it defines an API, but Thrift steps into the transport arena as well.

Most of the API interfaces that connect the sub-sections of TensorFlow Serving are defined in terms of Protocol Buffers; splicing on a foreign API may require a significant effort.

(But, as we speak, there are really smart people out there, and someone may already have an inspired solution, or one born of simple hard work, that could be put to use.)

2) Serving is not an isolated service; it should communicate with existing systems.

Yes, that is true. gRPC fits the needs of TensorFlow Serving well. If you have an infrastructure based on Thrift, I see why you want compatibility with it.

3) Thrift is necessary, but HTTP is easier.

HTTP is easier because it only swaps out the transport layer: gRPC is replaced with HTTP. Thrift requires splicing in at the lower Protocol Buffers API level.
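
One middle-ground sketch, in the spirit of the proxy project linked above: carry the serialized Protocol Buffers messages through a Thrift service unchanged, so only the transport is swapped and no splicing at the API level is needed. Everything here is hypothetical: the `predict.thrift` IDL, the generated `predict` module, and the port numbers.

```python
# Assumed IDL (predict.thrift), compiled with `thrift --gen py predict.thrift`:
#
#   service Predictor {
#     binary predict(1: string model_name, 2: binary request_bytes)
#   }

import grpc
from predict import Predictor  # hypothetical Thrift-generated module
from thrift.protocol import TBinaryProtocol
from thrift.server import TServer
from thrift.transport import TSocket, TTransport
from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

class PredictorHandler:
    def __init__(self):
        channel = grpc.insecure_channel("localhost:8500")
        self.stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

    def predict(self, model_name, request_bytes):
        # The PredictRequest travels through Thrift as opaque bytes, so
        # the Protocol Buffers API itself is untouched.
        req = predict_pb2.PredictRequest()
        req.ParseFromString(request_bytes)
        req.model_spec.name = model_name
        return self.stub.Predict(req, timeout=10.0).SerializeToString()

server = TServer.TSimpleServer(
    Predictor.Processor(PredictorHandler()),
    TSocket.TServerSocket(port=9090),
    TTransport.TBufferedTransportFactory(),
    TBinaryProtocol.TBinaryProtocolFactory(),
)
server.serve()
```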

If your use case fits my “at the edge” use case, the project I’m planning may fit your need. But I’m not working full time on it, so don’t hold up your immediate needs waiting for it.

The TensorFlow Serving architecture is modular, but it looks like swapping gRPC for Thrift or HTTP requires some rewriting of the Model Server.

Creating your own in-house fork of the TensorFlow Serving code, making the minimal possible changes to it, and accepting the need to maintain it is a possibility.

I’ve built several TensorFlow Serving clients (C++, Objective-C, and possibly the one and only Perl client) that live separate from the main TensorFlow Serving code base. I projected the possible level of continuing maintenance on the premise that breaking changes to the gRPC/Protocol Buffers interfaces will be few and far between.
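
For reference, the Python equivalent of such a client is only a few lines (again assuming the `tensorflow-serving-api` package, with hypothetical model and tensor names):

```python
import grpc
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

channel = grpc.insecure_channel("localhost:8500")
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

request = predict_pb2.PredictRequest()
request.model_spec.name = "my_model"  # hypothetical model name
request.inputs["x"].CopyFrom(tf.make_tensor_proto([[1.0, 2.0, 3.0]]))

response = stub.Predict(request, timeout=10.0)
print(response.outputs["y"])  # hypothetical output tensor name
```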

However, a model server fork may need updates often.

If Google were to support this in the main code base, I speculate that code to address the following issue should be part of the solution.

Unless a Protocol Buffers compiler plugin, handling the equivalent of what the gRPC plugin does, were built for Thrift (maybe the project I linked to would provide this) or for HTTP, these additional protocols/transports could be a maintenance headache. Every Protocol Buffers API change would require a corresponding update to be made by hand.

4) It's said that over 800 projects within Google use TensorFlow Serving; how are those servers managed and how are hardware resources isolated?

Many people on this forum are developing their models and applications on one machine or just a few machines, and have a similar question.

There are in-between cases, but most AI applications require lots of resources, so each model in an application may require a cluster of machines to support the load. At that level, there is no need to schedule unrelated models on a server, which actually makes things easier.

On a top level, I imagine many of the applications talk to each other in rather intricate ways.

For managing Google at large, I don’t know what they use; a supersized Kubernetes seems to be part of it.

**Another possibility to consider.**

TensorFlow Serving is cool in the way it lets you start using a system that is capable of high scalability, yet at the start, be totally unbothered by the complexity of building such a system. You can kind of grow into it.

However, it is not the only way to deploy TensorFlow models into production.

It is a little more brute force, plunking the whole TensorFlow application into a cluster, but the TensorFlow-on-Spark community is becoming popular.

I still bleed a little purple, so I’ll offer this project link. https://github.com/yahoo/TensorFlowOnSpark

Spark is well instrumented for production monitoring and may offer more protocol choices. But, the complexity of scaling will be visible to you.

5) Should we ask for Google's advice on this project?

They are certainly in a better position than me to weigh the pros and cons of supporting additional protocols.

yjhjstz commented 6 years ago

We have done this and are looking forward to open-sourcing it.

peddybeats commented 5 years ago

Closing as it seems the issue is resolved.