ml-hongkong / hongkong_flowers

An app for identifying flowers in Hong Kong with deep learning technology

Infrastructure for serving the trained model with RESTful API #2

Open indiejoseph opened 6 years ago

indiejoseph commented 6 years ago

Infrastructure for serving the trained model with RESTful API.

1. API gateway for the app to upload an image
2. Pass the image to the trained model to get a prediction
3. Infrastructure for serving the trained model: AWS or Google Cloud?

jamescheuk91 commented 6 years ago

Noted. Requirements draft:

- Google Cloud Storage to store the uploaded images
- Serve the trained model via Google Cloud ML
- Flask as the front-end web server and API

Any missing use case?

indiejoseph commented 6 years ago

@jccf091 OK, can you try building a simple server that serves a Keras ImageNet model first? Just the API is fine.
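A minimal sketch of such a server, assuming Flask. The route name and field names are illustrative, and the model call is stubbed out; a real version would load e.g. `keras.applications.resnet50.ResNet50(weights="imagenet")` and run `decode_predictions` on its output.

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

def predict(image_bytes):
    # Hypothetical stub: a real implementation would decode the image,
    # preprocess it, and call model.predict() on a Keras ImageNet model.
    return [{"name": "daisy", "probability": 0.92}]

@app.route("/api/v0/classify", methods=["POST"])
def classify():
    # Expect a multipart form upload with a "photo" file field.
    photo = request.files.get("photo")
    if photo is None:
        return jsonify({"error": "missing 'photo' field"}), 400
    return jsonify({"predictions": predict(photo.read())})

# To serve locally: app.run(host="0.0.0.0", port=5000)
```

The stub keeps the HTTP layer testable without downloading model weights; swapping `predict` for a real Keras call would not change the API surface.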

jackhftang commented 6 years ago

I am building a Laravel server that allows public users to submit images and connects to a worker machine (it is already designed to support user sessions, but that part is not ready yet). The public API and the API between the ml-server and the demo page are already done, but I have not yet written the ml-server. See https://flower.jackhftang.com. I will upload the source code later.

indiejoseph commented 6 years ago

@jackhftang The Laravel server can be the API gateway, but I don't know how it would work with the ml-server. Can you give me more detail? And what about the infrastructure? Thanks

jackhftang commented 6 years ago

@indiejoseph In short, each image has the fields user_id, image_id, job_id, image_url, status, model and result (plus the usual created_at and updated_at). Users can only see image_id, image_url, status, model and result. The ml-server can only see job_id and image_url.

ml-servers need to pre-register with the api-server using a unique name and an endpoint.

Once a user uploads an image (together with an optional model of choice), the image is stored on the api-server, and the api-server sends a request to an ml-server according to the chosen model. If no model is specified, it randomly chooses one.

ml-servers listen on their endpoint and receive a message containing job_id and image_url. An ml-server can choose to accept or reject the job. When the job is finished, it replies to the api-server with the job_id and the result. If the ml-server rejects the job, the api-server may choose another model or retry this ml-server later, depending on the model.

An image has three statuses: pending, processing and done. Initially an image is pending; it changes to processing once the ml-server replies with an 'accept', and becomes done after the ml-server posts back a result.
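The lifecycle above can be sketched as a tiny state machine. This is a hypothetical illustration of the pending → processing → done transitions, not the actual api-server code:

```python
# Allowed status transitions for an image/job, per the description above.
VALID_TRANSITIONS = {
    "pending": {"processing"},   # ml-server replied 'accept'
    "processing": {"done"},      # ml-server posted back a result
    "done": set(),               # terminal state
}

class Job:
    """Illustrative job record tracking the image classification status."""

    def __init__(self, job_id):
        self.job_id = job_id
        self.status = "pending"
        self.result = None

    def advance(self, new_status, result=None):
        # Reject any transition not in the lifecycle above.
        if new_status not in VALID_TRANSITIONS[self.status]:
            raise ValueError(f"cannot go from {self.status} to {new_status}")
        self.status = new_status
        if new_status == "done":
            self.result = result
```

Encoding the transitions in a table makes it easy for the api-server to reject out-of-order messages from an ml-server (e.g. a result for a job it never accepted).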

Auth can be built on top of this; it is even easier to implement than integrating with third-party services. Currently, job_id and image_id are 60-character alphanumeric strings and image_url is 41 alphanumeric characters. These three fields are independent (in terms of probability), and no one except the api-server knows all three fields for any image at once. IMO the privacy of the current setup is strong. For now, all images belong to a virtual public user.

One other minor thing: all uploaded images are currently resized to a width of 300px.
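That resize step could look roughly like this, assuming Pillow; the function name is illustrative:

```python
from PIL import Image

def resize_to_width(img, width=300):
    """Scale an image to a fixed width, keeping its aspect ratio,
    as the api-server does for stored uploads."""
    new_height = round(img.height * width / img.width)
    return img.resize((width, new_height))
```

Fixing the stored width keeps storage and CDN bandwidth predictable regardless of what size the user uploads.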

Detailed message formats and routes will be available after the prototype is done; that should be tonight or tomorrow night.

jackhftang commented 6 years ago

btw, for production use I am looking for sponsors for the domain and server, and I am happy to make a transfer. I am also fine if the app eventually does not use this api-server.

indiejoseph commented 6 years ago

@jccf091 what do you think?

indiejoseph commented 6 years ago

@jackhftang @jccf091 Let me draw a diagram describing the whole picture from the app to the backend and authentication. Stay tuned.

jackhftang commented 6 years ago

Forgot to say: the user holds the image_id and polls for the classification result.

jamescheuk91 commented 6 years ago

@indiejoseph @jackhftang What you described is too complex.

I don't think we need a job-queue pattern to handle predictions for user-submitted flower images.

To me, the easiest approach is a single API server that handles the incoming requests. The user should be able to upload the flower image and get the prediction within one or two requests. I don't expect running the trained model against an uploaded image to take longer than 10 seconds.

Personally, I don't like long polling. It introduces a lot of issues, and it should only be used for real-time features.

The reason I suggest "Flask" is to reduce overall complexity, since we are going to write Python anyway.

If we do need a job-queue pattern and need to push results to the front end, I would suggest the Phoenix Framework.

jamescheuk91 commented 6 years ago

@indiejoseph @jackhftang Why do we need to expose "model" knowledge to the front end? One trained model handling all image classification tasks sounds good to me. We should serve different model versions on Google Cloud or somewhere similar, so that we can roll back to a different version easily.

jackhftang commented 6 years ago

The reasons for separating out an ml-server are:

  1. There will very likely be more than one model. Models will evolve, and you will probably want to try new ideas from time to time. If you think beyond just the app and consider the development process, I guess you will want the ability to select a model, submit an image, see the result afterward, and compare results across models.

  2. Hosting the api-server is easy, but the ml-server could be memory-hungry, or worse, the model may only run on a GPU, and you may not want to pay for a long-living GPU instance to serve few requests. This architecture allows the ml-server to be offline, or to reply much later.

btw, it is not long polling.

jamescheuk91 commented 6 years ago

> forget to say, user hold the image_id and poll for classification result.

@jackhftang not long polling?

jamescheuk91 commented 6 years ago

@jackhftang We can also choose to host the model to Google Cloud ML.

jackhftang commented 6 years ago

@jccf091 Not long polling. The current design is GET /api/v0/result/; the web frontend sends a GET request every 2 seconds (you can open the console on https://flower.jackhftang.com and watch). I call this kind of repetitive GET "polling"; long polling, on the other hand, does not reply immediately. I don't use WebSocket because polling is easy to implement in PHP, and WebSocket's heartbeat mechanism amounts to the same thing (though its headers are lighter). I also think this approach is easier to move to a mobile app.
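The client behaviour described here (a plain GET every couple of seconds until the result arrives) can be sketched as follows. `fetch_status` is a stand-in for the real HTTP GET against the result endpoint; the dict shape is assumed for illustration:

```python
import time

def poll_result(fetch_status, interval=2.0, max_attempts=30):
    """Repeat a status lookup until the image reaches 'done'.

    fetch_status is any callable returning a dict such as
    {"status": "pending"} or {"status": "done", "result": ...};
    in the real app it would issue the GET request.
    """
    for _ in range(max_attempts):
        reply = fetch_status()
        if reply.get("status") == "done":
            return reply["result"]
        time.sleep(interval)  # plain polling: wait, then ask again
    raise TimeoutError("classification did not finish in time")
```

Unlike long polling, each request here returns immediately with the current status; the waiting happens on the client between requests.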

You can have a look at the mock ml-server here; I just have not yet integrated it with the trained ResNet50 model. The web frontend is already able to display results. https://gist.github.com/jackhftang/2bb2bde6f601362a970c73cc7072f3ec

And I have no experience with Google Cloud ML.

indiejoseph commented 6 years ago

Exploring ML Engine on Google Cloud https://medium.com/google-cloud/keras-inception-v3-on-google-compute-engine-a54918b0058

indiejoseph commented 6 years ago

jackhftang commented 6 years ago

I have implemented the minimum features: upload a file and classify it. Currently, images are resized and stored on the api-server. Information about which images a user owns is stored in localStorage. An admin panel is provided by Laravel + Voyager, which offers a basic media/file explorer and a graphical database viewer/editor. The api-server is behind Cloudflare, which provides a CDN for all images. Real-time updates are currently done by polling, which is just one simple MySQL key lookup; I expect a commodity computer can handle 10k queries/second.

As for the ml-server, it is currently hosted on a reasonably decent machine. It takes around 10 seconds to load the libraries and the model (using CPU as the backend) before it is ready to serve. It uses around 1.5 GB of memory and spawns 114 threads in total. I have read some articles about Google ML Engine saying its real-time API can respond within a second...

Anyway, let's use Google Cloud. I still have a $300 USD coupon that I have not yet used and that will expire =] I will keep this project as part of my portfolio.

indiejoseph commented 6 years ago

Some request/response data format for reference

Client Request(Form)

{
  "lat": "float",
  "long": "float",
  "photo": "blob"
}

Server Response(JSON)

{
  "predictions": [
    {
      "name": "string // flower name",
      "probability": "float"
    },
    {
      ...
    }
  ]
}

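A helper that emits a response in this shape could look like the following sketch; the function name is illustrative, and the field names follow the schema above:

```python
import json

def make_response(predictions):
    """Serialize (flower_name, probability) pairs into the
    response format described above."""
    return json.dumps({
        "predictions": [
            {"name": name, "probability": float(prob)}
            for name, prob in predictions
        ]
    })
```

Keeping the prediction list ordered by descending probability on the server means the app can simply show the first entry as the top match.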
jacklam718 commented 6 years ago

Nice

jamescheuk91 commented 6 years ago

https://cloud.google.com/about/locations/ It seems like we can only deploy to TW.