Open kacunningham413 opened 3 years ago
Flask may already support gzip, a generic compression scheme, for requests out of the box. Check the request headers (Content-Encoding) to see whether gzip is already being used. If it isn't, it may be worth investigating why.
This would be in addition to applying image-format-aware compression on the frontend before the file is sent over the wire with gzip.
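A minimal standard-library sketch of the gzip round trip. The client compresses the body and declares Content-Encoding: gzip, and the server decompresses only when that header says so. The helper names (compress_body, decode_body) are hypothetical, and nothing here relies on a Flask feature. In a Flask view you would pass in request.headers and request.get_data(), which are case-insensitive and raw-bytes respectively, while the sketch uses a plain dict. Keep in mind that formats such as JPEG or PNG are already compressed and usually shrink very little under gzip. This is one reason format-aware compression on the frontend matters.

```python
from __future__ import annotations

import gzip


def compress_body(payload: bytes) -> tuple[bytes, dict[str, str]]:
    """Client side (hypothetical helper): gzip the body and return matching headers."""
    body = gzip.compress(payload)
    headers = {
        "Content-Encoding": "gzip",
        "Content-Type": "application/octet-stream",
    }
    return body, headers


def decode_body(body: bytes, headers: dict[str, str]) -> bytes:
    """Server side (hypothetical helper): undo gzip only if the client declared it."""
    # A plain dict is case-sensitive; Flask's request.headers is not.
    encoding = headers.get("Content-Encoding", "").strip().lower()
    if encoding == "gzip":
        return gzip.decompress(body)
    if encoding in ("", "identity"):
        return body  # no compression declared, pass through unchanged
    raise ValueError(f"unsupported Content-Encoding: {encoding!r}")


if __name__ == "__main__":
    raw = b"\x00" * 1_000_000  # highly compressible stand-in, not a real image
    body, headers = compress_body(raw)
    print(len(raw), "->", len(body), "bytes on the wire")
    assert decode_body(body, headers) == raw
```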
Some investigation has shown that the vast majority of Time To First Byte (TTFB) comes from (1) the time to send the HTTP request and (2) model inference time. Both vary with request size. Here are two examples:
30MB file request (TTFB = 108s):
- Request send time: 76s
- Upload to GCS: 1s
- Loading model: 7s
- Model inference: 24s
- Upload results to GCS: <1s

6MB file request (TTFB = 28s):
- Request send time: 15.5s
- Upload to GCS: <1s
- Loading model: <1s
- Model inference: 11s
- Upload results to GCS: <1s
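The two breakdowns can be checked with a little arithmetic. The sketch below uses only the measured figures for these two requests, leaving out stages reported as "<1s". For each request it computes the send and inference shares of TTFB and an effective upload rate. The function name breakdown is hypothetical. Request send time is about 70% of TTFB for the 30MB file and about 55% for the 6MB file, and send plus inference together account for over 90% in both cases. Both uploads also work out to roughly 0.39 MB/s. That suggests send time scales roughly linearly with file size, although two data points are not enough to confirm it.

```python
def breakdown(size_mb: float, ttfb_s: float, send_s: float, inference_s: float) -> dict:
    """Share of TTFB spent sending and inferring, plus effective upload rate (hypothetical helper)."""
    return {
        "send_share": send_s / ttfb_s,
        "inference_share": inference_s / ttfb_s,
        "combined_share": (send_s + inference_s) / ttfb_s,
        "upload_mb_per_s": size_mb / send_s,
    }


# (label, size in MB, TTFB, request send time, model inference), times in seconds.
REQUESTS = [
    ("30MB", 30, 108.0, 76.0, 24.0),
    ("6MB", 6, 28.0, 15.5, 11.0),
]

for name, size_mb, ttfb_s, send_s, inference_s in REQUESTS:
    b = breakdown(size_mb, ttfb_s, send_s, inference_s)
    print(
        f"{name}: send {b['send_share']:.0%}, inference {b['inference_share']:.0%}, "
        f"send+inference {b['combined_share']:.0%}, upload {b['upload_mb_per_s']:.2f} MB/s"
    )
```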
To speed up individual requests, we could resize the files on the client side before upload. We would first need to confirm that reducing image size does not degrade model performance, at least down to some threshold.
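A sketch of the sizing logic only, assuming the longer side is capped at a hypothetical limit (max_side). The limit is not from this issue, and the real value would come from the model-quality check mentioned above. The function scales the image down while keeping the aspect ratio and leaves already-small images alone. The frontend would still do the actual resampling. The raw pixel count scales with the square of the resize factor, so capping a 4000x3000 image at 1024 keeps about 6.6% of its pixels. The compressed file size will not shrink by exactly that ratio.

```python
from __future__ import annotations


def fit_within(width: int, height: int, max_side: int) -> tuple[int, int]:
    """Scale (width, height) down so the longer side is at most max_side (hypothetical helper).

    Aspect ratio is preserved; images already within the limit are returned unchanged.
    """
    if width <= 0 or height <= 0 or max_side <= 0:
        raise ValueError("dimensions must be positive")
    longest = max(width, height)
    if longest <= max_side:
        return width, height  # never upscale
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


def pixel_fraction(original: tuple[int, int], resized: tuple[int, int]) -> float:
    """Fraction of pixels kept after resizing; raw pixel data scales with this."""
    return (resized[0] * resized[1]) / (original[0] * original[1])


if __name__ == "__main__":
    target = fit_within(4000, 3000, max_side=1024)
    print(target, f"{pixel_fraction((4000, 3000), target):.1%} of original pixels")
```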
To speed up multiple-image requests, we should send the images as separate requests in parallel instead of as one large request. That could speed up a 10-image request by almost 10x.
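A minimal sketch of the fan-out using a thread pool from the standard library. The helpers send_concurrently and fake_send are hypothetical. fake_send just sleeps to stand in for one HTTP request and would be replaced by the real per-image upload call. pool.map returns results in input order, so each response lines up with its image. The near-10x figure assumes two things: the backend can serve the requests in parallel, and the client's upload link is not the bottleneck. If send time is bandwidth-bound, parallel uploads share the same bandwidth, and the gain would come mainly from overlapping inference with uploads.

```python
from __future__ import annotations

import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def send_concurrently(items: Sequence[T], send_one: Callable[[T], R], max_workers: int = 10) -> list[R]:
    """Issue one request per item in parallel; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map re-raises any exception from send_one when its result is reached
        return list(pool.map(send_one, items))


def fake_send(image: bytes) -> int:
    """Hypothetical stand-in for one HTTP request: waits, then returns the payload size."""
    time.sleep(0.1)
    return len(image)


if __name__ == "__main__":
    images = [b"x" * n for n in range(1, 11)]  # ten dummy "images"
    start = time.perf_counter()
    sizes = send_concurrently(images, fake_send)
    print(sizes, f"{time.perf_counter() - start:.2f}s vs ~1.0s sequentially")
```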
Side note: It's not clear why the time to load the model changes with file size (needs investigation).
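One way to narrow this down is to time each sub-step separately on every request and log the timings next to the request size. The context manager below does that with time.perf_counter. The names timed and the stage labels are hypothetical, and the sleeps stand in for the real calls. One possibility worth ruling out is that the "loading model" interval also covers input-dependent work, such as reading or decoding the upload. That is an assumption, not something the measurements above show.

```python
from __future__ import annotations

import time
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def timed(stage: str, timings: dict[str, float]) -> Iterator[None]:
    """Record the wall-clock duration of the enclosed block under `stage` (hypothetical helper)."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # recorded even if the block raises, so failed stages are still visible
        timings[stage] = time.perf_counter() - start


if __name__ == "__main__":
    timings: dict[str, float] = {}
    with timed("load_model", timings):
        time.sleep(0.05)  # placeholder for the real model-loading call
    with timed("inference", timings):
        time.sleep(0.02)  # placeholder for the real inference call
    print({stage: round(seconds, 3) for stage, seconds in timings.items()})
```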