microsoft / azureml-inference-server

The AzureML Inference Server is a Python package that allows users to easily expose machine learning models as HTTP endpoints. The server is included by default in AzureML's pre-built Docker images for inference.
MIT License

Streaming not working on Azure ML Inference Server #53

Open rhassan91 opened 3 months ago

rhassan91 commented 3 months ago

Hi, I deployed the following scoring script on a managed online endpoint on Azure ML using v2, but it fails to return a streaming response. However, running the Azure ML Inference Server locally does give me a streaming response back. Isn't the local deployment meant to replicate the deployed version's behavior? Is the Azure ML Inference Server capable of streaming responses?

```python
from azureml.contrib.services.aml_response import AMLResponse
import time


def init():
    pass


def run(request):
    def my_python_tool(request):
        for word in request.split():
            if word != "100}":
                time.sleep(2)
                print(word)
                yield str.encode(word + " \n")
            elif word == "100}":
                print("last word")
                yield str.encode("100} \n")

    print(request)
    response = AMLResponse(my_python_tool(request), 200)
    response.headers["Content-Type"] = "text/event-stream; charset=utf-8"
    response.default_mimetype = "text/event-stream"
    response.headers["X-Accel-Buffering"] = "no"
    response.headers["Transfer-Encoding"] = "chunked"
    response.headers["Cache-Control"] = "no-cache"
    response.implicit_sequence_conversion = True
    response.stream.response.cache_control.no_cache = True
    response.stream.response.cache_control.no_store = True
    return response
```

Yilmazzn commented 2 months ago

also have this problem, would love to know if streaming responses is somehow supported

Yilmazzn commented 1 month ago

solved it using custom containers
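
For others considering the custom-container route: the core requirement is a server stack that does not buffer generator responses before sending them. Below is a minimal, framework-free WSGI sketch of a streaming scoring app; the route handling, headers, and word-splitting logic are illustrative assumptions, not Yilmazzn's actual configuration:

```python
import io

# Generic sketch of a streaming scoring app for a custom container.
# NOTE: this is an editorial illustration, not the actual container used
# in this thread -- the headers and splitting logic are assumptions.
def score_app(environ, start_response):
    """WSGI app that echoes the request body back one word per chunk."""
    length = int(environ.get("CONTENT_LENGTH") or 0)
    body = environ["wsgi.input"].read(length)

    start_response("200 OK", [
        ("Content-Type", "text/event-stream; charset=utf-8"),
        ("Cache-Control", "no-cache"),
        ("X-Accel-Buffering", "no"),  # ask nginx-style proxies not to buffer
    ])

    def generate():
        for word in body.decode("utf-8").split():
            # Each yielded bytes object is handed to the WSGI server as it
            # is produced, so a non-buffering server flushes it immediately.
            yield (word + " \n").encode("utf-8")

    return generate()
```

Served under a non-buffering WSGI server in the container's entrypoint (e.g. `gunicorn score:score_app`), each chunk reaches the client incrementally; any intermediate layer that collects the whole iterable before responding would produce exactly the deployed-vs-local difference described above.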

rhassan91 commented 2 weeks ago

> solved it using custom containers

Would it be possible to explain or share the base configuration of the custom container?