knative / serving

Kubernetes-based, scale-to-zero, request-driven compute
https://knative.dev/docs/serving/
Apache License 2.0

Creating a new revision of a model returns HTTP code 0 #13230

Closed gawsoftpl closed 2 years ago

gawsoftpl commented 2 years ago

/kind bug

When I create a new revision for a model under high load (200 requests per second), I receive a lot of HTTP code 0 responses:

vegeta attack -duration=60s -timeout=5s -rate=200 --targets=targets.txt | vegeta report --type=text
^C
Requests      [total, rate, throughput]         10619, 200.02, 28.99
Duration      [total, attack, wait]             58.09s, 53.09s, 5s
Latencies     [min, mean, 50, 90, 95, 99, max]  434.187ms, 4.617s, 5s, 5s, 5s, 5.001s, 5.009s
Bytes In      [total, mean]                     5427245, 511.09
Bytes Out     [total, mean]                     185349460, 17454.51
Success       [ratio]                           15.86%
Status Codes  [code:count]                      0:8935  200:1684  
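
(Note: status code 0 in a vegeta report is not an HTTP status; it means no response was received at all, and the latencies pegged at the 5s timeout above point to client-side timeouts. For context, vegeta reads its attack targets from the file passed via --targets. A minimal targets.txt for a setup like this might look as follows; the predict path and payload file are hypothetical, only the host comes from the logs below:)

# one target per line: METHOD URL, optional headers, @file for the request body
POST http://ml-cookies-ensemble-predictor-default.default.svc.cluster.local/v1/models/ensemble:predict
Content-Type: application/json
@payload.json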

After installing the new revisions, only 50% of the pods received inference traffic; the remaining 50% received none.

I don't know whether this is an error in Knative or in Istio.

When I delete all models, wait for all pods to be deleted, and then create the models again from scratch, everything works fine.

Model architecture

The error occurs when I create a new revision of model 1 or model 2: the transformer routes the gRPC requests to only a subset of the new model version's pods.

Error in Istio:

2022-08-18T18:28:50.243968Z    info    ads    Push Status: {
    "pilot_vservice_dup_domain": {
        "ml-cookies-ensemble-predictor-default.default.svc.cluster.local:80": {
            "proxy": "ml-cookies-tabular-features-predictor-default-00003-deployg2h8j.default",
            "message": "duplicate domain from service: ml-cookies-ensemble-predictor-default.default.svc.cluster.local:80"
        },
        "ml-cookies-nn-predictor-default.default.svc.cluster.local:80": {
            "proxy": "ml-cookies-tabular-features-predictor-default-00003-deployg2h8j.default",
            "message": "duplicate domain from service: ml-cookies-nn-predictor-default.default.svc.cluster.local:80"
        },
        "ml-cookies-tabular-features-predictor-default.default.svc.cluster.local:80": {
            "proxy": "ml-cookies-tabular-features-predictor-default-00003-deployg2h8j.default",
            "message": "duplicate domain from service: ml-cookies-tabular-features-predictor-default.default.svc.cluster.local:80"
        }
    }
}
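
(The pilot_vservice_dup_domain entries mean istiod saw the same host claimed by more than one VirtualService, which can happen transiently while Knative rewrites Routes during a revision rollout. A few read-only checks, assuming istioctl is installed and the gateway pod name is substituted, show what config the proxies actually hold:)

# list proxies and whether they are in sync with istiod
istioctl proxy-status

# inspect the routes a given gateway or sidecar pod currently holds
istioctl proxy-config routes <istio-ingressgateway-pod> -n istio-system

# list the VirtualServices that claim the duplicated hosts
kubectl get virtualservices -A | grep ml-cookies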

Environment:

nader-ziada commented 2 years ago

do you see any errors in the kserve logs? it's hard to tell which layer could have the issue

gawsoftpl commented 2 years ago

Error resolved. The problem was that, under high load, I rolled out a new revision and shifted 100% of traffic to it immediately; during that process the server hit a bottleneck. When I changed the deployment to a canary rollout with 10% steps, everything worked fine.
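
(For readers hitting the same thing: a stepped rollout can be expressed directly on the Knative Service via its traffic block, keeping most traffic pinned to the previous revision while the new one warms up. A minimal sketch with hypothetical names; KServe exposes a similar knob via canaryTrafficPercent on the InferenceService:)

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: ml-cookies-nn-predictor              # hypothetical service name
spec:
  template:
    metadata:
      name: ml-cookies-nn-predictor-00004    # pin a name for the new revision
    spec:
      containers:
        - image: example.com/nn-predictor:v4 # hypothetical image
  traffic:
    - revisionName: ml-cookies-nn-predictor-00004
      percent: 10                            # canary step: 10% to the new revision
    - revisionName: ml-cookies-nn-predictor-00003
      percent: 90                            # rest stays on the known-good revision

Once the canary looks healthy, the percentages are bumped stepwise (e.g. 10 → 25 → 50 → 100) instead of cutting all traffic over at once.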

nader-ziada commented 2 years ago

thanks for the update, will close the issue for now and please feel free to reopen if you see the issue again

/close

knative-prow[bot] commented 2 years ago

@nader-ziada: Closing this issue.

In response to [this](https://github.com/knative/serving/issues/13230#issuecomment-1222373253):

> thanks for the update, will close the issue for now and please feel free to reopen if you see the issue again
>
> /close

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.