Open yungParrot opened 1 year ago
@yungParrot Added! We would really love to see the OpenTelemetry integration. You can also join our biweekly working group meeting to discuss this.
@yuzisun when and where are those meetings held?
I'll need to know which metrics/traces you want to measure/track. For example, adding something like this:
from opentelemetry import trace
class KServeClient(object):
    def __init__(self, config_file=None, context=None,  # pylint: disable=too-many-arguments
                 client_configuration=None, persist_config=True):
        """
        KServe client constructor
        :param config_file: kubeconfig file, defaults to ~/.kube/config
        :param context: kubernetes context
        :param client_configuration: kubernetes configuration object
        :param persist_config:
        """
        # Use whatever tracer provider the application has installed.
        tracer = trace.get_tracer(__name__)
        with tracer.start_as_current_span("KServeClient") as span:
            span.set_attribute("created", str(self))
...could then be used by the user like this:
from kserve import KServeClient
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter
from opentelemetry.sdk.resources import SERVICE_NAME, Resource
resource = Resource(attributes={
    SERVICE_NAME: "your-service-name"
})
provider = TracerProvider(resource=resource)
processor = BatchSpanProcessor(ConsoleSpanExporter())
provider.add_span_processor(processor)
trace.set_tracer_provider(provider)
kserve_client = KServeClient(config_file='./kserve/test/kubeconfig')
...and would result in output like this:
{
    "name": "KServeClient",
    "context": {
        "trace_id": "0xdd22c93c595f9c3d8087d46c416d48cc",
        "span_id": "0x6dd718d4f7f56a99",
        "trace_state": "[]"
    },
    "kind": "SpanKind.INTERNAL",
    "parent_id": null,
    "start_time": "2023-02-05T18:12:20.175541Z",
    "end_time": "2023-02-05T18:12:20.175585Z",
    "status": {
        "status_code": "UNSET"
    },
    "attributes": {
        "created": "<kserve.api.kserve_client.KServeClient object at 0x7efdee1dfa30>"
    },
    "events": [],
    "links": [],
    "resource": {
        "attributes": {
            "service.name": "your-service-name"
        },
        "schema_url": ""
    }
}
The OpenTelemetry exporters (Jaeger/Zipkin/OTLP Collector) would then be configured by the user. What do you think about it? :cowboy_hat_face:
when and where are those meetings held?
nvm - I've found the meeting calendar :point_left:
@yungParrot Are you still working on this? I think your idea sounds good, and the exporter should be configured by the user, the way Knative works. Just note that KServeClient is the Kubernetes client, not the HTTP client. I think we would want to trace the HTTP request, for example from transformer to predictor, or add spans for preprocess, predict and postprocess.
Are you still working on this?
@yuzisun yes, but I didn't have much time last month. I wanted to create a Design Doc to describe what I'm planning to do 🤔 I'll update you once I have something to show you.
@yungParrot sounds great! Looking forward to the design doc
@yuzisun my design doc is available here - please let me know if there is anything I should improve 🤔
@yungParrot thanks! Do you want to present at the KServe community meeting next Wednesday?
@yuzisun I'm not sure if I will be able to attend that meeting
@yungParrot We are planning this feature for KServe 0.12. Are you still interested in working on this?
@yuzisun yes
Hey @yungParrot have you started working on this?
@sivanantha321 yes, although I've run into some problems and I'm not sure about the next steps. I understand how the Python code can be instrumented through OpenTelemetry and so on, but I'm not sure how to create e2e tests for that and how OpenTelemetry should be configured on the cluster itself - I see that there are multiple ways of installing OpenTelemetry operators/collectors etc. on a k8s cluster and I'm not sure exactly how to approach this problem.
@yungParrot I can help you with e2e tests. We can connect on Slack.
/assign @andyi2it
/kind feature
This issue was specified in the 2023 Roadmap. I wanted to add OpenTelemetry support to the KServe Python SDK so it could be integrated with tools like Jaeger or Zipkin.
I wanted to work on this issue - could you please add it to the KServe 0.11 board?
:cowboy_hat_face: