TODO:
@logan-markewich After some thought, langchain's approach seemed a little off to me. Rather than spawning and passing around entire new callback handlers and managers, the above ontology of session/run/task, passing around session/run/task <-> subtask metadata, and having a centralized callback service seemed more sane to me. I will update as progress is made.
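To make the idea concrete, here is a minimal sketch (hypothetical names, not LlamaIndex's actual API) of a centralized callback service where handlers register once and every event carries session/run/task identifiers:

```python
# Hypothetical sketch: a central callback service instead of spawning and
# threading handlers/managers through every call site. Events carry
# session/run/task context so subtasks can be linked to parent tasks.
from dataclasses import dataclass, field
from typing import Optional
import uuid


@dataclass
class TaskContext:
    session_id: str
    run_id: str
    task_type: str                         # e.g. "embed", "llm_predict", "retrieve"
    task_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    parent_task_id: Optional[str] = None   # links a subtask to its parent task


class CallbackService:
    """Central sink for trace events; handlers subscribe once."""

    def __init__(self) -> None:
        self._handlers: list = []

    def register(self, handler) -> None:
        self._handlers.append(handler)

    def on_task_start(self, ctx: TaskContext, payload: dict) -> None:
        for handler in self._handlers:
            handler.on_task_start(ctx, payload)

    def on_task_end(self, ctx: TaskContext, payload: dict) -> None:
        for handler in self._handlers:
            handler.on_task_end(ctx, payload)
```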
@jon-chuang I totally agree re: langchain's approach. Centralizing a service for this, and creating proper sessions/runs/tasks to handle async/parallel tracing, sounds great to me.
It seems like this would be a possible sequence of PRs to support this:
Is this a correct understanding of the planned changes?
Yes, that is approximately correct. However, 3 should come first in order to put pressure on the callback system as it is iteratively refactored so that we are aligned with the important use cases.
Hmm, while that is true, if 3 is implemented now, doesn't that mean it will have to be refactored later? Just trying to think of the most efficient approach really, as it seems like a decent amount of work overall (unless you don't see the overall interface for the callback handlers changing much).
I have the following plan:
Sure! At a high level, this sounds good to me!
As always, let me know if there's any way to help! Always happy to test and review PRs of course.
Have y'all looked into emitting OpenTelemetry traces for this? I'm working on defining semantic conventions for tracing data here, to be proposed to the OTel project soon: https://github.com/cartermp/semantic-conventions/blob/cartermp/ai/docs/ai/llm-spans.md
Critically, being OTel compliant allows any trace emitted by LlamaIndex to be automatically correlated with the rest of an application. That's critical for production use cases because there's often a rather complex pipeline for RAG or assembling a dynamic prompt, not to mention that this can be spread across different services or connected to other critical services.
Metrics are a decent enough signal type for really basic information, but for any actual application observability (when LLMs are in prod) you need good tracing data that connects to the rest of the application. Otherwise it's extremely difficult to determine whether a poor user experience is directly related to an LLM call or influenced by other complicating factors. You can't really pre-aggregate that information into metrics either, so traces are pretty much the only good option.
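For illustration, a rough sketch of emitting an OTel span around an LLM call with the standard opentelemetry-api package; the attribute keys are placeholders (the actual semantic conventions are still being proposed), and `call_llm` is a stand-in:

```python
# Sketch: wrap an LLM call in an OpenTelemetry span so it correlates with
# the rest of the application's traces. Attribute keys are placeholders.
from opentelemetry import trace

tracer = trace.get_tracer("llama_index")


def call_llm(prompt: str) -> str:
    """Stand-in for the actual LLM call."""
    return "..."


def llm_predict(prompt: str) -> str:
    with tracer.start_as_current_span("llm_predict") as span:
        span.set_attribute("llm.request.model", "text-davinci-003")
        span.set_attribute("llm.prompt.length", len(prompt))
        response = call_llm(prompt)
        span.set_attribute("llm.response.length", len(response))
        return response
```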
@cartermp I did look into it initially -- but it seemed less helpful, at least for the stage we are at right now.
From my understanding, the traces need some external running client/collector to consume them.
If you have an idea for integrating this properly with llama-index though, would love to see it in a PR ❤️
@logan-markewich yeah, that's correct. It's designed to export to another tool that stores the data and offers analysis. That's a critical workflow for production use cases - your app that generates telemetry can't also be in the business of storing it because the volume of data would get out of control.
It's maybe helpful to think about two different models: either generate traces in your own internal format and convert the final trace product to OTLP at export time, or use the OTel API/SDK directly to model operations as they happen.
The benefits to the latter are that you get incredible customizability and pluggability, although it's harder to do. But the downside is that if the way you need to model operations internally is difficult to map to OTel concepts, it's too hard. The first option is what a lot of tools offer instead. It's usually pretty easy to turn the "final trace product" into OTLP over gRPC or HTTP/proto/json.
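As a sketch of the first option (hypothetical event fields; span parenting omitted for brevity), converting a finished internal trace into OTel spans might look like:

```python
# Sketch: replay a finished internal trace as OTel spans with explicit
# start/end timestamps (nanoseconds), so the "final trace product" can be
# exported over OTLP. Event fields (name, start_ns, end_ns, payload) are
# hypothetical.
from opentelemetry import trace

tracer = trace.get_tracer("llama_index.export")


def export_event(event) -> None:
    span = tracer.start_span(event.name, start_time=event.start_ns)
    for key, value in event.payload.items():
        span.set_attribute(key, value)
    span.end(end_time=event.end_ns)
```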
Any further thoughts on adding OTel support? For tracing it is the standard format, and it opens things up to the entire ecosystem of providers that visualize traces.
And, as @cartermp mentioned, it plugs into the application's wider context, whether a monolith or a distributed system.
Also worth pointing out the auto-instrumentation feature, which allows a code base to be instrumented with generic traces per dependency (e.g. SQLite or FastAPI) without changing any code. Of course, instrumenting business logic will still need code changes.
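For example, assuming the opentelemetry-distro and relevant instrumentation packages are installed, an app can be run with auto-instrumentation like this (exporter choice will vary):

```sh
# Auto-instrument supported dependencies and export traces over OTLP,
# without touching application code.
opentelemetry-instrument --traces_exporter otlp python app.py
```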
Hi, @jon-chuang,
I'm helping the LlamaIndex team manage their backlog and am marking this issue as stale. From what I understand, the issue proposes a tracing ontology for a retrieval pipeline, discussing concepts like session, run, and task, and raising questions about the implementation of callbacks, the equivalence or conversion between tracing and logging, and plans for prototype collection in MLflow and Prometheus. There is also discussion about refactoring the current callback system, adding a centralized database/query interface for stored trace information, and integrating OpenTelemetry traces for better application observability.
Could you please confirm if this issue is still relevant to the latest version of the LlamaIndex repository? If it is, please let the LlamaIndex team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or the issue will be automatically closed in 7 days.
Thank you for your understanding and cooperation.
Feature Description
LlamaIndex can be thought of as an orchestrator and prompt management system across various subtasks.
Here is a proposed ontology of a retrieval pipeline:
- `session`: a multi-round interaction consisting of multiple `run`s.
- `run`: a single round trip from client back to client (e.g. a REST endpoint call, a Jupyter notebook cell run). It can consist of multiple `task`s.
- `task` (currently called `Event`): a basic unit of work. It can consist of multiple subtasks, which are themselves tasks. A task can occur at any level of granularity; a task's granularity is defined by its `task_type` (currently `EventType`), which can instantiate its own callback handler. Examples: `embed`, `llm_predict`, `retrieve`.
Example trace:
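A hypothetical trace following the ontology above (names are invented for illustration):

```
session[s1]
└── run[r1]  (REST /query)
    ├── task[retrieve]
    │   └── task[embed]
    └── task[llm_predict]
```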
Flattened, this is:
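(the same hypothetical trace, with each task shown as a path from session to leaf)

```
s1 / r1
s1 / r1 / retrieve
s1 / r1 / retrieve / embed
s1 / r1 / llm_predict
```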
Additional concepts:
- `label`: an identifier for a task. Labels are stored in event payloads, with a fallback to defaults in session/run/task-level `CallbackHandler`s. Examples:
  - `llm_model`: `openai[text-davinci-003]`, `custom[ggml-int4-q4]`
  - `embedding_model`: `openai[text-embedding-ada-002]`
Example Consumers
Example Aggregations
Here we use SQL. One should use their imagination for how these may be expressed in other query languages.
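For instance, assuming a hypothetical `tasks` table with one row per completed task (the schema and column names below are illustrative):

```sql
-- Hypothetical schema: tasks(session_id, run_id, task_id, task_type,
--                            llm_model, duration_ms).
-- Per-task latency aggregated by task type and model for one session.
SELECT
    task_type,
    llm_model,
    COUNT(*)         AS n_tasks,
    AVG(duration_ms) AS avg_duration_ms
FROM tasks
WHERE session_id = 's1'
GROUP BY task_type, llm_model;
```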
References
Questions