Closed jzonthemtn closed 2 months ago
We have a call for names for this tooling, join the thread at https://opensearch.slack.com/archives/C051JEH8MNU/p1706895236557209
Thanks for proposing this. I would like to treat events as metadata. In our use case, we carry metadata in the REST API request headers, such as client_id, data object type, etc. We really need the raw OpenSearch request body and response body logged to AWS CloudWatch or S3 for query pattern analysis. We explored writing a plugin and hit a hard blocker in the OpenSearch security plugin: it already overrides getRestHandlerWrapper, so we cannot override it in our own plugin to do our own logging, and as far as I know the REST handler wrapper is the only way for us to do this. I hope this RFC can be prioritized for the community. If anyone can point me to an alternative way to generate these logs, it would be appreciated.
Hi @jzonthemtn, thank you for initiating this. Indeed, this feature holds significant potential. I firmly believe that implementing it is achievable with the Request Tracing and Metrics framework, which encompasses both traces and metrics. That framework was launched as an experimental feature in the OpenSearch 2.11 release.
We currently leverage OpenTelemetry, an open-source and widely embraced telemetry solution, which provides a solid foundation for this endeavor. Moreover, we can use OpenSearch Dashboards and other observability tools like Prometheus and Grafana to construct a comprehensive dashboard for monitoring and analysis purposes.
cc: @reta
Thanks @shikeli, and thanks for the pointer on the getRestHandlerWrapper override. You are looking to capture the raw queries and their results? Would being able to export the captured metadata to a file format like Parquet work for your purposes?
Hi @Gaganjuneja, thanks for the links to the Tracing and Metrics RFCs. I am not super familiar with OpenTelemetry, so please excuse my ignorance on the subject; I appreciate your recommendation of it. The data we want to capture will include events generated client-side (clicks, scroll depth, etc.) tied to backend events (search queries, results for the queries, etc.). When I hear "telemetry" I think of metrics/traces/etc. to support instrumentation of a distributed application to have visibility into the application itself. How do you see our types of events fitting into OpenTelemetry's paradigm of metrics/traces/etc.? Also, the end users of our event reporting will likely be data scientists, search relevance engineers, and business analysts. Do you think Prometheus and Grafana would be suitable backends to allow those types of users to get the insights they need? Last question: we want the system to be extensible. If you think OpenTelemetry is a good choice, how would you feel about it being an option? For instance, event data could, by default, be stored in an OpenSearch index and viewed by an OpenSearch Dashboards plugin, but the user could have the option to switch to using an OpenTelemetry/Grafana/Prometheus backend. Your input is much appreciated.
Thanks for the quick response. What do you mean by metadata? Is it the raw request and raw response? If you can export the raw request and response to a file, that should solve our problem.
@shikeli Yes, the export would be the search requests/responses along with the events generated by the client-side. Our desire is to capture the raw requests/responses, but I'm not yet entirely sure what technical impediments we might encounter (like your getRestHandlerWrapper problem) but raw is our goal.
When I hear "telemetry" I think of metrics/traces/etc. to support instrumentation of a distributed application to have visibility into the application itself. How do you see our types of events fitting into OpenTelemetry's paradigm of metrics/traces/etc.?
Thanks @Gaganjuneja, I would agree with @epugh here: we should be thinking about telemetry as operational instrumentation, and user behaviour sits a few levels above that. To your point though, there could be cases where user behaviour is derived from user-focused metrics, if plugin / extension authors see the need to do it this way. It could be a good complementary channel.
Thanks @jzonthemtn for the proposal. This is very similar and has overlap with the query insights proposal and ongoing work.
Reference RFCs: https://github.com/opensearch-project/OpenSearch/issues/11008 https://github.com/opensearch-project/OpenSearch/issues/11186
Reference PRs and issues: Query Insights Plugin: https://github.com/opensearch-project/OpenSearch/pull/11903 TopN Queries: https://github.com/opensearch-project/OpenSearch/pull/11904 Search Query Categorization Issue: https://github.com/opensearch-project/OpenSearch/issues/11596
Please see the Query Insights section on the sprint board: https://github.com/orgs/opensearch-project/projects/153/views/8
We also aim to improve users' search experience and search performance. We have similar plans as mentioned above to add instrumentation on the search path, create an analytics dashboard to visualize the metrics, connect users to the queries executed, etc.
Could we try to leverage the insights plugin for the above?
Are we mostly focusing on client-side logging in this RFC? We could also investigate how to combine client-side and server-side insights (the query insights initiatives that deshsidd mentioned above) and correlate the information to get more insights, which would be super cool.
This proposal is about tracking user behavior whether it results in a call to the OpenSearch back end or not. It is about understanding search quality (relevance), not about understanding the performance characteristics of the search server.
Even when it does result in a call to the OpenSearch back end, it may or may not be the same query. For example, when the user searches for [red dress], the application may rewrite that (query understanding) as [red dress] + 0.9taxonomy:dress + 0.9color:red before sending it to OpenSearch for processing (or it might do that in the Search Pipeline).
But most user actions that help us evaluate search quality do not include a call to the OpenSearch back end. For example, clicking on result 3 does not call the back end. Putting result 5 in the shopping basket does not call the back end. etc.
This client-side behavior often needs to be correlated (joined) with the server-side behavior, for example to capture any processing done by the application, or for performance analysis. But the two are different. For example, the server-side Query Insights is interested in the Top-n slowest queries because they may reflect a performance issue, whereas the client-side User Behavior Logging is interested in the Top-n most common queries, because they help us understand what users are doing. In that particular case, it is possible to collect the data on either the client or the server side (modulo query rewriting), but other cases -- such as the Top-n queries where the user selects none of the results -- require client-side information.
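To make the distinction concrete, here is a minimal sketch of the two kinds of Top-n report. It assumes queries and client-side events share a query_id; the record shapes and the field names user_query and action_name are illustrative assumptions, not a schema agreed in this thread.

```python
from collections import Counter

# Hypothetical records: the field names (query_id, user_query, action_name)
# are illustrative assumptions, not a schema defined in this thread.
queries = [
    {"query_id": "q1", "user_query": "red dress"},
    {"query_id": "q2", "user_query": "red dress"},
    {"query_id": "q3", "user_query": "blue shoes"},
    {"query_id": "q4", "user_query": "green hat"},
]
events = [
    {"query_id": "q1", "action_name": "click"},
    {"query_id": "q3", "action_name": "add_to_cart"},
    {"query_id": "q3", "action_name": "click"},
]


def top_n_queries(queries, n):
    """Most common query strings; needs only the query log."""
    return Counter(q["user_query"] for q in queries).most_common(n)


def top_n_zero_click_queries(queries, events, n):
    """Most common query strings whose searches got no click; needs the join."""
    clicked_ids = {e["query_id"] for e in events if e["action_name"] == "click"}
    no_click = (q["user_query"] for q in queries if q["query_id"] not in clicked_ids)
    return Counter(no_click).most_common(n)


print(top_n_queries(queries, 1))
print(top_n_zero_click_queries(queries, events, 2))
```

The first report could be produced on either side; the second cannot be computed without the client-side click events.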
What the right mechanism for capturing user behavior is, is another question. Should User Behavior Logging use OpenTelemetry? That is certainly one possibility.
Hi @deshsidd, thanks for those links. We're definitely in favor of using existing things where possible so we will take a look and see what overlap exists there.
@jzonthemtn Thanks for creating this. OpenSearch Dashboards does have a usageCollector built in, which we disabled during the fork, that does exactly this. It has a lot of the tooling and features you are discussing here and might solve this problem immediately. OpenSearch could also build in something similar that it and its plugins can use to add to this.
@ashwin-pc Tell us more about usageCollector! Where can we find documentation on it? What is the schema of data it collects? Does it have client-side (Javascript etc.) components to collect search results and actions on them?
@smacrakis So I've just started looking into this since OSD is looking to solve the same problem. But essentially we have 5 core plugins that do various things related to telemetry and usage collection in OSD. You can find the existing documentation for each of these here:
They each have a readme outlining their purpose, but I have yet to deep dive into what they do and how they work. I do know that we didn't remove any of this tooling after the fork and only commented out the section that reports this information to a telemetry endpoint.
Interesting proposal! I just went through it and have several questions and comments:
We lean on OpenSearch’s ability to log and analyze data
When you say "analyze" the client-side data, does that mean we want to build user behavior analysis capabilities within OpenSearch? In other words, will we be building any analysis algorithms, or is that the responsibility of the end users (as you mentioned, "data scientists, search relevance engineers, and business analysts")?
store the behavior metrics in OpenSearch indices
Have we evaluated other alternatives? I'm a little bit worried about the potential storage impact. I think this also depends on the answer to the previous question: do we need to somehow use this user behavior data within OpenSearch? If not, we can provide options to export to different sinks (and an OpenSearch index would be one of them).
But from the perspective of "providing overall better performance insights", I would really love to see this data available within OpenSearch. As I mentioned before, we can invest in generating insights and recommendations by combining user behavior data and server-side query insights data (if OpenSearch is also used as the search backend). One use case (my wild thought!): knowing what "type" of user someone is, we could optimize search performance by rewriting search queries for different user types.
link client-side actions with backend search actions
This might be an implementation question: how do we link the client-side and server-side actions? I'm not sure it would be an easy task, as @smacrakis mentioned in his comment:
Even when it does result to a call to the OpenSearch back end, it may or may not be the same query. For example, when the user searches for [red dress], the application may rewrite that (query understanding) as [red dress] + 0.9taxonomy:dress + 0.9color:red before sending it to OpenSearch for processing (or it might do that in the Search Pipeline).
@ansjcy Thanks for your interest and for your questions.
Although the initial implementation uses OpenSearch as its back end,
We plan to “program to the interface” to permit future extensibility. For instance, we plan to store event data in OpenSearch, but do not want to restrict someone from creating the ability to use a relational database as the backend instead.
In particular, there is no requirement that the same index be used to store the behavioral logs as is used to provide search results, so that the analytics workload won't affect search latency.
As for analysis, our plan is to provide analytics tools in OpenSearch Dashboards. We also expect that the community will supply its own tools running on Dashboards or perhaps elsewhere.
Closing the feedback loop to search results is certainly an important goal. We expect that we'll be able to provide near-real-time access to the results so that search results can be adjusted in-session. As always, the devil is in the details....
Thanks for your response!
our plan is to provide analytics tools in OpenSearch Dashboards.
I would still argue that the query insights plugin is a good place to hold those analysis tools! We have built the top n queries feature in this plugin and will start on the dashboard component (https://github.com/opensearch-project/OpenSearch-Dashboards/issues/5571) to expose this information. If we have the user behavior data stored in an index, it would be straightforward to implement processors for analytics within the query insights plugin and build the analytics UI in a similar way.
In this way we can easily combine the client side and server side insights to achieve more, for both performance and analytics purposes.
Is the goal of this to log all user behaviors, or just those specific to OpenSearch calls such as search? For example, seeing whether a user on my website has visited a particular page or used a particular feature. If yes, can OpenSearch Dashboards itself use this framework to track its users for similar behaviours?
You are touching on one of the key points of discussion which is how opinionated (structured?) should we be about what is recorded. The more structured the format of the events/actions/data we capture, then the easier it is to provide valuable out of the box insights via the dashboards, but the more limiting the use cases. If we open up the format to being able to accept a VERY broad set of attributes, then that lets the builder do more amazing things, but at the cost of less structure in our data, harder onboarding process, and fewer "out of the box insights" that can be provided.
@ashwin-pc Tell us more about usageCollector! Where can we find documentation on it? What is the schema of data it collects? Does it have client-side (Javascript etc.) components to collect search results and actions on them?
+1. I would like to know exactly what data we plan to capture in the first release to validate that we have what is necessary for tuning ML models.
I think this feature should not be part of core but 100% a plugin and/or extension (this is opt-in functionality, not a core one). The plugins / extensions already have a mechanism to enrich the search request / response (with the ext section), and with extensions there is an option to go off-process / off-node.
@reta After discussion, the implementation team has come to the same conclusion, and we are removing most functionality from core. The only part remaining is logging queries and responses, which of course will be under user control. As for the ext section, I suppose we could put the client query ID (whatever we call it) in an ext section in the query (although currently only the response has an ext section).
Based on the update that @epugh presented today on the community call, I would like to point out that it should be possible to store all the UBI data outside the "production" cluster. Actually, storing this data in the same cluster should be possible only for an easy "try-out" scenario and should not be considered for any real use case, IMO.
Not only will managing indices for UBI take resources (and it might be hard to control), but legislation may also require this data to be stored, backed up, and treated in a very specific way. (I get that the data is anonymous, but it can still contain very sensitive information.)
in an ext section in the query (although currently only the response has an ext section).
@smacrakis Not only responses, the search requests have an ext section as well.
@reta Interesting -- the 2.13 doc for _search only says "plugin authors can add an ext object to the search response", but the doc for the Rerank processor includes an ext on search. Looks like a documentation bug.
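As a rough illustration of the idea discussed above, the sketch below places a client query ID under an ext section of a search request body. The section name "ubi" and the key "query_id" are hypothetical here, and this assumes a plugin has registered whatever ext section is used; it is not a description of an existing API.

```python
import json


def with_client_query_id(search_body, query_id, section="ubi"):
    """Return a copy of search_body with query_id placed under ext.<section>.

    The section name "ubi" and the key "query_id" are hypothetical; this
    assumes a plugin has registered a parser for the chosen ext section.
    """
    body = dict(search_body)
    ext = dict(body.get("ext", {}))  # keep ext entries from other plugins
    ext[section] = {**ext.get(section, {}), "query_id": query_id}
    body["ext"] = ext
    return body


request = with_client_query_id({"query": {"match": {"title": "red dress"}}}, "1234")
print(json.dumps(request, indent=2))
```

Copying the ext object rather than replacing it keeps entries that other plugins may already have put there.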
Regarding the plugin/non-plugin conversation, UBI development will proceed as an external plugin in its own repository and not as a module/plugin inside the opensearch-project/OpenSearch repository. Thanks to everyone involved in that conversation.
With that direction now known, I would like to see about closing this RFC. Everyone involved with UBI is still very much open to the community's thoughts (and contributions :), but the upcoming UBI plugin repository might be a better location for those conversations. I'm not familiar with the process to close an RFC so please let me know if there are any objections to doing so.
Closing this RFC because the initial implementation is now available at https://github.com/opensearch-project/user-behavior-insights and new issues can be created there.
Interested in this topic? Learn more at https://github.com/opensearch-project/user-behavior-insights and https://opensearch.org/docs/latest/search-plugins/ubi/index/.
User Behavior Insights (UBI)
This RFC has been revised to describe an approach more integrated with OpenSearch. We now call this functionality "User Behavior Insights" (UBI).
Summary
This RFC is an evolution of 4619 to capture user behaviors and track queries through all steps of querying and website usage.
This RFC proposes functionality in OpenSearch to store application user behavior and corresponding queries in OpenSearch indexes. It also includes an analytics dashboard integrated with OpenSearch Dashboards for analyzing and visualizing the collected information.
UBI will link client-side actions with backend search actions, such as linking queries submitted by users with customer clients, scroll depth, and search result detail pages viewed.
What users have asked for this feature?
This functionality has been discussed on the OpenSearch Search Relevance Meetup and through individual conversations with users of OpenSearch and with the larger community.
What problems are you trying to solve?
The key problem is that OpenSearch users are missing a holistic view of client-side, browser, and app events to enable a deeper understanding of search user behavior for the purposes of improving search relevance and user experience.
With this tooling, users of OpenSearch will be able to collect client-side events and link them with queries from their data stores. This will allow users to create a comprehensive view of users’ search journeys to improve the user experience.
What is the developer experience going to be?
Pre-Existing Work
The work described here has been successfully implemented as an OpenSearch plugin. Due to several factors such as maintaining a plugin, promoting adoption, and ease of use within OpenSearch, it has been determined that a plugin is not the optimal approach. This RFC has been updated to reflect this new direction.
For a description of the plugin's implementation, please see previous revisions of this issue or the plugin's repository.
Proposed Work
Core Contributions
Persistence of Queries and Client-Side Events
Queries, including their results, and client-side events will be indexed to two OpenSearch indices. One index will contain the queries, and the other will contain the client-side events.
These indices are .ubi_queries and .ubi_events. They will be automatically created and store queries and events for all OpenSearch indexes. (In the plugin implementation there was the concept of a "store" and there was a one-to-one correlation with a store and an OpenSearch index. This is no longer necessary as it can be accomplished with only these two indexes.)
Schema of Queries Index
The queries index will contain all queries that were received by OpenSearch which include a top-level ubi block. The timestamp, query_id, and other information about the query will be indexed.
Schema of Client-Side Events Index
The events index will contain the client-side events indexed into OpenSearch by the client. Some fields are standardized; most are optional. Others can be customized as needed.
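The split between standardized and custom fields might look something like the sketch below. The RFC text here does not list the actual schema, so every field name in it (action_name, timestamp, event_attributes) is a placeholder assumption apart from query_id.

```python
from datetime import datetime, timezone

# Hypothetical field names; the RFC text does not list the actual schema.
STANDARD_FIELDS = {"action_name", "query_id", "timestamp"}


def make_event(action_name, query_id, **custom):
    """Build an event document: standardized fields plus free-form extras."""
    event = {
        "action_name": action_name,
        "query_id": query_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    # Custom attributes are kept under their own key so they cannot
    # overwrite a standardized field.
    if custom:
        event["event_attributes"] = custom
    return event


def missing_fields(event):
    """Return the standardized fields an event lacks, sorted by name."""
    return sorted(STANDARD_FIELDS - set(event))


click = make_event("click", "1234", position=3, item_id="sku-42")
print(click)
print(missing_fields({"action_name": "click"}))
```

Keeping custom attributes in a separate object is one possible design choice; it lets dashboards rely on the standardized fields while leaving the rest open.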
Query Requests and Query Responses
Assumption: the user is on a search-enabled website powered by OpenSearch containing the functionality described above.
When the user performs a search on the website, the query is sent to OpenSearch with a ubi block in the request. This ubi block provides information about the search, and the presence of the block tells OpenSearch to persist this query and the query's results. An example ubi block is:
The fields and their names in the ubi block may change, but the important part is the query_id value, which uniquely identifies this search. This value is used to link client-side events with searches, and vice versa. If the query_id value is not provided, OpenSearch will generate a random query_id and return its value in the search response.
The presence of the ubi block in the search request causes OpenSearch to index the query and the query results.
Every search result has a unique ID. That result ID can be carried through the whole reporting system so that all actions are correlated with the result they came from. In many applications, there is additionally a unique item ID which identifies the underlying object referred to by the result ID. There is an N-to-1 relationship between result_ID and item_ID: many result IDs can refer to the same item ID. That is, the same object may have been returned as result 2 of search 1234, and as result 7 of search 3456.
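The query_id handling described above can be sketched as follows. This is an illustration of the stated behavior under assumed request shapes, not the actual implementation: only the top-level ubi block and the query_id key come from the RFC text, and the function name is made up for the example.

```python
import uuid


def resolve_query_id(search_request):
    """Return the query_id to persist and echo back, or None if no ubi block.

    Mirrors the behavior described above: no ubi block means the query is not
    persisted; a ubi block without a query_id gets a random one.
    """
    ubi = search_request.get("ubi")
    if ubi is None:
        return None
    return ubi.get("query_id") or str(uuid.uuid4())


# Hypothetical request bodies; only the ubi block and query_id come from the RFC.
with_id = {"query": {"match": {"title": "red dress"}}, "ubi": {"query_id": "1234"}}
without_id = {"query": {"match": {"title": "red dress"}}, "ubi": {}}
plain = {"query": {"match_all": {}}}

print(resolve_query_id(with_id))     # the client-supplied ID
print(resolve_query_id(without_id))  # a freshly generated random UUID
print(resolve_query_id(plain))       # None: the query is not persisted
```

Whatever ID comes out of this step is the value the client would attach to its subsequent events.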
Similarly, the search response will be modified to also include a ubi block:
In the example above, the search response has been modified to include a ubi block which contains the query_id. If a query_id was provided in the query request, this will be the same value. If a query_id was not provided in the query request, the query_id in the response will be a random UUID. It is recommended that clients manage their own query IDs but OpenSearch will generate a random query ID when necessary to avoid any breaking behavior or undesired effects.
Client-side JavaScript Reference Implementation
A reference implementation of the JavaScript client-side code to capture common events and index those events in OpenSearch will be provided. The code is not intended to be comprehensive or complete, but rather a starting point for users to modify to meet their unique needs.
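The reference implementation itself is described as JavaScript; purely to illustrate the indexing step, here is a small Python sketch that serializes captured events into a request body for the _bulk API. The .ubi_events index name comes from this RFC, while the event field names are placeholders.

```python
import json


def bulk_body(events, index=".ubi_events"):
    """Serialize events as an NDJSON body for the _bulk API."""
    lines = []
    for event in events:
        lines.append(json.dumps({"index": {"_index": index}}))  # action line
        lines.append(json.dumps(event))                          # document line
    return "\n".join(lines) + "\n"  # _bulk requires a trailing newline


# Hypothetical events; action_name and position are illustrative field names.
captured = [
    {"query_id": "1234", "action_name": "click", "position": 3},
    {"query_id": "1234", "action_name": "add_to_cart", "position": 5},
]
print(bulk_body(captured), end="")
```

Batching events this way keeps the number of requests from the client low, at the cost of a short delay before events become searchable.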
Code Drops
The Code Drops described below were chosen to be atomic pieces of work suitable for pull requests and review/commit by OpenSearch maintainers. They were similarly selected to avoid any breaking changes. All Code Drops include the appropriate documentation and tests.
Open Source and Best Practices
Research of currently available open source libraries under acceptable licenses will be conducted to discover which can be either utilized directly or customized to meet our needs.
We will “program to the interface” to permit future extensibility. For instance, while event data will be stored in OpenSearch, there will be no restrictions on creating the ability to use a relational database as the backend instead.
The development plan will evolve over time. Whenever possible, so as to not reinvent the wheel, priority will be given to the use of existing open source code as well as the application of existing standards.
Are there any security considerations?
The community’s input around these items will be vital during development.
Are there any breaking changes to the API?
No breaking changes to the API are expected.
What is the user experience going to be?
The user will be able to analyze the collected events via a dashboard that is integrated with OpenSearch Dashboards. This functionality will likely be implemented as its own OpenSearch Dashboards plugin or integrated into the OpenSearch dashboards-search-relevance plugin.
The data will be queryable using SQL and/or DSL, and be exportable to an external data store for additional analysis or training machine learning models.
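As one example of what a DSL query over the collected data could look like, the sketch below builds a terms aggregation that counts the most frequent query strings. The field name user_query is an assumption (and would need to be a keyword-type field for a terms aggregation); the body would be sent to a search on the queries index.

```python
import json


def top_queries_request(n, field="user_query"):
    """Build a DSL body that counts the n most frequent values of `field`.

    The field name is hypothetical; the actual queries index schema may differ.
    """
    return {
        "size": 0,  # return only the aggregation, no hits
        "aggs": {
            "top_queries": {"terms": {"field": field, "size": n}}
        },
    }


print(json.dumps(top_queries_request(10), indent=2))
```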
Are there breaking changes to the User Experience?
No breaking changes to the user experience are expected.
How is this different from other click-tracking applications?