zyy17 opened this issue 4 years ago (status: Open)
Thanks for submitting this proposal, @zyy17!
Can you share more about the use-case for these types of metrics? Is this primarily for the development of the WASM internals for OPA, or to modify the policy to reduce evaluation time?
Stepping back a little bit from the WASM-specific part of this, there are two main sets of metrics that users/devs have access to for OPA using the golang APIs, CLI, etc.:

- What the `--profile` option for `opa eval` gives. This includes counters and timing information related to the evaluation of parts of the policy at a rule level.
- Lower-level runtime profiling via `pprof` and the golang tooling.

The proposed counters you have are somewhere in the middle of the two.
I ask about the use-case because there are a couple of kinds of "end users" I can think of that might consume these. There are the policy authors writing Rego, compiling into WASM, and wanting to do performance analysis of their policy to make improvements, track regressions, etc. Then there are the OPA developers working on the wasm compiler/planner code who are looking to improve the generated WASM program.
What we probably want to do is be mindful of which group would be using these types of metrics. For the policy authors I would lean towards trying to aim for similar levels for the WASM tooling as what is offered in the OPA tooling. That way someone can write policies using the normal golang OPA runtime, be familiar with the tooling there, and it would translate better to the WASM environment.
For the OPA developers, if we wanted the ability to generate debug WASM binaries for dev & test, these kinds of counters are potentially more interesting/useful.
I guess maybe another follow-up question relating to the use-case: if you had metrics similar to what `opa eval --profile` gives, would that be as useful? Or for the WASM use-case you have, do you really need to know the exact counts and timing for something like the string comparisons?
@patrick-east Thank you for your response!
When I added metrics in the OPA-WASM library, I found two issues related to performance:

- Too many `__builtin_wasm_grow_memory()` calls, because of too little memory allocation by the wasm runtime;
- Too many `opa_value_iter()` calls, because of planner Bug #2260;
Because I'm not so familiar with the core logic of the planner and `internal/wasm`, I need some observability to know what's happening when executing the wasm binary. The metrics data reveals unusual behavior that I can then track down in the specific code.
When we run Rego code in the Golang environment, profiling is simple and helpful, but I think the wasm environment is still quite different.
I think my proposal is just another auxiliary way for OPA developers to observe wasm, and it can work alongside `--profile`. Most of the time users won't need it unless they want to debug a performance issue.
That makes perfect sense, and I think it makes a good case for this type of metric being useful for developers working on OPA. Like you said, without being familiar with the planner logic, having that extra observability is important.
> I think my proposal is just another auxiliary way for OPA developers to observe wasm, and it can work alongside `--profile`. Most of the time users won't need it unless they want to debug a performance issue.
+1, I think we would probably want these types of metrics to be exposed only to OPA developers or very advanced users, similar to the `pprof` category of metrics for OPA.
As far as the implementation is concerned, using macros requires re-building OPA from source. That is maybe OK for the really low-level stuff (similar to using pprof on `opa eval`), but we would need a different approach for the equivalent of `--profile`, or even the `opa run -s --pprof` option, where the end-user can get metrics without having to build OPA from source.
Ideally, for the equivalent of `--profile`, we would have a flag for `opa build` to signal that the compiler should generate wasm code that includes profiling/metrics collection.
> for performance, the metrics code needs to be wrapped in a conditional compilation macro
Out of curiosity, how much does it affect performance? Would it be possible to implement the important functions (memory alloc/free, etc.) so that they call the metric API hooks all the time, and move the implementation of the hooks into code generated by the compiler? That way, when someone supplies the flag for low-level instrumentation to `opa build`, we could emit code to track them, but default to a no-op implementation. There would be extra overhead to call the function, but it might be pretty minimal compared to the whole policy evaluation.
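A minimal sketch of that idea, assuming a hypothetical `metrics_counter_incr` hook (`opa_malloc` is a real OPA wasm function, but its body here is a stand-in):

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical hook: instrumented functions always call it, and the build
 * chooses the implementation. This default is a no-op, so the only cost in
 * a normal build is one function call per event; an instrumented build
 * produced by `opa build` could emit a counting implementation instead. */
void metrics_counter_incr(const char *name)
{
    (void)name;
}

/* An "important function" that unconditionally calls the hook.
 * The allocation body is a stand-in for OPA's real allocator. */
void *opa_malloc(size_t size)
{
    metrics_counter_incr("opa_malloc");
    return malloc(size);
}
```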
I was also thinking about whether we would maybe want to have these low-level metric APIs (counter increment, timer start/stop, etc.) call into external functions that could be implemented by the runtime environment. E.g., for the javascript wasm runtime they could be hooked into `console.time()` and `console.timeEnd()`, so the user would be able to integrate with the native debugging/profiling tools in their environment.
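On the wasm side this could be sketched as imported functions (the module/import names below are illustrative, not an existing OPA ABI; a JS embedder could implement them with `console.time()`/`console.timeEnd()` after reading the name string out of linear memory):

```c
/* Declare the timer hooks as wasm imports so the embedding runtime supplies
 * the implementation. Compile with clang targeting wasm32. */
__attribute__((import_module("env"), import_name("metrics_timer_start")))
void metrics_timer_start(const char *name);

__attribute__((import_module("env"), import_name("metrics_timer_end")))
void metrics_timer_end(const char *name);

/* Illustrative use inside an instrumented function. */
void timed_section_example(void)
{
    metrics_timer_start("eval");
    /* ... work being measured ... */
    metrics_timer_end("eval");
}
```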
Last up, WRT `dump_metrics()`, I think we would probably want a more structured API. I would imagine it sending back a JSON object or something, similar to the evaluation results. Then the runtime environment can handle it in the "best" way for the integration.
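For instance, the counters could be serialized into one JSON object instead of newline-separated lines (a rough sketch; the counter table, function name, and JSON shape are all hypothetical):

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical counter table maintained by the metrics code. */
typedef struct { const char *name; unsigned long long value; } counter_t;

static counter_t counters[] = { {"opa_malloc", 0}, {"opa_strncmp", 0} };
static const int num_counters = sizeof(counters) / sizeof(counters[0]);

/* Serialize all counters into `buf` as a single JSON object such as
 * {"opa_malloc":12,"opa_strncmp":40}, so the host can parse it the same way
 * it parses evaluation results. Writes at most `cap` bytes and returns the
 * length it attempted to produce (output is truncated if `cap` is too small). */
size_t metrics_json(char *buf, size_t cap)
{
    size_t n = 0;
    int w = snprintf(buf, cap, "{");
    if (w > 0) n += (size_t)w;
    for (int i = 0; i < num_counters; i++)
    {
        if (n >= cap) break;   /* stop appending once the buffer is full */
        w = snprintf(buf + n, cap - n, "%s\"%s\":%llu",
                     i ? "," : "", counters[i].name, counters[i].value);
        if (w > 0) n += (size_t)w;
    }
    if (n < cap)
    {
        w = snprintf(buf + n, cap - n, "}");
        if (w > 0) n += (size_t)w;
    }
    return n;
}
```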
This issue has been automatically marked as inactive because it has not had any activity in the last 30 days.
I've been working with OPA wasm for a while and have developed a simple metrics mechanism (counters only) in the OPA-WASM library to show how many times the low-level functions are executed.

For example, if we are interested in how many times `opa_strncmp()` runs, we can add a counter at its entry:
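A minimal sketch of what that looks like (the counter variable and the exact `opa_strncmp` signature/body are approximations, not the actual OPA source):

```c
/* Global counter bumped at the entry point of the function we want to track. */
static unsigned long long opa_strncmp_count;

int opa_strncmp(const char *a, const char *b, int num)
{
    opa_strncmp_count++;   /* record one more call */

    /* ... existing comparison logic, unchanged ... */
    for (int i = 0; i < num; i++)
    {
        if (a[i] != b[i])
        {
            return a[i] < b[i] ? -1 : 1;
        }
    }
    return 0;
}
```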
We provide a metrics counter to collect all the metrics, and a `dump_metrics()` function to print them, separated by `\n` (for performance, the metrics code needs to be wrapped in a conditional compilation macro):
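A sketch of the conditional compilation wrapper and the dump function (the `OPA_METRICS` macro, counter storage, and printing primitive are illustrative; `opa_println` is assumed to be available as a host-provided print function):

```c
#include <stdio.h>

void opa_println(const char *msg);   /* assumed host import for printing */

#ifdef OPA_METRICS
#define METRIC_INCR(name) (metric_##name++)
#else
#define METRIC_INCR(name)            /* compiled out: no overhead */
#endif

#ifdef OPA_METRICS
static unsigned long long metric_opa_strncmp;
static unsigned long long metric_grow_memory;

/* Print every counter as "name: value", one per line. */
void dump_metrics(void)
{
    char line[128];
    snprintf(line, sizeof(line), "opa_strncmp: %llu", metric_opa_strncmp);
    opa_println(line);
    snprintf(line, sizeof(line), "__builtin_wasm_grow_memory: %llu", metric_grow_memory);
    opa_println(line);
}
#else
void dump_metrics(void) {}           /* no-op when metrics are compiled out */
#endif
```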
When the user calls `dump_metrics()`, it prints all the collected metrics. Because of the lack of wasm profiling tools, we had to develop these metrics to find out exactly how many times key functions such as `__builtin_wasm_grow_memory()` run, and it works very well. If this is an acceptable way to add metrics, I can open a PR for code review :)