Open chrismcleod opened 4 years ago
This is one example of how these metrics could be rendered:
looks great! I'll cc tencent team
@chrismcleod great job, looks very promising! Few comments:
What will trigger generation of those metrics? Will our backend ask the component for them periodically? Or will components generate them on their own, on request, directly for the dashboard?
Isn't total redundant in the example below?
{
name: 'requests', // constant
type: 'count', // constant
total: 8306, // total of all values for this series
values: [1497, 1022, 1010, 1002, 1186, 1331, 1258], // count of normal requests over the time range. Does not include errored requests
}
@medikoo the metrics are available as a run action on each component. each component can generate/cache its metrics however it wants.
A goal of this format is to remove as much logic from any consuming client as possible. The total here might, at this time, be a simple sum of the array; but I would rather the total be explicit rather than every consuming client need to re-calculate it using that logic (which might change). What do you think?
@medikoo the metrics are available as a run action on each component.
Ok, so you mean that a component is expected to respond with such metrics when sls metrics (or sls run) is run against it?
I'm having trouble understanding what triggers the generation of metrics and how often they'll be generated.
The total here might, at this time, be a simple sum of the array; but I would rather the total be explicit rather than every consuming client need to re-calculate it using that logic (which might change). What do you think?
As long as it's redundant, I would remove it. Redundant data, in my view, is (1) confusing (why is it provided? maybe it doesn't necessarily reflect the sum of the array?), (2) error-prone (if it doesn't match the sum of the array, which one should be treated as the source of truth?), and (3) unnecessary, since there's no real cost for a client to compute it on its own.
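To make the "source of truth" question in point (2) concrete, here is a minimal sketch of the check a consuming client would need if both fields are sent. The helper name checkSeriesTotal is hypothetical and not part of the proposal; the series is the one quoted earlier in this thread.

```javascript
// Hypothetical helper: not part of the proposal or of any Serverless API.
// Recomputes the sum of `values` and reports whether it agrees with `total`.
function checkSeriesTotal(series) {
  const sum = series.values.reduce((acc, v) => acc + v, 0);
  return { sum, matches: sum === series.total };
}

// The series quoted earlier in this thread.
const requests = {
  name: 'requests',
  type: 'count',
  total: 8306,
  values: [1497, 1022, 1010, 1002, 1186, 1331, 1258],
};

console.log(checkSeriesTotal(requests)); // { sum: 8306, matches: true }
```

If total ever stops being a plain sum, a check like this would start failing, so clients would need to know which of the two fields to trust.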
@medikoo the data behind metrics is generated when there is an invocation of the related resources, e.g. whenever a cloud function is invoked, a log is generated. sls metrics will trigger the metrics function on the component, and the component will query whatever the cloud infrastructure provides for querying those invocation logs, e.g. AWS CloudWatch, Tencent metrics queries...
@medikoo To put it in simpler terms (if I understood it correctly), this is just a standard outputs format/structure for any metrics method on any component.
Each component would have different logic for actually gathering the data, but at the end of the day, they need to return the data in that format so that the frontend could consume it regardless of which component it came from.
Did I understand that correctly @chrismcleod ?
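A rough sketch of that idea, under stated assumptions: the two component objects, their data, and the no-argument metrics() signature are all hypothetical and only illustrate "different gathering logic, same output shape". The series fields (name, type, total, values) follow the example quoted earlier in the thread.

```javascript
// Hypothetical components: names, data, and the metrics() signature are
// illustrative assumptions, not the real component API.
const websiteComponent = {
  // A real component would query its provider here (e.g. CloudWatch).
  async metrics() {
    return [{ name: 'requests', type: 'count', total: 6, values: [1, 2, 3] }];
  },
};

const functionComponent = {
  // Different gathering logic, same output shape.
  async metrics() {
    return [{ name: 'invocations', type: 'count', total: 9, values: [4, 5, 0] }];
  },
};

// A consumer that knows only the shared format, not the component behind it.
function summarize(seriesList) {
  return seriesList.map((s) => `${s.name} (${s.type}): ${s.total}`);
}

async function main() {
  for (const component of [websiteComponent, functionComponent]) {
    console.log(summarize(await component.metrics()).join('\n'));
  }
}

main();
```

The consumer never branches on which component produced the data, which is the point of having one format.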
@hkbarton @eahefnawy thanks for clarification.
So components will write logs to CloudWatch in the course of typical commands (deploy etc.).
Then there'll be a dedicated sls metrics command, on which the component should query the CloudWatch logs it generated and prepare metrics data for the given query. Right?
Do we have a strategy planned for scaling that? E.g. I can imagine that an active component can produce a significant amount of logs in a short time. In that case it may be nearly impossible for the component to retrieve all the needed logs and generate metrics within a typical sls metrics call.
Usually the generation of such metrics is backed by quite sophisticated tools that are equipped to handle large amounts of input and derive the needed answers in a short time.
@eahefnawy that is correct. Consumers could be "any" client. Including perhaps some basic CLI charts in a galaxy far far away ;)
This format will also eventually be a sub-property of a full custom dashboard description.
@medikoo I think we already solved that problem in our current Framework Pro dashboard, no? 🤔
@medikoo I think we already solved that problem in our current Framework Pro dashboard
If I understand the proposal correctly, the component will provide ready-made chart data to the dashboard, and it's the dashboard front-end that will draw it as received.
I was wondering how it'll work in the case of intensive apps. E.g. I remember working for a client with thousands of users that had millions of CloudWatch logs generated every day. How can a component produce reliable chart data for such a case, e.g. an overview of a month?
@hkbarton reassured me a bit: on the Tencent side, the provider offers an elastic search mechanism for logs, so the component would already query a reduced result set and wouldn't inspect logs one by one on its own.
I wonder what the plan is for AWS; afaik CloudWatch doesn't provide such a feature out of the box (but I also have limited experience with CloudWatch).
Component Metrics Format
@ac360 @eahefnawy @hkbarton @medikoo
Motivation
Each component needs the ability to report metrics in terms of the use case for that component. For example, a website component needs to see a request count and is not too concerned with the memory consumed. We want to build tools that can consume these metrics from any component moving forward, without having to refactor the tools or have unique metrics handling per component. To facilitate this, we need a common metrics format.
Proposal
Based on a mock provided by ac360, this is the format needed for each metric. There are essentially two data formats. First, a "stacked" format where there is one set of x values and multiple series of y values. Second, a "basic" format that is a simple set of x, y data points. This structure borrows heavily from Highcharts, as they have hardened their definitions for quite some time.
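As a rough illustration of how the two formats relate, the sketch below builds a "stacked" chart and flattens one of its series into "basic" x, y points. The top-level field names (x, y) and the helper toBasicPoints are assumptions made for this example only; the per-series fields (name, type, total, values) follow the series example quoted in the comments.

```javascript
// Hypothetical shape: the x / y field names are assumptions for illustration.
const stacked = {
  // One shared set of x values (timestamps in ms, one per day).
  x: { type: 'timestamp', values: [1585612800000, 1585699200000, 1585785600000] },
  // Multiple series of y values, each aligned index-by-index with x.values.
  y: [
    { name: 'requests', type: 'count', total: 6, values: [1, 2, 3] },
    { name: 'errors', type: 'count', total: 1, values: [0, 1, 0] },
  ],
};

// Flatten one series of a stacked chart into "basic" [x, y] points.
function toBasicPoints(chart, seriesName) {
  const series = chart.y.find((s) => s.name === seriesName);
  if (!series) throw new Error(`unknown series: ${seriesName}`);
  return chart.x.values.map((x, i) => [x, series.values[i]]);
}

console.log(toBasicPoints(stacked, 'errors'));
```

Sharing a single x array is what makes stacking straightforward: every series is guaranteed to have a value for the same buckets.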
This is an example response for a 7 day range for the express component. We want to be sure that we have all timestamps for the chart range, even if the values are 0. If ALL values are 0, return the empty chart.
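A minimal sketch of that rule, assuming daily buckets over a 7 day range. The helper fillBuckets and its parameters are hypothetical, and null is only a stand-in for the empty chart, whose actual shape is not reproduced here.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Hypothetical helper. `sparse` maps a bucket start timestamp (ms) to a value;
// buckets with no data are simply absent from the map.
function fillBuckets(startMs, bucketCount, stepMs, sparse) {
  const timestamps = [];
  const values = [];
  for (let i = 0; i < bucketCount; i += 1) {
    const t = startMs + i * stepMs;
    timestamps.push(t);
    values.push(sparse.get(t) ?? 0); // a missing bucket becomes an explicit 0
  }
  const allZero = values.every((v) => v === 0);
  // `null` stands in for the "empty chart"; its real shape is an open assumption.
  return allZero ? null : { timestamps, values };
}

const start = Date.UTC(2020, 3, 1);
const sparse = new Map([[start, 12], [start + 2 * DAY_MS, 7]]);

console.log(fillBuckets(start, 7, DAY_MS, sparse));    // 7 timestamps, gaps filled with 0
console.log(fillBuckets(start, 7, DAY_MS, new Map())); // null: every value is 0
```

Filling on the component side keeps the client free of range logic, in line with the goal of removing logic from consumers.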
Empty chart
Common time buckets
15 minutes
60 minutes
24 hrs
7 days