Closed iNecas closed 4 years ago
Added some comments, but it's hard to talk about interfaces without actual first implementations.
Agreed, I mainly wanted to be transparent about the direction. Thanks for the comments; I will incorporate the parts that are clear and leave the foggy ones until we have something e2e.
@bwplotka I've filled in more gaps in the aggregator implementation. The interfaces start forming up. What's missing for end2end is:
input.SeriesIterator
dataframe.Schema
to the one the parquet library understands
I would like to finish the first two points, but it would be great if the command line options could be picked up by somebody else, as I might not have much time left this week to finish those. It could also be a good exercise to play with the code a bit and refactor if needed.
but would be great for the command line options to be picked by somebody else,
@iNecas I can try to do that, but I might need some hand holding initially.
One step closer: the StoreAPI is connected, with some initial wiring that allows running almost end2end:
go run ./cmd/obslytics export --input='{"endpoint":"127.0.0.1:10901","tls_config":{"insecure_skip_verify":true}}' --min-time="2020-07-03T13:15:04Z" --max-time="2020-07-03T18:00:00Z" --metric="net_conntrack_dialer_conn_attempted_total" --resolution=30m
| _id dialer_name instance job _sample_start _sample_end _min_time _max_time _count _sum _min _max |
| 1 alertmanager localhost:9090 prometheus 13:30:00 14:00:00 13:30:04 13:39:49 40 0 0 0 |
| 1 alertmanager localhost:9090 prometheus 16:00:00 16:30:00 16:29:49 16:29:49 1 0 0 0 |
| 1 alertmanager localhost:9090 prometheus 16:30:00 17:00:00 16:30:04 16:59:49 120 0 0 0 |
| 1 alertmanager localhost:9090 prometheus 17:00:00 17:30:00 17:00:04 17:29:49 120 0 0 0 |
| 1 default localhost:9090 prometheus 13:30:00 14:00:00 13:30:04 13:39:49 40 0 0 0 |
| 1 default localhost:9090 prometheus 16:00:00 16:30:00 16:29:49 16:29:49 1 0 0 0 |
| 1 default localhost:9090 prometheus 16:30:00 17:00:00 16:30:04 16:59:49 120 0 0 0 |
| 1 default localhost:9090 prometheus 17:00:00 17:30:00 17:00:04 17:29:49 120 0 0 0 |
| 1 prometheus localhost:9090 prometheus 13:30:00 14:00:00 13:30:04 13:39:49 40 40 1 1 |
| 1 prometheus localhost:9090 prometheus 16:00:00 16:30:00 16:29:49 16:29:49 1 1 1 1 |
| 1 prometheus localhost:9090 prometheus 16:30:00 17:00:00 16:30:04 16:59:49 120 120 1 1 |
| 1 prometheus localhost:9090 prometheus 17:00:00 17:30:00 17:00:04 17:29:49 120 120 1 1 |
| 1 alertmanager localhost:9090 prometheus 17:30:00 18:00:00 17:30:04 17:59:49 120 0 0 0 |
| 1 default localhost:9090 prometheus 17:30:00 18:00:00 17:30:04 17:59:49 120 0 0 0 |
| 1 prometheus localhost:9090 prometheus 17:30:00 18:00:00 17:30:04 17:59:49 120 120 1 1 |
I've tested only with my very limited Thanos instance; it would be great to see some real-world performance.
Next step: add the parquet piece.
@4n4nd once it gets end2end, it might be more clear how to move the thing forward. It's getting close.
Btw. I've got a bit further with the CLI part, so the initial usage might be there by the time the parquet writer is finished.
So I've reached the point of being able to run this thing end2end:
# Pull data for a specific metric from a StoreAPI (sidecar or store) and save into parquet
$ go run ./cmd/obslytics export --input-cfg='{"endpoint":"127.0.0.1:10901","tls_config":{"insecure_skip_verify":true}}' \
    --metric="net_conntrack_dialer_conn_attempted_total" \
    --resolution=1h --min-time="$(date -uI)T00:00:00Z" --max-time="$(date -uI)T23:59:59Z" \
    --out=net_conntrack_dialer_conn_attempted_total.parquet \
    --debug
| dialer_name instance job prometheus _sample_start _sample_end _min_time _max_time _count _sum _min _max |
| default localhost:9090 prometheus prom-0 11:00:00 12:00:00 11:37:21 11:59:56 272 0 0 0 |
| prometheus localhost:9090 prometheus prom-0 11:00:00 12:00:00 11:37:21 11:59:56 272 37128 1 272 |
| thanos-query localhost:9090 prometheus prom-0 11:00:00 12:00:00 11:37:21 11:59:56 272 544 2 2 |
| thanos-receive localhost:9090 prometheus prom-0 11:00:00 12:00:00 11:37:21 11:59:56 272 110840 1 814 |
| thanos-sidecar localhost:9090 prometheus prom-0 11:00:00 12:00:00 11:37:21 11:59:56 272 272 1 1 |
| thanos-store localhost:9090 prometheus prom-0 11:00:00 12:00:00 11:37:21 11:59:56 272 271 0 1 |
| default localhost:9090 prometheus prom-0 12:00:00 13:00:00 12:00:01 12:34:11 411 0 0 0 |
| prometheus localhost:9090 prometheus prom-0 12:00:00 13:00:00 12:00:01 12:34:11 411 196458 273 683 |
| thanos-query localhost:9090 prometheus prom-0 12:00:00 13:00:00 12:00:01 12:34:11 411 822 2 2 |
| thanos-receive localhost:9090 prometheus prom-0 12:00:00 13:00:00 12:00:01 12:34:11 411 588552 817 2047 |
| thanos-sidecar localhost:9090 prometheus prom-0 12:00:00 13:00:00 12:00:01 12:34:11 411 411 1 1 |
| thanos-store localhost:9090 prometheus prom-0 12:00:00 13:00:00 12:00:01 12:34:11 411 411 1 1 |
level=info ts=2020-08-14T12:34:13.687660836Z caller=main.go:108 msg=exiting cmd=export
# Example of loading the parquet file from Python:
$ ipython -c 'import pandas as pd; pd.read_parquet("net_conntrack_dialer_conn_attempted_total.parquet")'
Out[1]:
dialer_name instance job prometheus _sample_start _sample_end _min_time _max_time _count _sum _min _max
0 default localhost:9090 prometheus prom-0 2020-08-14 11:00:00 2020-08-14 12:00:00 2020-08-14 11:37:21 2020-08-14 11:59:56 272 0.0 0.0 0.0
1 prometheus localhost:9090 prometheus prom-0 2020-08-14 11:00:00 2020-08-14 12:00:00 2020-08-14 11:37:21 2020-08-14 11:59:56 272 37128.0 1.0 272.0
2 thanos-query localhost:9090 prometheus prom-0 2020-08-14 11:00:00 2020-08-14 12:00:00 2020-08-14 11:37:21 2020-08-14 11:59:56 272 544.0 2.0 2.0
3 thanos-receive localhost:9090 prometheus prom-0 2020-08-14 11:00:00 2020-08-14 12:00:00 2020-08-14 11:37:21 2020-08-14 11:59:56 272 110840.0 1.0 814.0
4 thanos-sidecar localhost:9090 prometheus prom-0 2020-08-14 11:00:00 2020-08-14 12:00:00 2020-08-14 11:37:21 2020-08-14 11:59:56 272 272.0 1.0 1.0
5 thanos-store localhost:9090 prometheus prom-0 2020-08-14 11:00:00 2020-08-14 12:00:00 2020-08-14 11:37:21 2020-08-14 11:59:56 272 271.0 0.0 1.0
6 default localhost:9090 prometheus prom-0 2020-08-14 12:00:00 2020-08-14 13:00:00 2020-08-14 12:00:01 2020-08-14 12:34:11 411 0.0 0.0 0.0
7 prometheus localhost:9090 prometheus prom-0 2020-08-14 12:00:00 2020-08-14 13:00:00 2020-08-14 12:00:01 2020-08-14 12:34:11 411 196458.0 273.0 683.0
8 thanos-query localhost:9090 prometheus prom-0 2020-08-14 12:00:00 2020-08-14 13:00:00 2020-08-14 12:00:01 2020-08-14 12:34:11 411 822.0 2.0 2.0
9 thanos-receive localhost:9090 prometheus prom-0 2020-08-14 12:00:00 2020-08-14 13:00:00 2020-08-14 12:00:01 2020-08-14 12:34:11 411 588552.0 817.0 2047.0
10 thanos-sidecar localhost:9090 prometheus prom-0 2020-08-14 12:00:00 2020-08-14 13:00:00 2020-08-14 12:00:01 2020-08-14 12:34:11 411 411.0 1.0 1.0
11 thanos-store localhost:9090 prometheus prom-0 2020-08-14 12:00:00 2020-08-14 13:00:00 2020-08-14 12:00:01 2020-08-14 12:34:11 411 411.0 1.0 1.0
There is still a lot of room for improvement, but it should be enough to start playing with, use it for some real use case and enhance whatever is needed. I would like to get this PR merged with a limited set of additional changes (that I might not have capacity for in the following weeks) and continue in further increments on top of the main branch.
@bwplotka it should actually be flushing the data at each sample: the finalize just deals with the rest of the data that has not reached the end of the next sample.
Feel free @iNecas to create PRs without a fork (on an obslytics branch) - it might be easier for us to collaborate.
On the path for the first working e2e:
input.SeriesIterator
dataframe.Schema
to the one the parquet library understands
Example (current state):