fujita opened this issue 6 years ago
Yes, but more interesting would be Zabbix.
Maybe there is a common way to handle both monitoring via prometheus and other monitoring systems in a generic way?
These metrics would be helpful for us.
Prometheus support would be greatly appreciated.
+1 for Prometheus
+1 for Prometheus!
+1 for Prometheus!
Is anyone interested in this?
@fujita, I am interested. There are at least two ways to do that:

1. A separate `gobgp_exporter` project, which queries the `gobgp` API.
2. Built-in metrics support inside `gobgp` itself.

Any preference?
@fujita, I looked at the `gobgp` code and I personally like point 1. Could you create a separate project, say `gobgp_exporter`, for the exporter under `osrg`? I could put up a PR for the exporter code pretty fast. The client API for `gobgp` is pretty straightforward, e.g. the `showRibInfo()` function.
Sorry about the delay, I've just started to investigate implementation options.
@greenpau why can't we pick the best of options 1 and 2? Having a separate project (exporter) forces users to manage another daemon. Can't we run a goroutine that executes the gRPC API and handles HTTP requests from Prometheus?
I've just created a branch that only sends the received number for each peer: https://github.com/fujita/gobgp/commit/090cca2967b3251d4656b06e13864de4c1f8ac4b
@fujita, it depends on the circumstances.
Prom exporters are designed on a per-app basis. Compare node_exporter and blackbox_exporter and you will get the gist.
> - Sync vs. async polling.

I don't understand. I don't see how this is related to the exporter and exposing metrics.

> - Containerizing daemons.

1 CPU for a container? Why? You can configure what you like.

> - Reload configs. How do you reload the prom config?

I had a quick look at the official exporters, mysql and consul. Looks like they don't support config reloading.
@fujita , see below.
> 1 CPU for a container? Why? You can configure what you like.

Not in containers. With Docker, for example, you can configure cpusets. However, that setting just allows you to switch between CPUs; it does not allow running on multiple CPUs concurrently.

> I don't understand. I don't see how this is related to the exporter and exposing metrics.

Polling = scraping.

Sync scraping means you make a call to the exporter's URL, and only then does data collection start. Once the data is collected, it is returned to your scraper.

Async scraping means that data collection runs independently of the scrape calls. In that mode, the scraper gets the data from a collection buffer.

Why does that matter? Consider an exporter with hundreds of metrics where data collection takes 1-3 s. That's the time your GoBGP server goroutine competes with the collectors' goroutines for a single CPU's time. It obviously does not matter when you just collect the status of BGP peers, because that probably takes 0.00001 s.
> Looks like they don't support config reloading.

Right. Consider you want to change the configuration of the exporter: would you restart GoBGP and drop neighbor relations with your peers?
> Not in containers. With Docker, for example, you can configure cpusets. However, that setting just allows you to switch between CPUs; it does not allow running on multiple CPUs concurrently.

Hmm, really? Inside a single container, can you try running two processes, each executing an endless loop? Then see the output of the top command on the host OS.
> Hmm, really?

Your GoBGP and Prometheus goroutines are not separate processes. They are goroutines; hence they will be on the same CPU.

> can you try running two processes, each executing an endless loop?

@fujita, that would work :-) However, those would NOT be parent and child processes like in a container. They would be two separate parents. In a container, you have a single parent (akin to init) and a bunch of child processes.

> By default, processes can run on any CPU. However, processes can be instructed to run on a predetermined selection of CPUs, by changing the affinity of the process. Child processes inherit the CPU affinities of their parents.

https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_MRG/1.3/html/Realtime_Reference_Guide/chap-Realtime_Reference_Guide-Affinity.html
@fujita, I refreshed my memory on the subject. Here is a snippet to run the test you proposed.

> As the sample runs below demonstrate, the amount of real and CPU time consumed when running the program will depend on intra-core caching effects and whether the processes are using the same CPU.
> can you try running two processes, each executing an endless loop?
>
> @fujita, that would work :-) However, those would NOT be parent and child processes like in a container. They would be two separate parents. In a container, you have a single parent (akin to init) and a bunch of child processes.

Please run a process that creates two child processes executing an endless loop, and then waits forever. See the output of top on the host.
> Please run a process that creates two child processes executing an endless loop, and then waits forever. See the output of top on the host.
@fujita , yes, two processes running in a container can be on separate CPUs. My mistake.
The test script:
```python
import os

def child():
    print('\nA new child ', os.getpid())
    os._exit(0)

def parent():
    while True:
        newpid = os.fork()
        if newpid == 0:
            child()
        else:
            pids = (os.getpid(), newpid)
            print("parent: %d, child: %d\n" % pids)
            reply = input("q for quit / c for new fork")
            if reply == 'c':
                continue
            else:
                break

parent()
```
Running the test:

```
# python test.py
parent: 29, child: 30
q for quit / c for new fork('\nA new child ', 30)
```

The two processes are `1847` (parent) and `1848` (child):

```
$ ps -ef | grep python
root      1847  1817  0 20:05 pts/3    00:00:00 python test.py
root      1848  1847  0 20:05 pts/3    00:00:00 [python] <defunct>
```

CPU sets:

```
# cat /proc/1847/status | grep Cpus
Cpus_allowed:       f
Cpus_allowed_list:  0-3
# cat /proc/1848/status | grep Cpus
Cpus_allowed:       f
Cpus_allowed_list:  0-3
```

The CPU each process last ran on (field 39 of `/proc/<pid>/stat`):

```
# cat /proc/1848/stat | cut -d' ' -f39
1
# cat /proc/1847/stat | cut -d' ' -f39
0
```
Good, so now you understand that goroutines DON'T compete for a single CPU inside a container. A container is just namespaces and processes from the perspective of the Linux kernel, so it can use as many CPUs as you have.

For me, some official exporters don't support reloading because it's not very useful, so I'll go with the current design. If someone needs reloading, it's pretty easy to implement their own gobgp exporter by using gobgp/exporter as a package; they just need a main function.
> now you understand that goroutines DON'T compete for a single CPU inside a container.

@fujita, overconfidence :) They DO compete when there is only one CPU to compete over. They also compete when you have more goroutines than CPUs. I get that an application, over time, might use all CPUs, but how do you know that goroutines are being load-balanced across all of them?
There is no difference between the exporter design and the current design on that point.
Tally allows you to define and emit metrics in different ways. It's an option that lets you configure multiple ways of collecting/emitting metrics. Just a suggestion.
@fujita, I started with an external exporter here: https://github.com/greenpau/gobgp_exporter. Would love to get your feedback on the code. There are probably some issues with concurrency; I'm still getting up to speed with Go. I spent my weekend on this.
@greenpau nice, but why do you modify the pb.go file?
> why do you modify the pb.go file?

@fujita, I guess the primary reason is that I want to control the timeout on the client. The `Collect()` function will be called on each scrape. If the daemon does not respond, the scrape holds the session and will not time out right away. This results in hanging scrapes. It is not long, but it should time out within a second or two.
In my last commit, I added logic so that the GoBGP polling and the scraping happen asynchronously. There is also some concurrency logic.

Additionally (somewhat related, but not quite): since the changes to the API (hiding things behind /internal), the only way to get to some functionality (e.g. things related to ptype, any, etc.) is by adding extra functions the way I did with the extended client. Not ideal. I would prefer to have that code upstream, but I don't know whether you want to have it there.
I think I am close to getting some of the things mentioned here done.
why not use `context.WithTimeout`?
@fujita, because I don't know how context works. I will take time to read up on it.
> why not use context.WithTimeout?

@fujita, I was reading up on Go's `context` package. Say I implemented the context with a timeout and the timeout gets triggered. How do I close the connection? (Ideally, in case of a timeout, I would prefer not to keep half-closed connections.) How do I assess whether the connection is still alive?
@greenpau you do a select on ctx.Done() at some point of your connection. This is automatic with the net package I believe.