mailcow / mailcow-dockerized

mailcow: dockerized - 🐮 + 🐋 = 💕
https://mailcow.email

[Feature Proposal] Prometheus Endpoint, etc. #1695

Open thannaske opened 6 years ago

thannaske commented 6 years ago

I did not find any documentation on monitoring the health of a mailcow instance. A nice feature would therefore be an API endpoint you can consume to gather all relevant statistics and monitor them in an automated manner. A good solution would be a Prometheus endpoint, as this is developing into a widely used standard for time-series monitoring.

You could then hit the public-facing mailcow installation on a key-protected or IP-restricted API endpoint, e.g. mailcow.example/metrics, and get all information about:

If I get this right, all of this information is already available within the mailcow environment, so it would be possible to collect it and provide it at a single, standardized endpoint.

Currently I don't have enough time to implement this on my own. Maybe there are some mailcow masters around who can easily find, collect, and expose the proposed metrics. If not, I'll look into this issue around October when I have time for it.

Adorfer commented 6 years ago

I would really appreciate that. (Then I would wrap that data into a check_mk plugin.)

stale[bot] commented 5 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

ThomDietrich commented 5 years ago

It's a shame this issue never got a response from one of the repository maintainers. For my freshly set up mailcow system I would very much like this feature. Is anyone out there interested in such a feature, or has anyone already implemented something similar?

thannaske commented 5 years ago

It's a shame this issue never got a response from one of the repository maintainers.

Wow. There are nearly 100 open issues and this one is just a feature proposal. @andryyy is maintaining this awesome repository for free. It's open source. If your priorities differ from the maintainers', you have the option to file a pull request. For this kind of sense of entitlement ("Anspruchsdenken"), an open-source repository is the wrong place.

andryyy commented 5 years ago

I like the idea. And I am sorry it was auto closed. But I personally have no time for this at the moment. :(

ThomDietrich commented 5 years ago

@thannaske this was not at all meant as an insult. I'm surprised you'd think that. As a very active open source developer myself, I am well aware of how a community and a ticket system works. Discussing ideas, proposals and solutions freely on a friendly basis is most important.

Your proposal was well composed and it is "a shame" the automated stale bot took a hit at it. My question was btw not specifically directed at @andryyy.

Is anyone out there interested in such a feature, or has anyone already implemented something similar?

I was actually hoping that you or any other subscriber might have some previous experience in the field?

@andryyy I am personally not a big fan of issue bots - but that is just my personal opinion and not important here. Don't worry, I feel for you :) Did you discuss ideas/solutions in the past, or do you have a preference for how this feature could look? The /metrics endpoint format approach is established for Prometheus and some other applications; other solutions could (additionally) be collectd- or graphite-based. This could be based on tools like Telegraf or similar. There are many options; maybe others want to add suggestions from their experience.

Happy to continue the discussion!

thannaske commented 5 years ago

this was not at all meant as an insult. I'm surprised you'd think that. As a very active open source developer myself, I am well aware of how a community and a ticket system works. Discussing ideas, proposals and solutions freely on a friendly basis is most important.

Never mind then, sorry that I took your answer the wrong way.

I was actually hoping that you or any other subscriber might have some previous experience in the field?

I actually have, and I'm pretty familiar with PHP, which is used on the front-facing API side of this project. Unfortunately my time planning for 2018 was a bit too optimistic, so I had no time at the end of October to look further into this issue.

This could be based on tools like Telegraf or similar

Telegraf is also able to scrape metrics from the /metrics endpoint. We could also provide an argument that modifies the format of the output, e.g. /metrics?format=(influx|json|prometheus|...)
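
To illustrate the idea, a hypothetical /metrics?format=prometheus response could look like the following. This is only a sketch: metric names and values are illustrative, not an existing mailcow API.

# HELP mailcow_postfix_messages_received_total Messages received by Postfix
# TYPE mailcow_postfix_messages_received_total counter
mailcow_postfix_messages_received_total 1234
# HELP mailcow_postfix_queue_size Current number of messages in the Postfix queue
# TYPE mailcow_postfix_queue_size gauge
mailcow_postfix_queue_size 3
# HELP mailcow_rspamd_actions_total Messages per rspamd action
# TYPE mailcow_rspamd_actions_total counter
mailcow_rspamd_actions_total{action="reject"} 26
mailcow_rspamd_actions_total{action="no action"} 43990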

andryyy commented 5 years ago

I have no idea what it could look like. Let me know if you guys have ideas.

ThomDietrich commented 5 years ago

Hey guys! Not the polished solution we were dreaming of... but I've built something to meet my needs. The solution below uses Telegraf on the host to execute pflogsumm every x minutes and send mail stats to a database. @thannaske this goes in a different direction than discussed. It's difficult to provide a general datapoint when the data is pre-aggregated...

My intermediate goal is to send metrics into InfluxDB and visualize them in Grafana; however, the solution is easily adapted to other systems. It also solves the task from the outside - let's discuss how to move this inside a container.

Here we go:

My telegraf configuration /etc/telegraf/telegraf.d/exec.conf:

[[inputs.exec]]
  interval = "30m"
  commands = ["/opt/mailcow-dockerized/pflogsumm2influx.sh 30m"]
  data_format = "json"
  name_override = "pflogsumm"
  timeout = "5s"

With an additional output plugin configuration you can send the data to InfluxDB, Prometheus, you name it.
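
For example, sending to InfluxDB (or exposing a scrape endpoint) could look roughly like this - a minimal sketch, not my actual config; URL, database name and port are placeholders:

[[outputs.influxdb]]
  # send the collected measurements to a local InfluxDB (adjust URL/credentials)
  urls = ["http://127.0.0.1:8086"]
  database = "mailcow"

# or, alternatively, expose the metrics for Prometheus to scrape
[[outputs.prometheus_client]]
  listen = ":9273"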

A simple bash script pflogsumm2influx.sh:

#!/usr/bin/env bash

set -o errexit

if [ -z "$1" ]; then echo "Please provide aggregation timespan as argument (e.g. 30m or 1h)" > /dev/stderr; exit 1; fi
# TODO: check $1 valid

out=$(docker logs --since $1 $(docker ps -qf name=postfix-mailcow) | /usr/sbin/pflogsumm)

if [[ "$out" != *"Grand Totals"* ]]; then
  echo "Unexpected output from pflogsumm. Exiting..." > /dev/stderr
  exit 1
fi

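# Extract message counters from pflogsumm's "Grand Totals" section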
[[ "$out" =~ ([[:digit:]]+)[[:space:]]*received ]] && received=${BASH_REMATCH[1]} || exit 1
[[ "$out" =~ ([[:digit:]]+)[[:space:]]*delivered ]] && delivered=${BASH_REMATCH[1]} || exit 1
[[ "$out" =~ ([[:digit:]]+)[[:space:]]*forwarded ]] && forwarded=${BASH_REMATCH[1]} || exit 1
[[ "$out" =~ ([[:digit:]]+)[[:space:]]*deferred ]] && deferred=${BASH_REMATCH[1]} || exit 1
[[ "$out" =~ ([[:digit:]]+)[[:space:]]*bounced ]] && bounced=${BASH_REMATCH[1]} || exit 1
[[ "$out" =~ ([[:digit:]]+)[[:space:]]*rejected ]] && rejected=${BASH_REMATCH[1]} || exit 1
[[ "$out" =~ ([[:digit:]]+)[[:space:]]*reject.warnings ]] && reject_warnings=${BASH_REMATCH[1]} || exit 1
[[ "$out" =~ ([[:digit:]]+)[[:space:]]*held ]] && held=${BASH_REMATCH[1]} || exit 1
[[ "$out" =~ ([[:digit:]]+)[[:space:]]*discarded ]] && discarded=${BASH_REMATCH[1]} || exit 1

echo "{
  \"rejected\": $rejected,
  \"delivered\": $delivered,
  \"forwarded\": $forwarded,
  \"deferred\":$deferred,
  \"bounced\":$bounced,
  \"rejected\":$rejected,
  \"reject_warnings\":$reject_warnings,
  \"held\": $held,
  \"discarded\": $discarded
}"
thannaske commented 5 years ago

Don't know if it's already mentioned in this issue, but I've done something similar in the past: https://github.com/thannaske/rspamd-influxdb

This utilizes the rspamd web interface's statistics and is also suitable for Telegraf. It should be a little more lightweight than the container-log-based approach.

ThomDietrich commented 5 years ago

Looks good. For me the results produced by pflogsumm seem to be the best fit for our need to observe postfix. Are you (or @andryyy) interested in pushing this workaround into a solid solution that can be shipped with mailcow? I'm asking because I'm definitely willing to discuss and test, but we need to come up with a proper proposal of what we are trying to accomplish.

We should go back to your initial argument and discuss a dedicated container, which serves metrics (postfix and others) via a standardized web interface.

We could also provide an argument that modifies the format of the output, e.g. /metrics?format=(influx|json|prometheus|...)

That's a good idea. For InfluxDB we'd need either json or prometheus; there is imho no need to implement the Influx line protocol. I for one would be totally fine with just a general json response. Do you specifically need the prometheus format, and if so, would you actually want to implement and maintain both?

@andryyy do you have any thoughts regarding the container?

thannaske commented 5 years ago

Prometheus format is actually the most convenient as it's understood by many kinds of metric collectors (e.g. Prometheus itself or Influx' Telegraf).

ThomDietrich commented 5 years ago

Fine with me.

Any other suggestions for next steps? Do you have an opinion on the container question? Still hoping for @andryyy to advise here :)

ThomDietrich commented 5 years ago

Hey guys, after my previous post I've extended the amount of data collected and fixed a few bugs. The script now also collects data about recipients, which might be useful over time. Enjoy! Let me know if you have improvement ideas...

/opt/mailcow-dockerized/pflogsumm2influx.sh:

#!/usr/bin/env bash

set -o errexit

if [ -z "$1" ]; then echo "Please provide aggregation timespan as argument (e.g. 30m or 1h)" > /dev/stderr; exit 1; fi
# TODO: check $1 valid

out=$(docker logs --since $1 $(docker ps -qf name=postfix-mailcow) | /usr/sbin/pflogsumm)

if [[ "$out" != *"Grand Totals"* ]]; then
  echo "Unexpected output from pflogsumm. Exiting..." > /dev/stderr
  exit 1
fi

[[ "$out" =~ ([[:digit:]]+)[[:space:]]*received ]] && received=${BASH_REMATCH[1]} || exit 1
[[ "$out" =~ ([[:digit:]]+)[[:space:]]*delivered ]] && delivered=${BASH_REMATCH[1]} || exit 1
[[ "$out" =~ ([[:digit:]]+)[[:space:]]*forwarded ]] && forwarded=${BASH_REMATCH[1]} || exit 1
[[ "$out" =~ ([[:digit:]]+)[[:space:]]*deferred ]] && deferred=${BASH_REMATCH[1]} || exit 1
[[ "$out" =~ ([[:digit:]]+)[[:space:]]*bounced ]] && bounced=${BASH_REMATCH[1]} || exit 1
[[ "$out" =~ ([[:digit:]]+)[[:space:]]*rejected ]] && rejected=${BASH_REMATCH[1]} || exit 1
[[ "$out" =~ ([[:digit:]]+)[[:space:]]*reject.warnings ]] && reject_warnings=${BASH_REMATCH[1]} || exit 1
[[ "$out" =~ ([[:digit:]]+)[[:space:]]*held ]] && held=${BASH_REMATCH[1]} || exit 1
[[ "$out" =~ ([[:digit:]]+)[[:space:]]*discarded ]] && discarded=${BASH_REMATCH[1]} || exit 1

[[ "$out" =~ ([[:digit:]]+)([[:alpha:]]?)[[:space:]]*bytes.received ]] && bytes_received=${BASH_REMATCH[1]} && bytes_received_k=${BASH_REMATCH[2]} || exit 1
[[ "$out" =~ ([[:digit:]]+)([[:alpha:]]?)[[:space:]]*bytes.delivered ]] && bytes_delivered=${BASH_REMATCH[1]} && bytes_delivered_k=${BASH_REMATCH[2]} || exit 1
if [ "$bytes_received_k" == "k" ]; then bytes_received=$(( bytes_received * 1024 )); fi
if [ "$bytes_delivered_k" == "k" ]; then bytes_delivered=$(( bytes_delivered * 1024 )); fi

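# Pull out the "Recipients by message count" table (header and trailing lines are stripped below)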
recipients_msg_count=$(echo "$out" | awk '/Recipients by message count/{f=1} f; /Senders by message size/{f=0}')
recipients_msg_count=$(echo "$recipients_msg_count" | sed '1,2d' | sed '$d' | sed '$d')

echo "["
echo "{
  \"name\": \"pflogsumm_messages\",
  \"received\": $received,
  \"delivered\": $delivered,
  \"forwarded\": $forwarded,
  \"deferred\": $deferred,
  \"bounced\": $bounced,
  \"rejected\": $rejected,
  \"reject_warnings\": $reject_warnings,
  \"held\": $held,
  \"discarded\": $discarded
}"
echo ",{
  \"name\": \"pflogsumm_bytes\",
  \"bytes_received\": $bytes_received,
  \"bytes_delivered\": $bytes_delivered
}"
while read -r line; do
  [[ "$line" =~ ([[:digit:]]+)[[:space:]]*(.*) ]] && count=${BASH_REMATCH[1]} && recipient=${BASH_REMATCH[2]} || exit 1
  echo ",{\"name\": \"pflogsumm_addresses\", \"received_count_recipient\": $count, \"recipient\": \"$recipient\"}"
done <<< "$recipients_msg_count"
echo "]"

My telegraf configuration /etc/telegraf/telegraf.d/exec.conf:

[[inputs.exec]]
  interval = "60m"
  commands = ["/opt/mailcow-dockerized/pflogsumm2influx.sh 60m"]
  data_format = "json"
  json_name_key = "name"
  #name_override = "pflogsumm"
  tag_keys = ["recipient"]
  timeout = "5s"

The script generates three tables/measurements: pflogsumm_messages, pflogsumm_bytes, and pflogsumm_addresses.

These three measurements can nicely be visualized with Grafana.

menzerath commented 5 years ago

Hey there, I just wanted to let you know how I implemented a different solution using rspamd instead of postfix for data collection.

Luckily rspamd already has working metrics collection implemented that we can build upon. Unfortunately, their solution only supports the Graphite backend, so we have to take a little detour using Prometheus's graphite_exporter.

First enable rspamd's metrics collection by creating data/conf/rspamd/local.d/metric_exporter.conf:

backend = "graphite";

metrics = [
  "actions.add header",
  "actions.greylist",
  "actions.no action",
  "actions.reject",
  "actions.rewrite subject",
  "actions.soft reject",
  "bytes_allocated",
  "chunks_allocated",
  "chunks_freed",
  "chunks_oversized",
  "connections",
  "control_connections",
  "ham_count",
  "learned",
  "pools_allocated",
  "pools_freed",
  "scanned",
  "shared_chunks_allocated",
  "spam_count"
];

statefile = "$DBDIR/metric_exporter_last_push";
timeout = 15;
interval = 30;

host = "rspamd-graphite-exporter";
port = 9109;
metric_prefix = "rspamd";

Next deploy graphite_exporter by creating (or updating) a docker-compose.override.yml file and declaring the service itself:

version: "2.1"

services:
  rspamd-graphite-exporter:
    image: "prom/graphite-exporter"
    environment:
      - "TZ=Europe/Berlin"
    networks:
      - docker_default
      - mailcow-network

networks:
  docker_default:
    external: yes

In my specific case prometheus is deployed in a different network called docker_default which is why the container has to be attached to both networks.

Afterwards you should restart rspamd and start the graphite_exporter using

docker-compose up -d
docker-compose restart rspamd-mailcow

Finally, you can edit your Prometheus configuration, add the graphite_exporter as an additional target, and restart the service:

scrape_configs:
  - job_name: 'rspamd'
    static_configs:
      - targets: ['rspamd-graphite-exporter:9108']

ThomDietrich commented 4 years ago

@MarvinMenzerath thanks for the nice step-by-step description. I can see that quite a few metrics are provided - I have not yet implemented it in my setup. Does your solution cover all the data points my solution provides? Does it cover messages, bytes, and addresses? If so, it would be a good idea to prepare a pull request for this solution. Best!

menzerath commented 4 years ago

@ThomDietrich I don't think that it covers all the metrics your solution is able to provide. You can count all scanned messages, but not bytes and addresses. It basically provides all the data rspamd is able to collect. I'd love to have a built-in solution in mailcow but it should cover as much data as possible, preferably collected by postfix.

ThomDietrich commented 4 years ago

Agreed. @thannaske also thought of a combination.

Following the philosophy of docker I envision a monitoring container running Telegraf as the service provider (open to discussion, imho a suitable solution). We could continuously add input plugin configurations to talk to rspamd, postfix, you name it - and the user can select the output plugin they are most comfortable with (e.g. a default prometheus endpoint). This should give us the long-term flexibility to add metrics (both triggered and polled) while not locking users into one output solution.

The whole thing could be inactive by default so it does not increase the attack surface for users who aren't interested.

Stretch feature: The container can serve as the source of statistical data on the mailcow web frontend.

What do you guys think?

thannaske commented 4 years ago

In my opinion that would indeed be the best approach.

ThomDietrich commented 4 years ago

@andryyy did you guys ever decide on a preferred building style for images? Seems like every existing Dockerfile is layered on top of alpine. For the Telegraf image we could replicate the steps from the official Dockerfile (effectively creating redundancy) or we could just derive: FROM telegraf:1.12-alpine. What do you think?

andryyy commented 4 years ago

I am too busy at the moment for this.

@ntimo what's your opinion here?

ntimo commented 4 years ago

@ThomDietrich I think we should stick to our standard and create a new container on top of alpine. And yes, this is going to create redundancy, but this is how it is going to work. We will need a custom setup so that we can collect the logs into Redis and display them in the webui.

I think we should then mount the other volumes that are needed, like postfix-vol-1 read-only, to read the postfix queues for example.
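
A rough sketch of what that could look like in the compose service definition (the service name and the mount path inside the metrics container are assumptions, not existing mailcow config):

services:
  metrics-mailcow:
    volumes:
      # read-only access to the postfix spool, e.g. for queue inspection
      - postfix-vol-1:/var/spool/postfix:ro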

ntimo commented 4 years ago

I just checked, and Telegraf looks quite nice. We can collect Postfix as well as Dovecot metrics. But I am not yet sure how to collect the Graphite metrics from rspamd - @ThomDietrich, can you maybe advise me on this? I would like to have only one metrics container, since more than one would just be overhead. We can then expose the Telegraf metrics in the Prometheus format at /metrics, using NGINX as a reverse proxy.

ThomDietrich commented 4 years ago

Hey @ntimo, a new container it is. That's fine by me; I just wanted to mention the topic briefly.

I also agree that we need to mount a few volumes. We should also give the container API access (did not check if local access is already whitelisted).

Regarding Postfix: I need to check, but the input plugin is way inferior to my scripted solution posted further up. I'd probably go with the script, and might open an issue or even a PR in the Telegraf repository.

Regarding Dovecot and all other services: There are most certainly ways to bring metrics from all kinds of services into the container/Telegraf. After all, it's just a matter of application-specific interfaces + push/pull + processing. For now I'd concentrate on the bare container (with a POC metric) as one pull request. After this one is discussed and merged we can start individual PRs for all interesting services.

Definitely one container.

Regarding /metrics and nginx: I'd say that depends. As I mentioned before, I think we should concentrate on providing the right Telegraf inputs and allow the admin of the deployment to configure their output plugin as they need it (I, as an example, won't need a prometheus endpoint). That said: Do we want to provide a default output via /metrics? I believe we should consider that this endpoint is a security and privacy concern and many admins might not use it. I can, however, also see a world in which future mailcow development presents statistics on the web UI or even sends out alerts or monthly stats mails. In that case the endpoint would be a prerequisite. I'd therefore propose to have either (1) no default output or (2) a /metrics endpoint only accessible from the docker network. Up for discussion :) @thannaske @MarvinMenzerath

ntimo commented 4 years ago

I think (2) would be a good default starting point, and yes, of course we should definitely only allow traffic from within the docker network and maybe the server's external IP.

I also think that creating the container first and properly adding it to the stack is the important first step, since this is already a lot of work. The Dockerfile needs to be created, as well as the logging integration and changes to the webui so you can view the logs. It would be awesome if you could create the container and add it to the stack.

I will figure out a good way to expose a metrics endpoint in a private way.

I just want it to be as easy as possible for someone to get metrics out of mailcow.
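
One way to implement option (2) would be an nginx location block in front of the metrics container that only allows the mailcow docker network. A minimal sketch - the subnet, upstream name, and port below are assumptions, not existing mailcow config:

location /metrics {
    # only reachable from within the mailcow docker network
    allow 172.22.1.0/24;
    deny all;
    proxy_pass http://metrics-mailcow:9273;
}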

ThomDietrich commented 4 years ago

@ntimo while browsing the code I realized that some containers are in fact off-the-shelf images. Namely mariadb, redis, memcached, and nginx. https://github.com/mailcow/mailcow-dockerized/blob/master/docker-compose.yml I'm quite fond of that discovery. This makes so much more sense and in light of that I'll go ahead and base my PR on telegraf:1.12-alpine (or similar).
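
A first sketch of such an image could be as simple as the following (file names and paths are assumptions; this is not an existing mailcow Dockerfile):

FROM telegraf:1.12-alpine
# ship a mailcow-specific Telegraf configuration with the image
COPY telegraf.conf /etc/telegraf/telegraf.conf
# include the pflogsumm helper script from the comments above
COPY pflogsumm2influx.sh /usr/local/bin/pflogsumm2influx.sh
RUN chmod +x /usr/local/bin/pflogsumm2influx.sh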

badsmoke commented 4 years ago

Hello, I made an rspamd exporter for Prometheus. The exporter can be downloaded via Docker Hub:

https://hub.docker.com/r/badsmoke/rspamd-exporter

All stats that rspamd offers are shown below:


rspamd_stats_fuzzy 1039819983
rspamd_stats_total_learns 13
rspamd_stats_fragmented 0
rspamd_stats_chunks_oversized 204
rspamd_stats_chunks_freed 0
rspamd_stats_chunks_allocated 330
rspamd_stats_bytes_allocated 29346696
rspamd_stats_pools_freed 43468
rspamd_stats_pools_allocated 43507
rspamd_stats_control_connections 6
rspamd_stats_connections 712
rspamd_stats_ham_count 44110
rspamd_stats_spam_count 80
rspamd_stats_learned 13
rspamd_stats_scanned 44190
rspamd_actions_no_action 43990
rspamd_actions_greylist 120
rspamd_actions_add_header 54
rspamd_actions_rewrite_subject 0
rspamd_actions_soft_reject 0
rspamd_actions_reject 26

ThomDietrich commented 4 years ago

@badsmoke where can one find your Dockerfile?

badsmoke commented 4 years ago

I have updated the Dockerfile in the Docker Hub readme.

badsmoke commented 4 years ago

I adapted the readme to show what is exported: all of rspamd's metrics, e.g.


{
  "read_only": false,
  "scanned": 46490,
  "learned": 35,
  "actions": {
    "reject": 27,
    "soft reject": 0,
    "rewrite subject": 0,
    "add header": 57,
    "greylist": 264,
    "no action": 46142
  },
  "spam_count": 84,
  "ham_count": 46406,
  "connections": 233,
  "control_connections": 33154,
  "pools_allocated": 63594,
  "pools_freed": 63554,
  "bytes_allocated": 31461256,
  "chunks_allocated": 320,
  "shared_chunks_allocated": 85,
  "chunks_freed": 0,
  "chunks_oversized": 235,
  "fragmented": 0,
  "total_learns": 35,
  "statfiles": [
    {
      "revision": 4,
      "used": 0,
      "total": 0,
      "size": 0,
      "symbol": "BAYES_SPAM",
      "type": "redis",
      "languages": 0,
      "users": 1
    },
    {
      "revision": 31,
      "used": 0,
      "total": 0,
      "size": 0,
      "symbol": "BAYES_HAM",
      "type": "redis",
      "languages": 0,
      "users": 1
    }
  ],
  "fuzzy_hashes": {
    "local": 15,
    "rspamd.com": 1075751952,
    "mailcow": 11951
  }
}

Prometheus values:

rspamd_actions_reject 27
rspamd_actions_soft_reject 0
rspamd_actions_rewrite_subject 0
rspamd_actions_add_header 57
rspamd_actions_greylist 264
rspamd_actions_no_action 46142
rspamd_stats_scanned 46490
rspamd_stats_learned 35
rspamd_stats_spam_count 84
rspamd_stats_ham_count 46406
rspamd_stats_connections 233
rspamd_stats_control_connections 33191
rspamd_stats_pools_allocated 63662
rspamd_stats_pools_freed 63622
rspamd_stats_bytes_allocated 31463624
rspamd_stats_chunks_allocated 320
rspamd_stats_chunks_freed 0
rspamd_stats_chunks_oversized 235
rspamd_stats_fragmented 0
rspamd_stats_total_learns 35
rspamd_stats_fuzzy 1075756039

j6s commented 4 years ago

Being very interested in collecting metrics into prometheus, I also started building a small exporter that fetches information from the API. It's not a lot of information, but at least the basics (such as number of messages per mailbox) are there:

https://github.com/j6s/mailcow-exporter

andryyy commented 4 years ago

Nice! Thank you!

andryyy commented 4 years ago

Would it help to create an api endpoint for Rspamd stats?

j6s commented 4 years ago

Would it help to create an api endpoint for Rspamd stats?

For mailcow-exporter, absolutely yes.

patschi commented 4 years ago

I also started building a small exporter

That's brilliant!!

I also joined the "Prometheus fan club" a few days ago, so that's indeed great timing. Metrics from nginx (server_status) might also be useful; I'm also wondering what kind of data dovecot/postfix might deliver for building metrics.

(Excuse me. Accidentally held SHIFT while sending this comment...)

andryyy commented 4 years ago

api/v1/get/logs/rspamd-stats is implemented now. :) Can you add it to the docs, @ntimo ?
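
For anyone wanting to try it, fetching the new endpoint could look roughly like this (hostname and key are placeholders; the X-API-Key header follows the usual mailcow API authentication):

curl -s -H "X-API-Key: REPLACEME" https://mail.example.org/api/v1/get/logs/rspamd-stats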

ntimo commented 4 years ago

@andryyy of course will do 👍

ThomDietrich commented 4 years ago

Far too long ago we decided to create a new docker image for monitoring. https://github.com/mailcow/mailcow-dockerized/issues/1695#issuecomment-549867744

I have not had time to work on this since. Is that still a good way forward? @j6s how does your exporter fit into the overall goal of collecting comprehensive mailcow metrics? From the looks of it, with the rspamd-stats endpoint the exporter seems pretty close already.

mrueg commented 4 years ago

FYI I've filed https://github.com/rspamd/rspamd/issues/3484

Upstream seems to be open to accepting a PR that expands the stats endpoint to expose Prometheus metrics.

dragoangel commented 3 years ago

@j6s's implementation is good, as it reuses all the API features, and I think it will be easy to maintain.

I will also try to dig into this.

kozicpetar commented 3 years ago

Hi guys, I think the most important thing for monitoring here is Postfix metrics like those provided by pflogsumm, because those metrics, in combination with Prometheus alerting, can alert us if something goes wrong.

Has anyone tried to do that?

mrueg commented 3 years ago

Hi guys, I think the most important thing for monitoring here is Postfix metrics like those provided by pflogsumm, because those metrics, in combination with Prometheus alerting, can alert us if something goes wrong.

Has anyone tried to do that?

You can take a look at https://github.com/google/mtail/blob/master/examples/postfix.mtail

kozicpetar commented 3 years ago

@mrueg thank you for this. I'm not sure how I can expose the Postfix logs on the host so that mtail can read them.

In the mailcow documentation I found this:

For Syslog:

{
...
  "log-driver": "syslog",
  "log-opts": {
    "syslog-address": "udp://1.2.3.4:514"
  }
...
}

Thanks.
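
For reference, once the postfix container's logs end up in a file on the host (e.g. via the syslog log driver and a local rsyslog rule), mtail could be pointed at them roughly like this - a sketch, with paths as assumptions:

mtail --progs /etc/mtail --logs /var/log/mail.log --port 3903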

kozicpetar commented 3 years ago

@mrueg I sorted it out. Thank you.

markg85 commented 1 year ago

Hi,

Sorry to revive an old thread!

I'm curious if there is a recommended way to collect mailcow metrics? Going the custom docker compose route can work, but I'm not a big fan of that; sooner or later it will break. A solution directly from the mailcow devs would therefore be super helpful!

irrwitzer42 commented 8 months ago

Hi,

As I do all my monitoring with Prometheus, I stumbled upon this thread a while ago and used the information here to get as far as possible. To provide a starting point for other Prometheus fans, I "documented" my current solution and provided my mailcow dashboard plus another postfix dashboard (whose origin I don't remember) here: https://github.com/irrwitzer42/grafana-mailcow

I'm currently running the nightly version on arm64, so the experience might differ for stable users. Also, I don't get timestamps for the last login, but I added a table for it nevertheless.

ATM the mailcow dashboard doesn't use metrics from the postfix-exporter, while the postfix dashboard doesn't use metrics from the mailcow exporter. That's currently by design, but might change in the future.

If somebody has a better dashboard for mailcow+prometheus, please let me know ;-)