Monitoring - Githubissues

wj-turner / OmegaFin

This open-source project is designed for gathering, processing, and analyzing financial data.

There are several good options for monitoring a containerized application, especially in a development environment. I suggest Prometheus combined with Grafana: both are popular, robust, and integrate well for metrics collection and visualization.

Prometheus: It's an open-source monitoring solution that can scrape metrics from different sources. In a Docker environment, the combination of Prometheus with cAdvisor and node-exporter is common:
- cAdvisor (Container Advisor): Provides container users with an understanding of the resource usage and performance characteristics of their running containers.
- node-exporter: Exposes a wide variety of hardware and OS metrics.
Grafana: A platform for monitoring and observability. It lets you query, visualize, alert on, and understand your metrics no matter where they are stored.

Here's a basic integration guide:

Docker Compose Configuration

In your docker-compose.yml, you can add the following:

services:
  ...

  prometheus:
    image: prom/prometheus:latest
    volumes:
      - ./prometheus:/etc/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
    ports:
      - 9090:9090
    depends_on:
      - cadvisor

  cadvisor:
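    # Note: google/cadvisor on Docker Hub is deprecated; newer releases are published as gcr.io/cadvisor/cadvisor.
    image: google/cadvisor:latest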
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:rw
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
    ports:
      - 8080:8080

  node-exporter:
    image: prom/node-exporter:latest
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    command:
      - '--path.procfs=/host/proc'
      - '--path.sysfs=/host/sys'
      - '--collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/)'

  grafana:
    image: grafana/grafana:latest
    ports:
      - 3000:3000
    depends_on:
      - prometheus
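
After `docker compose up`, a quick way to confirm the stack is reachable is to probe each service's health endpoint. Here's a minimal sketch using only the Python standard library. It assumes the ports published in the Compose file above and the health paths these tools normally expose (`/-/healthy` for Prometheus, `/healthz` for cAdvisor, `/api/health` for Grafana). The function names are hypothetical, and node-exporter is left out because the Compose file does not publish its port to the host.

```python
import urllib.error
import urllib.request

# Health endpoints for the ports published in the Compose file (assumed defaults).
HEALTH_URLS = {
    "prometheus": "http://localhost:9090/-/healthy",
    "cadvisor": "http://localhost:8080/healthz",
    "grafana": "http://localhost:3000/api/health",
}


def check(url: str, timeout: float = 2.0) -> bool:
    """Return True if the URL answers with HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, or a non-2xx response.
        return False


def check_all(urls=HEALTH_URLS, probe=check) -> dict:
    """Probe every service and map its name to True (up) or False (down)."""
    return {name: probe(url) for name, url in urls.items()}


def main() -> None:
    """Print one status line per service."""
    for name, ok in check_all().items():
        print(f"{name:12s} {'up' if ok else 'DOWN'}")
```

Call `main()` once the containers have had a few seconds to start. A service reported as DOWN usually means the container exited or the port mapping differs from the one assumed here.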

Prometheus Configuration

You'll need a prometheus.yml config in the ./prometheus directory of your project, because the Compose file above mounts ./prometheus to /etc/prometheus. Here's a basic example:

global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'prometheus'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'cadvisor'
    scrape_interval: 5s
    static_configs:
      - targets: ['cadvisor:8080']

  - job_name: 'node-exporter'
    scrape_interval: 5s
    static_configs:
      - targets: ['node-exporter:9100']
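
To check that Prometheus is actually scraping the three jobs above, you can ask its HTTP API for the list of active targets. The sketch below assumes Prometheus is reachable at http://localhost:9090 (the port published in the Compose file) and reads the `/api/v1/targets` endpoint. The helper names are hypothetical.

```python
import json
import urllib.request


def fetch_targets(base_url: str = "http://localhost:9090") -> dict:
    """Fetch the raw target list from the Prometheus HTTP API."""
    with urllib.request.urlopen(f"{base_url}/api/v1/targets", timeout=5) as resp:
        return json.load(resp)


def summarize_targets(payload: dict) -> dict:
    """Map each job name to a list of (scrape URL, health) pairs."""
    summary = {}
    for target in payload["data"]["activeTargets"]:
        job = target["labels"]["job"]
        summary.setdefault(job, []).append((target["scrapeUrl"], target["health"]))
    return summary


def main() -> None:
    """Print one line per scraped target."""
    for job, targets in summarize_targets(fetch_targets()).items():
        for url, health in targets:
            print(f"{job:15s} {health:8s} {url}")
```

Calling `main()` against a running stack should list one target per job. A target whose health is "down" typically points to a wrong host name or port in prometheus.yml.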

Monitoring Your Services

If your services (such as a FastAPI app) expose Prometheus metrics, usually at the /metrics endpoint, you can add each one as a new job under scrape_configs in prometheus.yml, using the service's Compose name and port as the target.
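
To make the /metrics idea concrete, here's a minimal sketch of a service that exposes one counter in the Prometheus text format, using only the Python standard library. In a real FastAPI service you would normally use the official Python client library (prometheus_client) instead. The metric name `app_requests_total`, port 8000, and the helper names are assumptions made for illustration.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# In-memory request counter keyed by (method, path). Hypothetical example state.
REQUEST_COUNTS = {}


def record_request(method: str, path: str) -> None:
    """Increment the counter for one handled request."""
    key = (method, path)
    REQUEST_COUNTS[key] = REQUEST_COUNTS.get(key, 0) + 1


def render_metrics(counts: dict) -> str:
    """Render the counters in the Prometheus text exposition format.

    Label values are not escaped here, so this sketch assumes they contain
    no quotes, backslashes, or newlines.
    """
    lines = [
        "# HELP app_requests_total Total HTTP requests handled.",
        "# TYPE app_requests_total counter",
    ]
    for (method, path), value in sorted(counts.items()):
        lines.append(f'app_requests_total{{method="{method}",path="{path}"}} {value}')
    return "\n".join(lines) + "\n"


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/metrics":
            body = render_metrics(REQUEST_COUNTS).encode("utf-8")
            content_type = "text/plain; version=0.0.4; charset=utf-8"
        else:
            # Count every non-metrics request, then answer with a stub body.
            record_request("GET", self.path)
            body = b"ok\n"
            content_type = "text/plain; charset=utf-8"
        self.send_response(200)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Serve the app and its /metrics endpoint until interrupted."""
    HTTPServer(("0.0.0.0", port), Handler).serve_forever()
```

Run `serve()` inside a container on the same Compose network, then point a scrape job at that service's name and port. Using the raw path as a label is fine for a demo, but it can create many time series in a real application.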

Visualization with Grafana

Once everything is running:

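1. Open Grafana at http://localhost:3000.
2. Log in with admin/admin (the default credentials). Grafana asks you to change the password on first login.
3. Add Prometheus as a data source. Use http://prometheus:9090 as the URL, since Grafana reaches Prometheus by its Compose service name rather than through localhost.
4. Start creating dashboards, or import existing ones, to visualize your metrics.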
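
If you'd rather script the data source step than click through the UI, Grafana's HTTP API accepts a POST to `/api/datasources`. The sketch below builds that request with the standard library. It assumes the default admin/admin credentials are still in place and that Prometheus is reachable from the Grafana container as http://prometheus:9090. The function names are hypothetical.

```python
import base64
import json
import urllib.request


def build_datasource_request(
    grafana_url: str = "http://localhost:3000",
    prometheus_url: str = "http://prometheus:9090",
    user: str = "admin",
    password: str = "admin",
) -> urllib.request.Request:
    """Build (but do not send) the request that registers Prometheus in Grafana."""
    payload = {
        "name": "Prometheus",
        "type": "prometheus",
        "url": prometheus_url,  # resolved inside the Compose network
        "access": "proxy",      # Grafana's backend performs the queries
        "isDefault": True,
    }
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        f"{grafana_url}/api/datasources",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


def create_datasource() -> int:
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(build_datasource_request(), timeout=5) as resp:
        return resp.status
```

Calling `create_datasource()` once is enough. A second call is expected to fail because a data source with the same name already exists.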

Conclusion

This basic setup gives you a robust monitoring solution for a development environment. For production, you should also plan for secure access, data retention, alerting, and scalability.

Prometheus, in combination with Grafana, is primarily used to monitor and visualize metrics, including but not limited to:

Infrastructure Monitoring:
- System Metrics: CPU usage, memory usage, disk I/O, network I/O, and more from individual machines or nodes.
- Container Metrics: If you're using Docker or Kubernetes, you can monitor metrics specific to container instances such as CPU, memory allocation, and network stats.
- Service Availability: Ensure services are running and responsive.
- Resource Quotas: Monitor resources to ensure they're within desired limits.
Application Monitoring:
- Request Rates: The number of requests your application is serving over time.
- Error Rates: Track the rate of errors your application is encountering.
- Response Times: Monitor the latency or time it takes for your application to respond to requests.
- Custom Metrics: With client libraries provided by Prometheus, developers can instrument their code to expose custom metrics specific to their applications. For example, in an e-commerce system, you could track the number of items added to carts, number of checkouts, or number of payment failures.
Database Monitoring:
- Metrics related to query execution times, active connections, cache hit ratios, replication lag, and many others specific to database systems.
External Systems Monitoring:
- Monitor external systems like cache stores (e.g., Redis), message brokers (e.g., Kafka), or any third-party systems that expose Prometheus compatible metrics.
Alerts:
- Prometheus allows you to define alerting rules based on your metrics. If a certain condition is met (like CPU usage being too high for a sustained period), Prometheus can send alerts to various handlers (like Alertmanager, which can then route the alert to email, Slack, PagerDuty, etc.).
Service Discovery:
- Prometheus supports various service discovery mechanisms which can be used to dynamically discover targets in cloud environments.
Business Metrics:
- While Prometheus is typically used for technical metrics, nothing prevents teams from using it to track business-related metrics, like sign-ups, purchases, user engagement, and more.
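
The "too high for a sustained period" condition mentioned under Alerts can be pictured as a small state machine: an alert becomes pending when its condition first holds, and starts firing only after the condition has held for the configured duration. Here's a simplified Python sketch of that idea. It is an illustration rather than Prometheus's actual implementation, and the class and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlertRule:
    threshold: float      # condition is "value > threshold"
    hold_seconds: float   # how long the condition must hold before firing
    active_since: Optional[float] = None
    state: str = "inactive"

    def evaluate(self, now: float, value: float) -> str:
        """Update and return the alert state for one evaluation cycle."""
        if value <= self.threshold:
            # Condition no longer holds: reset the timer.
            self.active_since = None
            self.state = "inactive"
            return self.state
        if self.active_since is None:
            self.active_since = now
        held = now - self.active_since
        self.state = "firing" if held >= self.hold_seconds else "pending"
        return self.state


if __name__ == "__main__":
    rule = AlertRule(threshold=0.9, hold_seconds=60)
    # (timestamp in seconds, CPU usage as a fraction)
    samples = [(0, 0.95), (30, 0.97), (60, 0.99), (90, 0.50), (120, 0.95)]
    for ts, cpu in samples:
        print(ts, rule.evaluate(ts, cpu))
```

In this run the alert stays pending at 0 and 30 seconds, fires at 60 seconds, resets when usage drops at 90 seconds, and becomes pending again at 120 seconds. A brief spike therefore never pages anyone.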

Grafana then provides a powerful visualization layer on top of Prometheus:

Dashboards: Create comprehensive dashboards that give you an overview of your infrastructure, applications, databases, and other services.
Visualizations: Choose from a variety of chart types, such as line charts, histograms, heatmaps, and gauges.
Annotations: Mark important events on your graphs (e.g., a code deploy).
Alerting (in Grafana): While Prometheus has its own alerting mechanism, Grafana also provides alerting features that work from the data you visualize.
Integrations: Grafana isn't limited to Prometheus. It can pull in data from other sources such as Elasticsearch, InfluxDB, and AWS CloudWatch.

In essence, combining Prometheus for metrics collection with Grafana for visualization gives you a single view of your infrastructure's health, performance, and behavior, which helps with troubleshooting, performance tuning, and keeping systems reliable.