kokoichi206 / routines

Trying out Datadog #18

Open kokoichi206 opened 1 year ago

kokoichi206 commented 1 year ago

How to Install Agent

https://us3.datadoghq.com/signup/agent#ubuntu

After signing up, the following command is shown, so I ran it on the Raspberry Pi.

DD_API_KEY=my_api_key_secretyo DD_SITE="us3.datadoghq.com" bash -c "$(curl -L https://s3.amazonaws.com/dd-agent/scripts/install_script_agent7.sh)"
* Adding your API key to the Datadog Agent configuration: /etc/datadog-agent/datadog.yaml

* Setting SITE in the Datadog Agent configuration: /etc/datadog-agent/datadog.yaml

/usr/bin/systemctl
* Starting the Datadog Agent...

  Your Datadog Agent is running and functioning properly.
  It will continue to run in the background and submit metrics to Datadog.
  If you ever want to stop the Datadog Agent, run:

      sudo systemctl stop datadog-agent

  And to run it again run:

      sudo systemctl start datadog-agent
kokoichi206 commented 1 year ago

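Enabling Apache

The required Apache module (mod-something) should already be enabled by default.

At first I assumed apache.d referred to something on the Apache side, but it is a directory inside the datadog-agent configuration.

# Here!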
cd /etc/datadog-agent/conf.d/apache.d
sudo cp -p conf.yaml.example conf.yaml
sudo systemctl restart datadog-agent.service
kokoichi206 commented 1 year ago

Enabling logs

https://us3.datadoghq.com/logs/onboarding/detected

/etc/datadog-agent$ sudo vim datadog.yaml
sudo systemctl restart datadog-agent.service

If the steps at the URL above do not work, try changing the permissions on the log files.

sudo chmod 655 /var/log/apache2/ -R

As shown below, the agent log contains permission errors.

Check the Datadog Agent log:

sudo tail -f /var/log/datadog/agent.log

$ cat /var/log/datadog/agent.log | grep permission
2023-01-25 04:00:32 UTC | CORE | WARN | (pkg/logs/internal/launchers/file/launcher.go:241 in launchTailers) | Could not collect files: cannot read file /var/log/apache2/access.log: stat /var/log/apache2/access.log: permission denied
2023-01-25 04:00:32 UTC | CORE | WARN | (pkg/logs/internal/launchers/file/launcher.go:241 in launchTailers) | Could not collect files: cannot read file /var/log/apache2/error.log: stat /var/log/apache2/error.log: permission denied
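The warnings above fail at stat, which suggests the agent user cannot even traverse /var/log/apache2, not only that the files themselves are unreadable. A minimal sketch for checking that, assuming GNU coreutils stat; the helper name others_can_traverse is made up for illustration:

```shell
# Hypothetical helper: succeeds when "other" users have the execute (search)
# bit on a directory, which is required to stat the files inside it.
others_can_traverse() {
  mode=$(stat -c '%a' "$1") || return 2   # octal mode such as 750 or 755 (GNU stat)
  other=${mode#"${mode%?}"}               # last octal digit holds the "other" bits
  [ $((other & 1)) -eq 1 ]
}

# Usage:
#   others_can_traverse /var/log/apache2 || echo "other users cannot enter the directory"
```

This only inspects the "other" bits. If the agent user is a member of the directory's group, the group bits are what matter instead.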

Links

kokoichi206 commented 1 year ago

Logs

Index: main storage, db. Source: ddsource

kokoichi206 commented 1 year ago

dd-trace-go

API reference: https://docs.datadoghq.com/ja/api/latest/

DD-API-KEY

Organization Settings > ACCESS > API Keys
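To confirm that a key copied from that page is accepted, the API's key validation endpoint can be called with the DD-API-KEY header. A small sketch, assuming the us3 site used earlier in this issue; the helper name dd_validate_url is hypothetical:

```shell
# Build the key-validation URL for a Datadog site such as us3.datadoghq.com.
dd_validate_url() {
  printf 'https://api.%s/api/v1/validate\n' "$1"
}

# Usage (needs network access and a real key exported as DD_API_KEY):
#   curl -s -H "DD-API-KEY: ${DD_API_KEY}" "$(dd_validate_url us3.datadoghq.com)"
```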

kokoichi206 commented 1 year ago

OpenTracing

The OpenTracing API

https://github.com/opentracing/specification/blob/master/specification.md#the-opentracing-api

The OpenTracing Data Model

https://github.com/opentracing/specification/blob/master/specification.md#the-opentracing-data-model

kokoichi206 commented 1 year ago

Agent.Log

Before I noticed, log lines were being written at a very high rate.

2023-01-25 23:30:13 UTC | CORE | WARN | (pkg/collector/python/datadog_agent.go:125 in LogMessage) | disk:67cc0574430a16ba | (disk.py:135) | Unable to get disk metrics for /sys/kernel/debug/tracing: [Errno 13] Permission denied: '/sys/kernel/debug/tracing'. You can exclude this mountpoint in the settings if it is invalid.
2023-01-25 23:30:28 UTC | CORE | WARN | (pkg/collector/python/datadog_agent.go:125 in LogMessage) | disk:67cc0574430a16ba | (disk.py:135) | Unable to get disk metrics for /run/user/1000/gvfs: [Errno 13] Permission denied: '/run/user/1000/gvfs'. You can exclude this mountpoint in the settings if it is invalid.
2023-01-25 23:30:28 UTC | CORE | WARN | (pkg/collector/python/datadog_agent.go:125 in LogMessage) | disk:67cc0574430a16ba | (disk.py:135) | Unable to get disk metrics for /sys/kernel/debug/tracing: [Errno 13] Permission denied: '/sys/kernel/debug/tracing'. You can exclude this mountpoint in the settings if it is invalid.
2023-01-25 23:30:29 UTC | CORE | WARN | (pkg/collector/corechecks/containers/docker/check.go:220 in runDockerCustom) | Unable to fetch tags for container: sha256:d6c21fcb8fc9611b222ec23b881e75b0b6f584389e57717a69c96d382bd52c69, err: invalid image name (is a sha256)
2023-01-25 23:30:29 UTC | CORE | WARN | (pkg/collector/corechecks/containers/docker/check.go:220 in runDockerCustom) | Unable to fetch tags for container: sha256:8783247e0de113f13e0feb6a338e34ef5b8423c756e337009d88c3b2423c5744, err: invalid image name (is a sha256)
2023-01-25 23:30:29 UTC | CORE | WARN | (pkg/collector/corechecks/containers/docker/check.go:220 in runDockerCustom) | Unable to fetch tags for container: sha256:54932d1e2b576170944902535c58a16fed7a2a4d9aaabf7fceae5fd39619b750, err: invalid image name (is a sha256)
2023-01-25 23:30:29 UTC | CORE | WARN | (pkg/collector/corechecks/containers/docker/check.go:220 in runDockerCustom) | Unable to fetch tags for container: sha256:18e13cfe20ac90eab3ac026ae7cc6120eb278ab50dab9c1f3852a4deaa037aa6, err: invalid image name (is a sha256)

This looks related to the Docker containers(?), so I will try stopping them once.
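To see which checks are responsible for most of the noise, the WARN lines can be grouped by their source location. A rough sketch, assuming the pipe-separated layout seen in the lines above; the function name is made up for illustration:

```shell
# Count WARN lines in an agent log, grouped by the source-location field
# (the fourth " | "-separated column), most frequent first.
summarize_agent_warnings() {
  awk -F' [|] ' '$3 == "WARN" { count[$4]++ }
       END { for (loc in count) print count[loc], loc }' "$1" | sort -rn
}

# Usage:
#   summarize_agent_warnings /var/log/datadog/agent.log
```

Reading the file may require sudo, depending on the permissions of /var/log/datadog.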

kokoichi206 commented 1 year ago

Problem: log rotation resets the permissions on access.log

This can be handled by editing /etc/logrotate.d/apache2.

$ sudo cat /etc/logrotate.d/apache2 
/var/log/apache2/*.log {
    daily
    missingok
    rotate 14
    compress
    delaycompress
    notifempty
    create 640 root adm
    sharedscripts
    prerotate
        if [ -d /etc/logrotate.d/httpd-prerotate ]; then
            run-parts /etc/logrotate.d/httpd-prerotate
        fi
    endscript
    postrotate
        if pgrep -f ^/usr/sbin/apache2 > /dev/null; then
            invoke-rc.d apache2 reload 2>&1 | logger -t apache2.logrotate
        fi
    endscript
}

Change create 640 root adm to create 644 www-data www-data.
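The same change can be previewed without opening an editor. A minimal sketch that rewrites only the create directive and prints the result, leaving the original file untouched; the function name is hypothetical:

```shell
# Rewrite the "create" directive of a logrotate config, keeping indentation.
# The patched config goes to stdout; the input file is not modified.
patch_logrotate_create() {
  sed 's/^\([[:space:]]*\)create 640 root adm[[:space:]]*$/\1create 644 www-data www-data/' "$1"
}

# Usage: preview the diff, then dry-run logrotate in debug mode.
#   patch_logrotate_create /etc/logrotate.d/apache2 | diff /etc/logrotate.d/apache2 -
#   sudo logrotate -d /etc/logrotate.d/apache2
```

Note that the new mode only takes effect on files created by the next rotation, so the current access.log keeps whatever permissions it already has.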

kokoichi206 commented 1 year ago

Links

kokoichi206 commented 1 year ago

Can't send data from dd-trace-go!?

# If necessary, prepend sudo -u dd-agent to the install command.

sudo -u dd-agent datadog-agent integration install -t datadog-go-pprof-scraper==1.0.2
/etc/datadog-agent/conf.d/go_pprof_scraper.d$ cat conf.yaml
## All options defined here are available to all instances.
#
init_config:

    ## @param service - string - optional
    ## Attach the tag `service:<SERVICE>` to every metric, event, and service check emitted by this integration.
    ##
    ## Additionally, this sets the default `service` for every log source.
    #
    # service: <SERVICE>

## Every instance is scheduled independently of the others.
#
instances:

  -
    ## @param env - string - optional - default: prod
    ## env tag to apply to uploaded profiles ("env:<ENV>")
    #
    # env: prod

    ## @param pprof_url - string - required
    ## URL of the /debug/pprof endpoint to collect
    #
    pprof_url: http://myservice:1234/debug/pprof/

    ## @param duration - integer - optional - default: 60
    ## Duration of profiles, in seconds
    #
    # duration: 30

    ## @param profiles - list of strings - optional
    ## List of profiles to collect. Valid options are "cpu", "heap", "mutex", "block", and "goroutine"
    #
    # profiles:
    #   - cpu
    #   - heap

    ## @param cumulative - boolean - optional - default: true
    ## Whether to collect heap, mutex, or block profiles as cumulative profiles
    ## since the program started. If false, requests those profiles over the
    ## period specified by "duration". The profiles will hold the difference
    ## between the samples at the beginning and end of profiling.
    ##
    ## For the heap profile, the in-use (also known as "live heap") samples
    ## may be negative if "cumulative" is false. This does not display
    ## accurately in the profile UI, so Datadog does not recommend setting
    ## "cumulative" to false.
    ##
    ## In order to use profile aggregation, "cumulative" must set to false.
    ## Note that setting "cumulative" to false will cause the profiled
    ## application to use more memory in order to compute the profiles.
    #
    # cumulative: true

    ## @param tags - list of strings - optional
    ## A list of tags to attach to every metric and service check emitted by this instance.
    ##
    ## Learn more about tagging at https://docs.datadoghq.com/tagging
    #
    # tags:
    #   - <KEY_1>:<VALUE_1>
    #   - <KEY_2>:<VALUE_2>

    ## @param service - string - required
    ## Service name to tag on every profile uploaded for this instance.
    ##
    ## Overrides any `service` defined in the `init_config` section.
    #
    service: default-go-service

    ## @param min_collection_interval - number - optional - default: 1
    ## This changes the collection interval of the check. For more information, see:
    ## https://docs.datadoghq.com/developers/write_agent_check/#collection-interval
    ##
    ## This is a long-running check, and is intended to be started again as
    ## soon as it finishes. Setting this to a larger value will cause longer
    ## pauses between iterations of this check.
    ##
    ## If omitted, will default to 15 seconds.
    #
    min_collection_interval: 1

    ## @param empty_default_hostname - boolean - optional - default: false
    ## This forces the check to send metrics with no hostname.
    ##
    ## This is useful for cluster-level checks.
    #
    # empty_default_hostname: false

    ## @param metric_patterns - mapping - optional
    ## A mapping of metrics to include or exclude, with each entry being a regular expression.
    ##
    ## Metrics defined in `exclude` will take precedence in case of overlap.
    #
    # metric_patterns:
    #   include:
    #   - <INCLUDE_REGEX>
    #   exclude:
    #   - <EXCLUDE_REGEX>
kokoichi206 commented 1 year ago

A really long error showed up in agent.log

2023-01-26 13:26:03 UTC | CORE | ERROR | (pkg/collector/worker/check_logger.go:69 in Error) | 
check:go_pprof_scraper | Error running check: [{"message": "HTTPConnectionPool(host='myservice', port=1234): 
Max retries exceeded with url: /debug/pprof/heap (Caused by NewConnectionError('<urllib3.connection.HTTPConnection
 object at 0xffff4815d4c0>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution'))", 
"traceback": "Traceback (most recent call last):\n  File \"/opt/datadog-agent/embedded/lib/python3.8/site-
packages/urllib3/connection.py\", line 174, in _new_conn\n    conn = connection.create_connection(\n  File \"/opt/datadog-
agent/embedded/lib/python3.8/site-packages/urllib3/util/connection.py\", line 72, in create_connection\n    for res in 
socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):\n  File \"/opt/datadog-
agent/embedded/lib/python3.8/socket.py\", line 918, in getaddrinfo\n    for res in _socket.getaddrinfo(host, port, family, 
type, proto, flags):\nsocket.gaierror: [Errno -3] Temporary failure in name resolution\n\nDuring handling of the above 
exception, another exception occurred:\n\nTraceback (most recent call last):\n  File \"/opt/datadog-
agent/embedded/lib/python3.8/site-packages/urllib3/connectionpool.py\", line 703, in urlopen\n    httplib_response = 
self._make_request(\n  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/urllib3/connectionpool.py\", line
 398, in _make_request\n    conn.request(method, url, **httplib_request_kw)\n  File \"/opt/datadog-
agent/embedded/lib/python3.8/site-packages/urllib3/connection.py\", line 239, in request\n    super(HTTPConnection, 
self).request(method, url, body=body, headers=headers)\n  File \"/opt/datadog-
agent/embedded/lib/python3.8/http/client.py\", line 1256, in request\n    self._send_request(method, url, body, headers, encode_chunked)\n  File \"/opt/datadog-agent/embedded/lib/python3.8/http/client.py\", line 1302, in _send_request\n    self.endheaders(body, encode_chunked=encode_chunked)\n  File \"/opt/datadog-agent/embedded/lib/python3.8/http/client.py\", line 1251, in endheaders\n    self._send_output(message_body, encode_chunked=encode_chunked)\n  File \"/opt/datadog-agent/embedded/lib/python3.8/http/client.py\", line 1011, in _send_output\n    self.send(msg)\n  File \"/opt/datadog-agent/embedded/lib/python3.8/http/client.py\", line 951, in send\n    self.connect()\n  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/urllib3/connection.py\", line 205, in connect\n    conn = self._new_conn()\n  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/urllib3/connection.py\", line 186, in _new_conn\n    raise NewConnectionError(\nurllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 
0xffff4815d4c0>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution\n\nDuring handling of the above exception, another exception occurred:\n\nTraceback (most recent call last):\n  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/requests/adapters.py\", line 489, in send\n    resp = conn.urlopen(\n  File 
\"/opt/datadog-agent/embedded/lib/python3.8/site-packages/urllib3/connectionpool.py\", line 787, in urlopen\n    retries = retries.increment(\n  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/urllib3/util/retry.py\", line 592, in increment\n    raise MaxRetryError(_pool, url, error or ResponseError(cause))\nurllib3.exceptions.MaxRetryError: 
HTTPConnectionPool(host='myservice', port=1234): Max retries exceeded with url: /debug/pprof/heap (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0xffff4815d4c0>: Failed to establish a new 
connection: [Errno -3] Temporary failure in name resolution'))\n\nDuring handling of the above exception, another 
exception occurred:\n\nTraceback (most recent call last):\n  File \"/opt/datadog-agent/embedded/lib/python3.8/site-
packages/datadog_checks/base/checks/base.py\", line 1122, in run\n    self.check(instance)\n  File \"/opt/datadog-
agent/embedded/lib/python3.8/site-packages/datadog_checks/go_pprof_scraper/check.py\", line 139, in check\n    
profiles = list(executor.map(self._get_profile, self.profiles))\n  File \"/opt/datadog-
agent/embedded/lib/python3.8/concurrent/futures/_base.py\", line 619, in result_iterator\n    yield fs.pop().result()\n  File 
\"/opt/datadog-agent/embedded/lib/python3.8/concurrent/futures/_base.py\", line 444, in result\n    return 
self.__get_result()\n  File \"/opt/datadog-agent/embedded/lib/python3.8/concurrent/futures/_base.py\", line 389, in 
__get_result\n    raise self._exception\n  File \"/opt/datadog-agent/embedded/lib/python3.8/concurrent/futures/thread.py\", 
line 57, in run\n    result = self.fn(*self.args, **self.kwargs)\n  File \"/opt/datadog-agent/embedded/lib/python3.8/site-
packages/datadog_checks/go_pprof_scraper/check.py\", line 108, in _get_profile\n    response = self.http.get(\n  File 
\"/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/utils/http.py\", line 355, in get\n    
return self._request('get', url, options)\n  File \"/opt/datadog-agent/embedded/lib/python3.8/site-
packages/datadog_checks/base/utils/http.py\", line 419, in _request\n    response = 
self.make_request_aia_chasing(request_method, method, url, new_options, persist)\n  File \"/opt/datadog-
agent/embedded/lib/python3.8/site-packages/datadog_checks/base/utils/http.py\", line 425, in 
make_request_aia_chasing\n    response = request_method(url, **new_options)\n  File \"/opt/datadog-
agent/embedded/lib/python3.8/site-packages/requests/api.py\", line 73, in get\n    return request(\"get\", url, 
params=params, **kwargs)\n  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/requests/api.py\", line 59, 
in request\n    return session.request(method=method, url=url, **kwargs)\n  File \"/opt/datadog-
agent/embedded/lib/python3.8/site-packages/requests/sessions.py\", line 587, in request\n    resp = self.send(prep, 
**send_kwargs)\n  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/requests/sessions.py\", line 701, in 
send\n    r = adapter.send(request, **kwargs)\n  File \"/opt/datadog-agent/embedded/lib/python3.8/site-
packages/requests/adapters.py\", line 565, in send\n    raise ConnectionError(e, 
request=request)\nrequests.exceptions.ConnectionError: HTTPConnectionPool(host='myservice', port=1234): Max retries 
exceeded with url: /debug/pprof/heap (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 
0xffff4815d4c0>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution'))\n"}]
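The error names host='myservice' and port=1234, which match the pprof_url in the conf.yaml above, so the name resolution failure most likely means that URL does not point at a resolvable service. A small sketch for pulling the configured host out of the file so it can be checked; the helper name pprof_host is made up for illustration:

```shell
# Print the host part of the first uncommented pprof_url entry in a check config.
pprof_host() {
  sed -n 's|^[[:space:]]*pprof_url:[[:space:]]*[a-z]*://\([^:/]*\).*|\1|p' "$1" | head -n 1
}

# Usage: check whether the configured host resolves on this machine.
#   getent hosts "$(pprof_host /etc/datadog-agent/conf.d/go_pprof_scraper.d/conf.yaml)" \
#     || echo "pprof_url host does not resolve"
```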
kokoichi206 commented 1 year ago
2023-01-26 13:43:41 UTC | CORE | WARN | (pkg/collector/python/datadog_agent.go:125 in LogMessage) | 
disk:67cc0574430a16ba | (disk.py:135) | Unable to get disk metrics for /sys/kernel/debug/tracing: [Errno 13] Permission
 denied: '/sys/kernel/debug/tracing'. You can exclude this mountpoint in the settings if it is invalid.
kokoichi206 commented 1 year ago

APM: Application Performance Monitoring

I kept looking in the Logs section, but the data may actually be under APM...
kokoichi206 commented 1 year ago

Go Log

https://docs.datadoghq.com/ja/logs/log_collection/go/

write-your-logs-to-a-file

https://www.datadoghq.com/ja/blog/go-logging/#write-your-logs-to-a-file

/usr/log/api$ ls -la
total 8
drwxr-xr-x 2 root   root   4096 Jan 27 03:30 .
drwxr-xr-x 3 root   root   4096 Jan 27 03:29 ..
-rw-r--r-- 1 ubuntu ubuntu    0 Jan 27 03:30 test.log

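Question

What does the d in conf.d stand for?

For names like conf and confd, it could be short for daemon: https://teratail.com/questions/2920

Probably either daemon or directory.

It looks like the logs were sent successfully.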
kokoichi206 commented 1 year ago
sudo vim /etc/datadog-agent/conf.d/go.d/conf.yaml

sudo systemctl start datadog-agent
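The contents of go.d/conf.yaml are not shown above. As a hedged sketch, a file-tailing log config for the test.log listed earlier might look like the following; the service name api is a placeholder, and the helper name go_logs_conf is hypothetical:

```shell
# Print a candidate go.d/conf.yaml that tails the Go application's log file.
# "api" is a placeholder service name; adjust path and service as needed.
go_logs_conf() {
  cat <<'EOF'
logs:
  - type: file
    path: /usr/log/api/test.log
    service: api
    source: go
EOF
}

# Usage:
#   go_logs_conf | sudo tee /etc/datadog-agent/conf.d/go.d/conf.yaml
#   sudo systemctl restart datadog-agent
```

Log collection also has to be switched on with logs_enabled: true in datadog.yaml, and the agent user needs read access to the file.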
kokoichi206 commented 1 year ago

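Problem: too many logs for /server-status requests piling up in Apache

Around 30,000 log entries were ingested in roughly one week.

For now, try keeping these requests out of the Apache log.

Handling it on the Apache side

# For now, filter out anything that starts with /server-status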
$ sudo vim /etc/apache2/apache2.conf

LogFormat "%h %l %u %t \"%r\" %>s %O" common
LogFormat "%{Referer}i -> %U" referer
LogFormat "%{User-agent}i" agent
LogFormat "%h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"" combined

# https://httpd.apache.org/docs/2.2/env.html#page-header
SetEnvIf Request_URI "^/server-status" dontlog
ErrorLog ${APACHE_LOG_DIR}/error.log
CustomLog ${APACHE_LOG_DIR}/access.log common env=!dontlog

$ sudo systemctl restart apache2
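To confirm that the filter works, the number of /server-status entries in the access log can be counted before and after the restart. A rough sketch, assuming the common or combined log format shown above, where the request path is the seventh whitespace-separated field; the function name is made up for illustration:

```shell
# Count access-log lines whose request path starts with /server-status.
# Assumes the "common"/"combined" layout, where the path is field 7.
count_server_status() {
  awk '$7 ~ /^\/server-status/ { n++ } END { print n + 0 }' "$1"
}

# Usage:
#   count_server_status /var/log/apache2/access.log
```

If the SetEnvIf rule is effective, this count should stop growing once Apache has restarted.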

Handling it on the Datadog side at ingestion time