matomo-org / matomo-log-analytics

Import any kind of server logs in Matomo for powerful log analytics. Universal log file parsing and reporting.
https://matomo.org/log-analytics/
GNU General Public License v3.0

Log parser not streaming from stdin #282

Open matt9mg opened 4 years ago

matt9mg commented 4 years ago

Running the code below from the documentation doesn't seem to pass anything to stdin.

This is placed in the vhost:

LogFormat "%v %h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\"" matomoLogFormat
  CustomLog ${APACHE_LOG_DIR}/matomo_input.log matomoLogFormat

  CustomLog "||/var/www/html/import_logs.py \
    --debug --enable-http-errors --enable-http-redirects --enable-bots \
    --url=http://XXXXXXXXXX --output=${APACHE_LOG_DIR}/matomo.log --recorders=1 \
    --recorder-max-payload-size=1 --token-auth=XXXXXXXXXXXXXXXXXXXXXXXXXXXX --idsite=3 --log-format-name=common_complete -" matomoLogFormat

This is the output from the log file:

2020-10-26 16:30:24,118: [DEBUG] Accepted hostnames: all
2020-10-26 16:30:24,120: [DEBUG] Matomo Tracker API URL is: http://XXXXXXXXXX
2020-10-26 16:30:24,121: [DEBUG] Matomo Analytics API URL is: http://XXXXXXXXXX
2020-10-26 16:30:24,122: [DEBUG] Authentication token token_auth is: XXXXXXXXXXXXXX
2020-10-26 16:30:24,123: [DEBUG] Resolver: static
2020-10-26 16:30:24,201: [DEBUG] Launched recorder

Logs import summary
-------------------

    0 requests imported successfully
    0 requests were downloads
    0 requests ignored:
        0 HTTP errors
        0 HTTP redirects
        0 invalid log lines
        0 filtered log lines
        0 requests did not match any known site
        0 requests did not match any --hostname
        0 requests done by bots, search engines...
        0 requests to static resources (css, js, images, ico, ttf...)
        0 requests to file downloads did not match any --download-extensions

Website import summary
----------------------

    0 requests imported to 1 sites
        1 sites already existed
        0 sites were created:

    0 distinct hostnames did not match any existing site:

Performance summary
-------------------

    Total time: 0 seconds
    Requests imported per second: 0.0 requests per second

Processing your log data
------------------------

    In order for your logs to be processed by Matomo, you may need to run the following command:
     ./console core:archive --force-all-websites --force-all-periods=315576000 --force-date-last-n=1000 --url='http://XXXXXXX'

But if I run this command from a file via cron, it works as expected, and the log output shows the information being sent to my Matomo instance:

/var/www/html/import_logs.py --debug --enable-http-errors --enable-http-redirects --enable-bots --url=http://XXXXX --output=/var/log/apache2/matomo.log --recorders=1 --recorder-max-payload-size=1 --token-auth=XXXXXXXXXX --idsite=3 --log-format-name=common_complete /var/log/apache2/site.log

matt9mg commented 4 years ago

Checking the error.log file, I can see this:

AH00106: piped log program '/var/www/html/import_logs.py --debug --enable-http-errors --enable-http-redirects --enable-bots --url=http://XXXXXXXXXX --output=/var/log/apache2/matomo.log --recorders=1 --recorder-max-payload-size=1 --token-auth=XXXXXXXXXXXXXXXX --idsite=3 --log-format-name=common_complete -' failed unexpectedly

But when I check /var/log/apache2/matomo.log, I just get the same output as above.

Findus23 commented 4 years ago

Hi,

I don't know Apache at all, so I can't help here. But just for your information: this section of the README was written 8 years ago, so it is quite possible that it no longer works that way. If you find out more, it would be great if you could create a PR that fixes it or, if it turns out not to work at all, removes it.

matt9mg commented 4 years ago

Running the above in the Apache vhost with a PHP script in place of import_logs.py does manage to read stdin:

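<?php

// Read the log lines that Apache pipes in and append them to tmp.txt.
$stdin = fopen('php://stdin', 'rb');
ob_implicit_flush(true);
// Compare against false so that a line such as "0" does not end the loop early.
while (($line = fgets($stdin)) !== false) {
    $line = trim($line);
    // Re-add the newline that trim() removed so each entry stays on its own line.
    file_put_contents(__DIR__ . '/tmp.txt', $line . PHP_EOL, FILE_APPEND);
}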

But adding some debug logging to the .py script shows that nothing is being passed to Python. My Python skills aren't that strong, so I may need someone else to help with this, as it's a big requirement.
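A comparable check can be written in Python to confirm whether piped lines reach a Python process at all, independently of import_logs.py. This is only a minimal sketch: the function name `copy_lines` and the output path mentioned below are hypothetical and chosen for illustration.

```python
def copy_lines(stream, out_path):
    """Append every line read from `stream` to `out_path` and return the line count."""
    count = 0
    with open(out_path, "a", encoding="utf-8") as out:
        for line in stream:
            # Normalise the line ending so each entry stays on its own line.
            out.write(line.rstrip("\n") + "\n")
            # Flush after every line so the file can be inspected while the
            # process is still running.
            out.flush()
            count += 1
    return count
```

To use it as a piped log program, one would add `import sys` and call `copy_lines(sys.stdin, "/tmp/piped_log_check.txt")` at the bottom of the script, make the script executable, and point the piped CustomLog at it. If lines show up in the output file, stdin is reaching Python and the problem lies elsewhere in import_logs.py.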

matt9mg commented 4 years ago

The latest version supplied in this repository (which is different from the one in the Matomo application you download from the website) doesn't work at all:

AH00106: piped log program '/usr/bin/python3 /var/www/html/import_logs.py --debug --enable-http-errors --enable-http-redirects --enable-bots --url=http://XXXXXXXX --output=/var/log/apache2/matomo.log --recorders=1 --recorder-max-payload-size=1 --token-auth=XXXXXXXXXXXXXXX --idsite=3 --log-format-name=common_complete -' failed unexpectedly
Traceback (most recent call last):
  File "/var/www/html/import_logs.py", line 2661, in <module>
    config = Configuration()
  File "/var/www/html/import_logs.py", line 1024, in __init__
    self._parse_args(self._create_parser(), argv)
  File "/var/www/html/import_logs.py", line 934, in _parse_args
    sys.stdout = sys.stderr = open(self.options.output, 'a+', 0)
ValueError: can't have unbuffered text I/O

But this does support my theory that the buffer passed to Python is empty.
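The traceback shows the error being raised inside `_parse_args`, while the `--output` file is opened, so it appears to occur before any log line would be read. The sketch below reproduces the same `ValueError` in isolation under Python 3; the helper name `try_open_unbuffered_text` and the temporary path are assumptions made for the demonstration.

```python
import os
import tempfile


def try_open_unbuffered_text(path):
    """Return the error message from a text-mode open() with buffering=0, or None."""
    try:
        handle = open(path, "a+", 0)  # same arguments as in the traceback
    except ValueError as exc:
        return str(exc)
    handle.close()
    return None


# Throwaway file used only for this demonstration.
log_path = os.path.join(tempfile.mkdtemp(), "matomo.log")
print(try_open_unbuffered_text(log_path))
```

Under Python 3 this prints the message from the traceback, `can't have unbuffered text I/O`, without involving Apache or stdin.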

keykey7 commented 3 years ago

Similar issue here: the --output option seems bugged. Dropping it and redirecting stdout manually helped in my case.

AdUser commented 3 years ago

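The ValueError in the traceback above is a Python 3 migration issue: the third argument to open() is the buffering policy, and 0 (unbuffered) is only allowed in binary mode, not in text-mode I/O. It can be fixed with a small patch: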

        if self.options.output:
-            sys.stdout = sys.stderr = open(self.options.output, 'a+', 0)
+            sys.stdout = sys.stderr = open(self.options.output, 'a+')
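With the buffering argument dropped, Python falls back to its default buffering, so output written to the file may not become visible until the buffer fills or the file is closed. For a long-running piped process, line buffering might be closer to the original intent. The following is a sketch of that alternative, not part of the proposed patch; `open_log` is a hypothetical helper name.

```python
import os
import tempfile


def open_log(path, line_buffered=True):
    """Open `path` for appending in text mode.

    buffering=1 requests line buffering (valid in text mode only);
    buffering=-1 selects Python's default buffering policy.
    """
    return open(path, "a+", buffering=1 if line_buffered else -1)


# Throwaway file used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "matomo.log")
log = open_log(path)
log.write("first line\n")  # the newline triggers a flush under line buffering

# The line is already on disk although `log` is still open.
with open(path) as check:
    print(check.read(), end="")
log.close()
```

Whether line buffering is worth the extra writes depends on how promptly the output file needs to reflect what the importer is doing.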

sgiehl commented 3 years ago

@AdUser would you mind creating a small PR for that, so someone from the team can review and merge that? Thanks.