Open hluaces opened 3 years ago
@hluaces Amazing job writing this issue! Providing a Dockerfile that recreates the issue is super useful, and I really appreciate it. I was able to re-create the issue locally as well, and the problem is right here: https://github.com/bmatcuk/doublestar/blob/v3/doublestar.go#L441 where the error from opening the directory path isn't returned. That said, the telegraf code (https://github.com/influxdata/telegraf/blob/master/internal/globpath/globpath.go#L53) isn't checking the error either, so it would not be reported even if doublestar did return it.
Looks like the doublestar library recently went through a big overhaul and now has a version 4 available, so the error might be handled in the new version. I can play around with it and see if it fixes the problem.
Is there any news on this?
I'm not trying to pressure or anything, just trying to manage expectations with this bug as its presence forces me to do some ugly workarounds. Thank you for your time.
Thanks for reminding me about this, I honestly forgot. I'm afraid there is no news, as I haven't made progress with updating the doublestar library. I will make a note to look at it this week and get back to you; hopefully it will resolve the issue.
edit: I remember now what stopped me: v4 depends on io/fs and therefore requires Go v1.16+. We will be moving away from v1.15 after this pr is merged: https://github.com/influxdata/telegraf/pull/9642. I will go ahead and get a pr ready to make sure it fixes this issue.
Upgrading doublestar to v4 is trickier than I thought; it depends on the new io/fs package, which isn't straightforward to use and causes more changes than I'd expect.
@hluaces I have a draft pr that does address this issue. If you have time, can you try out the artifacts to see if it works for you? You can find them posted by the telegraf bot here. This change does depend on the pull request to the doublestar project being accepted and merged: https://github.com/bmatcuk/doublestar/pull/57. At the moment the draft pr uses my forked repo.
To test the changes I updated the Dockerfile you provided to copy a local telegraf binary:
FROM centos:7
COPY files/influxdb.repo /etc/yum.repos.d
RUN mkdir -p /var/log/apache2 \
&& adduser apache2 \
&& touch /var/log/apache2/error_log \
&& chmod 711 /var/log/apache2/ \
&& chmod 644 /var/log/apache2/error_log \
&& echo "example,result=ok value=1i" >> /var/log/apache2/error_log \
&& echo "example,result=ok value=2i" >> /var/log/apache2/error_log \
&& echo "example,result=ok value=3i" >> /var/log/apache2/error_log \
&& echo "example,result=error value=3i" >> /var/log/apache2/error_log
RUN adduser telegraf
USER telegraf
COPY files/telegraf.conf /etc/telegraf/telegraf.conf
COPY files/telegraf /usr/bin/telegraf
ENTRYPOINT ["/usr/bin/telegraf", "-config", "/etc/telegraf/telegraf.conf", "-config-directory", "/etc/telegraf/telegraf.d", "--debug", "--test-wait", "10"]
New expected results:
2021-08-20T14:27:48Z I! Starting Telegraf
2021-08-20T14:27:48Z W! Telegraf is not permitted to read /etc/telegraf/telegraf.d
2021-08-20T14:27:48Z D! [agent] Initializing plugins
2021-08-20T14:27:48Z D! [agent] Starting service inputs
2021-08-20T14:27:48Z E! [inputs.tail] Failed to match for filepath "/var/log/apache2/error_log": open /var/log/apache2: permission denied
2021-08-20T14:27:48Z E! [inputs.tail] Failed to match for filepath "/var/log/apache2/error_log": open /var/log/apache2: permission denied
2021-08-20T14:27:58Z D! [agent] Stopping service inputs
2021-08-20T14:27:58Z D! [agent] Input channel closed
2021-08-20T14:27:58Z D! [agent] Stopped Successfully
2021-08-20T14:27:58Z E! [telegraf] Error running agent: input plugins recorded 2 errors
I've managed to try that on my end and the error reporting works as expected, thank you very much for your work. I was able to see how telegraf reported the errors exactly as you've shown in your example.
Nevertheless, I'd like to bring attention to the fact that the telegraf user is able to read that file:
bash-4.2$ whoami
telegraf
bash-4.2$ namei -om /var/log/apache2/error_log
f: /var/log/apache2/error_log
drwxr-xr-x root root /
drwxr-xr-x root root var
drwxr-xr-x root root log
drwx--x--x root root apache2
-rw-r--r-- root root error_log
bash-4.2$ tail /var/log/apache2/error_log
example,result=ok value=1i
example,result=ok value=2i
example,result=ok value=3i
example,result=error value=3i
As you can see, my issue raised two problems. First, telegraf does not read /var/log/apache2/error_log because it thinks that it cannot enter /var/log/apache2. Second, that failure was silent; with your change this seems to be fixed.
I suppose the first problem happens because, at some point, the underlying library assumes that a directory without read permission cannot be used at all. That is not the case: a directory with only execute permission still allows reading the files inside it, as long as those files have the proper permissions.
Maybe my issue was not clear. I apologize if that was the case.
The tail input (and I think that logparser does this too) fails silently when attempting to read a file inside a directory which has execute but not read permissions for the telegraf user.
I'm providing a Dockerfile to reproduce the issues below.
Below you can see the file permissions schema and proof that the user can read the file:
Below you can see that telegraf fails silently when running under the telegraf user:
It works as expected with a root user or with a /var/log/apache2 directory with read and execute permissions:
Relevant telegraf.conf:
System info:
Docker
Steps to reproduce:
I've provided a Dockerfile that reproduces the error:
https://github.com/hluaces/telegraf-bug-9129
If you run the container (docker run --rm local/bug-telegraf) you'll see that no data is gathered after a 15s wait time.
If you run it as root inside the container (docker run --rm -u root local/bug-telegraf) you'll see that data is gathered.
You can move inside the container by using docker run --rm -it --entrypoint bash local/bug-telegraf.
Expected behavior:
tail input.
Actual behavior:
Additional info: