treasure-data / omnibus-td-agent

td-agent (Fluentd) Packaging Scripts
https://docs.treasuredata.com/articles/td-agent-changelog
Apache License 2.0

dry run failed: Operation not permitted #98

Open jasonvangundy opened 7 years ago

jasonvangundy commented 7 years ago

I am running td-agent on Ubuntu Trusty. I install it via your script for ubuntu/trusty, which just started automatically pulling in 2.3.3-0. Suddenly I am seeing this during the install phase and on service restart:

2016-10-03 18:33:14 +0000 [info]: starting fluentd-0.14.6 as dry run mode
2016-10-03 18:33:14 +0000 [error]: dry run failed: Operation not permitted
2016-10-03 18:33:23 +0000 [info]: fluent/supervisor.rb:636:read_config: reading config file path="/etc/td-agent/td-agent.conf"
2016-10-03 18:33:23 +0000 [info]: fluent/supervisor.rb:450:dry_run: starting fluentd-0.14.6 as dry run mode
2016-10-03 18:33:23 +0000 [error]: fluent/supervisor.rb:456:rescue in dry_run: dry run failed: Operation not permitted
2016-10-03 18:33:28 +0000 [info]: fluent/supervisor.rb:636:read_config: reading config file path="/etc/td-agent/td-agent.conf"
2016-10-03 18:33:28 +0000 [info]: fluent/supervisor.rb:450:dry_run: starting fluentd-0.14.6 as dry run mode
2016-10-03 18:33:28 +0000 [error]: fluent/supervisor.rb:456:rescue in dry_run: dry run failed: Operation not permitted
2016-10-03 18:45:06 +0000 [error]: dry run failed: Operation not permitted
2016-10-03 18:46:03 +0000 [error]: dry run failed: Operation not permitted

These "dry run failed: Operation not permitted" errors occur on each attempted service restart. I don't get much output from the configtest code, but the failure seems related to the following command and its error output:

root@xxxxx:/etc/td-agent# /usr/sbin/td-agent --verbose --log /var/log/td-agent/td-agent.log --daemon /var/run/td-agent/td-agent.pid --user td-agent --group td-agent
/opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/serverengine-2.0.0/lib/serverengine/privilege.rb:51:in `change_privilege': Operation not permitted (Errno::EPERM)
        from /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/serverengine-2.0.0/lib/serverengine/privilege.rb:51:in `change'
        from /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/serverengine-2.0.0/lib/serverengine/daemon.rb:154:in `block (2 levels) in daemonize_with_double_fork'
        from /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/serverengine-2.0.0/lib/serverengine/daemon.rb:150:in `fork'
        from /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/serverengine-2.0.0/lib/serverengine/daemon.rb:150:in `block in daemonize_with_double_fork'
        from /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/serverengine-2.0.0/lib/serverengine/daemon.rb:142:in `fork'
        from /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/serverengine-2.0.0/lib/serverengine/daemon.rb:142:in `daemonize_with_double_fork'
        from /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/serverengine-2.0.0/lib/serverengine/daemon.rb:107:in `main'
        from /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/serverengine-2.0.0/lib/serverengine/daemon.rb:68:in `run'
        from /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.6/lib/fluent/supervisor.rb:524:in `supervise'
        from /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.6/lib/fluent/supervisor.rb:402:in `run_supervisor'
        from /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.6/lib/fluent/command/fluentd.rb:269:in `<top (required)>'
        from /opt/td-agent/embedded/lib/ruby/site_ruby/2.1.0/rubygems/core_ext/kernel_require.rb:54:in `require'
        from /opt/td-agent/embedded/lib/ruby/site_ruby/2.1.0/rubygems/core_ext/kernel_require.rb:54:in `require'
        from /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.14.6/bin/fluentd:5:in `<top (required)>'
        from /opt/td-agent/embedded/bin/fluentd:23:in `load'
        from /opt/td-agent/embedded/bin/fluentd:23:in `<top (required)>'
        from /usr/sbin/td-agent:7:in `load'
        from /usr/sbin/td-agent:7:in `<main>'

This is on a freshly provisioned EC2 host running their Ubuntu 14.04 image. The exact same automated provisioning process works fine with 2.3.2-0. Was there a change introduced in 2.3.3-0 that would cause this?

Also is 2.3.2-0 no longer available? I was hoping to revert. I can see the old version out there at (key = 2/ubuntu/trusty/pool/contrib/t/td-agent/td-agent_2.3.2-0_amd64.deb), but I can't get apt to see or install it. I could manually pull it down and install it, but was hoping to avoid that.

repeatedly commented 7 years ago

Thanks for the report. That's weird. We updated only bundled libraries, so this error should not happen. I will check it later.

BTW, I assume you call gem update or gem install in your deploy process. But fluentd v0.14 is not stable, so I don't recommend it.

Also is 2.3.2-0 no longer available?

This is a reprepro limitation. I plan to replace it with aptly in the future: https://github.com/treasure-data/omnibus-td-agent/issues/66

jasonvangundy commented 7 years ago

Sorry for the misleading report. After a deeper dive, it turns out that a separate plugin was causing our issue. This commit to the fluentd concat plugin brought in fluentd 0.14.6 (https://github.com/fluent-plugins-nursery/fluent-plugin-concat/commit/21adef2434843257945bef3fe89ce6dcf0a16d53) and caused td-agent to fail. I pinned the concat plugin back to 0.6.2 and all is well. Thanks for your quick response!
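For anyone hitting the same symptom, here is a minimal sketch of how the pin and a follow-up check might look. It assumes the `td-agent-gem` wrapper that ships with td-agent and the usual `gem list` output format (`name (version, version)`). The helper name `has_fluentd_v014` is hypothetical, and the commands that touch the real installation are left commented out.

```shell
#!/bin/sh
# Hypothetical helper (not part of td-agent): succeeds when a `gem list`
# line such as "fluentd (0.14.6, 0.12.29)" includes a 0.14.x release.
has_fluentd_v014() {
  printf '%s\n' "$1" | grep -Eq '^fluentd \((.*, )?0\.14\.[0-9]+'
}

# Pin the concat plugin to the release reported as working in this thread:
#   td-agent-gem install fluent-plugin-concat -v 0.6.2
#
# Then check whether a fluentd 0.14.x gem is still installed next to the
# bundled one:
#   line=$(td-agent-gem list fluentd | grep '^fluentd ')
#   has_fluentd_v014 "$line" && echo "fluentd 0.14.x is still installed"
```

If the check still reports a 0.14.x release, a plugin dependency probably pulled it in earlier. Whether to remove it depends on what else in the deployment relies on it.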

repeatedly commented 7 years ago

I see. I posted this point before: https://groups.google.com/forum/#!topic/fluentd/D3FHZkFHiaA

BTW, I need to investigate why td-agent 2.3.3 doesn't work with a newer installed fluentd v0.14.6, so I will keep this issue open.