outbrain / orchestrator-agent

MySQL replication topology manager - agent (daemon)
Apache License 2.0

Error on lvs-snapshots #19

Open hagay3 opened 8 years ago

hagay3 commented 8 years ago

Hi, I'm facing an issue with orchestrator-agent when fetching lvs-snapshots. Orchestrator fails to show the node snapshots, and I can see the following in the orchestrator-agent log file:

[martini] Started GET /api/lvs-snapshots for 10.**:52318
2016-06-17 08:05:23 ERROR exit status 127

I also tried calling the orchestrator-agent API myself, and the endpoint fails: request: host:3002/api/lvs-snapshots?token=......

{ Code: "ERROR", Message: "exit status 127", Details: null }
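For scripted checks across many hosts, the error body can be decoded instead of read by eye. A minimal Go sketch, assuming the agent returns valid JSON with quoted keys (the copy above lost its quotes) and that Details may hold any JSON value; apiResponse and decodeResponse are hypothetical names, not taken from orchestrator-agent:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// apiResponse is a hypothetical type mirroring the fields shown in the
// error body above; the field types are assumptions.
type apiResponse struct {
	Code    string
	Message string
	Details interface{}
}

// decodeResponse parses a response body such as the one returned by
// /api/lvs-snapshots when the underlying command fails.
func decodeResponse(body []byte) (apiResponse, error) {
	var r apiResponse
	err := json.Unmarshal(body, &r)
	return r, err
}

func main() {
	body := []byte(`{"Code": "ERROR", "Message": "exit status 127", "Details": null}`)
	r, err := decodeResponse(body)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	if r.Code == "ERROR" {
		fmt.Printf("agent error: %s\n", r.Message) // agent error: exit status 127
	}
}
```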

Restarting orchestrator-agent brings this request back to life, but after a while it seems to become unavailable again.

hagay3 commented 8 years ago

@shlomi-noach ?

shlomi-noach commented 8 years ago

Can you run with --debug --trace and paste the output?

hagay3 commented 8 years ago

Do you maybe mean -stack?

Usage of ./orchestrator-agent:
  -config="": config file name
  -debug=false: debug mode (very verbose)
  -stack=false: add stack trace upon error
  -verbose=false: verbose
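This usage text appears to match the older output format of Go's standard flag package, which treats one or two leading dashes as equivalent (so -stack and --stack are the same) and rejects undefined flags such as --trace. A sketch of equivalent declarations, assuming the agent uses flag; the agentFlags type and parseAgentFlags helper are hypothetical:

```go
package main

import (
	"flag"
	"fmt"
)

// agentFlags is a hypothetical holder for the options in the usage text.
type agentFlags struct {
	config  string
	debug   bool
	stack   bool
	verbose bool
}

// parseAgentFlags declares the four flags on a dedicated FlagSet and parses
// args. ContinueOnError makes an unknown flag return an error instead of exiting.
func parseAgentFlags(args []string) (agentFlags, error) {
	var f agentFlags
	fs := flag.NewFlagSet("orchestrator-agent", flag.ContinueOnError)
	fs.StringVar(&f.config, "config", "", "config file name")
	fs.BoolVar(&f.debug, "debug", false, "debug mode (very verbose)")
	fs.BoolVar(&f.stack, "stack", false, "add stack trace upon error")
	fs.BoolVar(&f.verbose, "verbose", false, "verbose")
	err := fs.Parse(args)
	return f, err
}

func main() {
	// One dash and two dashes are accepted interchangeably.
	f, err := parseAgentFlags([]string{"-debug", "--stack"})
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("debug=%v stack=%v\n", f.debug, f.stack) // debug=true stack=true
}
```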

shlomi-noach commented 8 years ago

Yes, you are right: --stack.

hagay3 commented 8 years ago

It seems the error does not come back once the stack trace is enabled. I will check across the whole cluster and let you know.

shlomi-noach commented 8 years ago

--trace should not change behavior. It merely prints a stack trace when an error occurs. It does not make anything compile or run differently.

hagay3 commented 8 years ago

Yes, this is why I'm testing it across the whole cluster, waiting for the error to reappear.

hagay3 commented 8 years ago

Hi, here is the error log

[martini] Started GET /api/lvs-snapshots for ****:57694
2016-06-23 10:59:20 DEBUG execCmd: lvs --noheading -o lv_name,vg_name,lv_path,snap_percent
2016-06-23 10:59:20 ERROR exit status 127
/root/go/src/github.com/outbrain/golib/log/log.go:178 (0x43d071)
/root/go/src/github.com/outbrain/golib/log/log.go:224 (0x43d4ca)
/root/orchestrator-agent/src/github.com/outbrain/orchestrator-agent/osagent/osagent.go:106 (0x4f5fd2)
/root/orchestrator-agent/src/github.com/outbrain/orchestrator-agent/osagent/osagent.go:156 (0x4f675b)
/root/orchestrator-agent/src/github.com/outbrain/orchestrator-agent/http/api.go:104 (0x50acb0)
/root/orchestrator-agent/src/github.com/outbrain/orchestrator-agent/http/api.go:456 (0x510ace)
/usr/lib/go/src/pkg/runtime/asm_amd64.s:339 (0x424582)
/usr/lib/go/src/pkg/reflect/value.go:474 (0x52d82b)
/usr/lib/go/src/pkg/reflect/value.go:345 (0x52c91d)
/root/go/src/github.com/codegangsta/inject/inject.go:102 (0x5bf904)
/root/go/src/github.com/go-martini/martini/env.go:1 (0x5039fc)
/root/go/src/github.com/go-martini/martini/router.go:408 (0x501074)
/root/go/src/github.com/go-martini/martini/router.go:285 (0x500492)
/root/go/src/github.com/go-martini/martini/router.go:132 (0x4ff4e2)
/root/go/src/github.com/go-martini/martini/martini.go:125 (0x501aa0)
/usr/lib/go/src/pkg/runtime/asm_amd64.s:340 (0x4245e2)
/usr/lib/go/src/pkg/reflect/value.go:474 (0x52d82b)
/usr/lib/go/src/pkg/reflect/value.go:345 (0x52c91d)
/root/go/src/github.com/codegangsta/inject/inject.go:102 (0x5bf904)
/root/go/src/github.com/go-martini/martini/martini.go:179 (0x4fd982)
/root/go/src/github.com/go-martini/martini/martini.go:170 (0x4fd8db)
/root/go/src/github.com/martini-contrib/gzip/gzip.go:40 (0x507962)
/root/go/src/github.com/martini-contrib/gzip/gzip.go:56 (0x507a05)
/usr/lib/go/src/pkg/runtime/asm_amd64.s:340 (0x4245e2)
/usr/lib/go/src/pkg/reflect/value.go:474 (0x52d82b)
/usr/lib/go/src/pkg/reflect/value.go:345 (0x52c91d)
/root/go/src/github.com/codegangsta/inject/inject.go:102 (0x5bf904)
/root/go/src/github.com/go-martini/martini/martini.go:179 (0x4fd982)
/root/go/src/github.com/go-martini/martini/martini.go:170 (0x4fd8db)
/root/go/src/github.com/go-martini/martini/recovery.go:142 (0x502036)
/root/go/src/github.com/go-martini/martini/martini.go:179 (0x4fd982)
/root/go/src/github.com/go-martini/martini/martini.go:170 (0x4fd8db)
/root/go/src/github.com/go-martini/martini/recovery.go:142 (0x502036)
/usr/lib/go/src/pkg/runtime/asm_amd64.s:339 (0x424582)
/usr/lib/go/src/pkg/reflect/value.go:474 (0x52d82b)
/usr/lib/go/src/pkg/reflect/value.go:345 (0x52c91d)
/root/go/src/github.com/codegangsta/inject/inject.go:102 (0x5bf904)
/root/go/src/github.com/go-martini/martini/martini.go:179 (0x4fd982)
/root/go/src/github.com/go-martini/martini/martini.go:170 (0x4fd8db)
/root/go/src/github.com/go-martini/martini/logger.go:25 (0x501862)
/usr/lib/go/src/pkg/runtime/asm_amd64.s:340 (0x4245e2)
/usr/lib/go/src/pkg/reflect/value.go:474 (0x52d82b)
/usr/lib/go/src/pkg/reflect/value.go:345 (0x52c91d)
/root/go/src/github.com/codegangsta/inject/inject.go:102 (0x5bf904)
/root/go/src/github.com/go-martini/martini/martini.go:179 (0x4fd982)
/root/go/src/github.com/go-martini/martini/martini.go:75 (0x4fccd3)
/usr/lib/go/src/pkg/net/http/server.go:1597 (0x4e222e)
/usr/lib/go/src/pkg/net/http/server.go:1167 (0x4e01a7)
/usr/lib/go/src/pkg/runtime/proc.c:1394 (0x417a50)
[martini] Completed 500 Internal Server Error in 8.186557ms

shlomi-noach commented 8 years ago

OK, this doesn't add much insight; it just shows that the lvs command returns a non-zero exit code. You say restarting orchestrator-agent brings this back to life. Without restarting orchestrator-agent, does it consistently return this error? For how long?

And, assuming it consistently returns the same error, are you able to invoke that command from the command line?

hagay3 commented 8 years ago

It consistently returns the same error. Invoking the command by hand works fine. Example output:

  mysqldata                     mdata     /dev/mdata/mysqldata
  mysqldata-snapd01-2016-06-30  mdata     /dev/mdata/mysqldata-snapd01-2016-06-30  75.00
  home                          outbrain  /dev/outbrain/home
  opt                           outbrain  /dev/outbrain/opt
  outbrain                      outbrain  /dev/outbrain/outbrain
  root                          outbrain  /dev/outbrain/root
  swap                          outbrain  /dev/outbrain/swap
  tmp                           outbrain  /dev/outbrain/tmp
  var                           outbrain  /dev/outbrain/var
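Since the manual run works, one way to confirm what the agent should report is to parse this output. A minimal sketch, assuming whitespace-separated columns and that lvs leaves snap_percent blank for non-snapshot volumes (as in the output above); logicalVolume and parseLVs are hypothetical names, not the agent's own parser:

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// logicalVolume holds the columns requested by
// lvs --noheading -o lv_name,vg_name,lv_path,snap_percent
type logicalVolume struct {
	Name        string
	Group       string
	Path        string
	IsSnapshot  bool
	SnapPercent float64
}

// parseLVs parses lvs output line by line. A fourth column is present
// only for snapshot volumes, so its presence marks a snapshot.
func parseLVs(output string) ([]logicalVolume, error) {
	var volumes []logicalVolume
	scanner := bufio.NewScanner(strings.NewReader(output))
	for scanner.Scan() {
		fields := strings.Fields(scanner.Text())
		if len(fields) == 0 {
			continue // skip blank lines
		}
		if len(fields) < 3 {
			return nil, fmt.Errorf("unexpected lvs line: %q", scanner.Text())
		}
		lv := logicalVolume{Name: fields[0], Group: fields[1], Path: fields[2]}
		if len(fields) >= 4 {
			pct, err := strconv.ParseFloat(fields[3], 64)
			if err != nil {
				return nil, fmt.Errorf("bad snap_percent in %q: %v", scanner.Text(), err)
			}
			lv.IsSnapshot, lv.SnapPercent = true, pct
		}
		volumes = append(volumes, lv)
	}
	return volumes, scanner.Err()
}

func main() {
	out := `  mysqldata                     mdata     /dev/mdata/mysqldata
  mysqldata-snapd01-2016-06-30  mdata     /dev/mdata/mysqldata-snapd01-2016-06-30  75.00
  home                          outbrain  /dev/outbrain/home`
	vols, err := parseLVs(out)
	if err != nil {
		fmt.Println(err)
		return
	}
	for _, v := range vols {
		if v.IsSnapshot {
			fmt.Printf("snapshot %s at %.2f%%\n", v.Path, v.SnapPercent)
		}
	}
}
```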

hagay3 commented 8 years ago

@shlomi-noach ?

hagay3 commented 8 years ago

Maybe this one is a different error?

2016-07-03 03:00:08 DEBUG execCmd: lvs --noheading -o lv_name,vg_name,lv_path,snap_percent
2016-07-03 03:00:08 ERROR exit status 127
/usr/share/golang/src/github.com/outbrain/golib/log/log.go:111 (0x43d011)
/usr/share/golang/src/github.com/outbrain/golib/log/log.go:157 (0x43d37a)
/home/snoach/dev/outbrain/github/orchestrator-agent/src/github.com/outbrain/orchestrator-agent/osagent/osagent.go:97 (0x4f4202)
/home/snoach/dev/outbrain/github/orchestrator-agent/src/github.com/outbrain/orchestrator-agent/osagent/osagent.go:147 (0x4f4944)
/home/snoach/dev/outbrain/github/orchestrator-agent/src/github.com/outbrain/orchestrator-agent/http/api.go:103 (0x507c40)
/home/snoach/dev/outbrain/github/orchestrator-agent/src/github.com/outbrain/orchestrator-agent/http/api.go:444 (0x50d78e)
/usr/local/go/src/pkg/runtime/asm_amd64.s:339 (0x424582)
/usr/local/go/src/pkg/reflect/value.go:474 (0x52958b)
/usr/local/go/src/pkg/reflect/value.go:345 (0x52867d)
/usr/share/golang/src/github.com/codegangsta/inject/inject.go:102 (0x5ba8b4)
/usr/share/golang/src/github.com/go-martini/martini/env.go:1 (0x50105c)
/usr/share/golang/src/github.com/go-martini/martini/router.go:350 (0x4fe934)
/usr/share/golang/src/github.com/go-martini/martini/router.go:229 (0x4fdd82)
/usr/share/golang/src/github.com/go-martini/martini/router.go:112 (0x4fcdbc)
/usr/share/golang/src/github.com/go-martini/martini/martini.go:119 (0x4ff330)
/usr/local/go/src/pkg/runtime/asm_amd64.s:340 (0x4245e2)
/usr/local/go/src/pkg/reflect/value.go:474 (0x52958b)
/usr/local/go/src/pkg/reflect/value.go:345 (0x52867d)
/usr/share/golang/src/github.com/codegangsta/inject/inject.go:102 (0x5ba8b4)
/usr/share/golang/src/github.com/go-martini/martini/martini.go:173 (0x4fb3c2)
/usr/share/golang/src/github.com/go-martini/martini/martini.go:164 (0x4fb31b)
/usr/share/golang/src/github.com/martini-contrib/gzip/gzip.go:33 (0x504ae2)
/usr/local/go/src/pkg/runtime/asm_amd64.s:340 (0x4245e2)
/usr/local/go/src/pkg/reflect/value.go:474 (0x52958b)
/usr/local/go/src/pkg/reflect/value.go:345 (0x52867d)
/usr/share/golang/src/github.com/codegangsta/inject/inject.go:102 (0x5ba8b4)
/usr/share/golang/src/github.com/go-martini/martini/martini.go:173 (0x4fb3c2)
/usr/share/golang/src/github.com/go-martini/martini/martini.go:164 (0x4fb31b)
/usr/share/golang/src/github.com/go-martini/martini/recovery.go:140 (0x4ff856)
/usr/local/go/src/pkg/runtime/asm_amd64.s:339 (0x424582)
/usr/local/go/src/pkg/reflect/value.go:474 (0x52958b)
/usr/local/go/src/pkg/reflect/value.go:345 (0x52867d)
/usr/share/golang/src/github.com/codegangsta/inject/inject.go:102 (0x5ba8b4)
/usr/share/golang/src/github.com/go-martini/martini/martini.go:173 (0x4fb3c2)
/usr/share/golang/src/github.com/go-martini/martini/martini.go:164 (0x4fb31b)
/usr/share/golang/src/github.com/go-martini/martini/logger.go:25 (0x4ff0f2)
/usr/local/go/src/pkg/runtime/asm_amd64.s:340 (0x4245e2)
/usr/local/go/src/pkg/reflect/value.go:474 (0x52958b)
/usr/local/go/src/pkg/reflect/value.go:345 (0x52867d)
/usr/share/golang/src/github.com/codegangsta/inject/inject.go:102 (0x5ba8b4)
/usr/share/golang/src/github.com/go-martini/martini/martini.go:173 (0x4fb3c2)
/usr/share/golang/src/github.com/go-martini/martini/martini.go:69 (0x4fa713)
/usr/local/go/src/pkg/net/http/server.go:1597 (0x4e020e)
/usr/local/go/src/pkg/net/http/server.go:1167 (0x4de217)
/usr/local/go/src/pkg/runtime/proc.c:1394 (0x417a50)
[martini] Completed 500 Internal Server Error in 3.646957ms
[martini] Started GET /api/mount for **

shlomi-noach commented 8 years ago

One last idea: is orchestrator still running as root? (It used to.) If not, change the command to sudo -i <command>.
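A sketch of that suggestion: prefix the command with sudo -i only when the effective user is not root. withSudo is a hypothetical helper, not the agent's sudoCmd, whose actual behavior this thread does not show:

```go
package main

import (
	"fmt"
	"os"
)

// withSudo (hypothetical) returns the command unchanged for root (euid 0)
// and prefixes "sudo -i" for any other user.
func withSudo(command string, euid int) string {
	if euid == 0 {
		return command
	}
	return "sudo -i " + command
}

func main() {
	cmd := "lvs --noheading -o lv_name,vg_name,lv_path,snap_percent"
	fmt.Println(withSudo(cmd, os.Geteuid()))
}
```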

hagay3 commented 8 years ago

Yes, orchestrator-agent and orchestrator (server) both run as root.

shlomi-noach commented 8 years ago

Then, I'm afraid I don't know; I'm not sure what exit status 127 stands for, and this is unfortunately the only info I have here.

hagay3 commented 8 years ago

I think exit code 127 here means it found no snapshots on the host, since other commands get the same exit code in bash when their output is empty. The agent log file also shows exit code 127 for other commands that fail. For example, there is some sort of check for mounted snapshots on the host, and it keeps getting exit code 127 because there really are no mounted snapshots on the host.

Maybe there is a way to add some code to orchestrator-agent to debug this one (for example, printing the output of "lvs --noheading .....")? My only other thought is to use the full path to 'lvs', since exit code 127 means "command not found". I don't think that's the issue here, but it may be worth adding it and checking.
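The two readings of exit code 127 can be told apart quickly. In POSIX shells, 127 is the status for a command that cannot be found, while a command that succeeds with empty output exits 0. A minimal sketch, assuming the agent runs commands through a shell (launching a missing binary directly via os/exec yields a lookup error, not an exit status); shellExitCode is a hypothetical name:

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
)

// shellExitCode runs commandText through "sh -c" and returns the shell's
// exit code (0 on success). The returned error is non-nil only when sh
// itself could not be started.
func shellExitCode(commandText string) (int, error) {
	err := exec.Command("sh", "-c", commandText).Run()
	if err == nil {
		return 0, nil
	}
	var exitErr *exec.ExitError
	if errors.As(err, &exitErr) {
		return exitErr.ExitCode(), nil
	}
	return -1, err
}

func main() {
	// "true" prints nothing and exits 0; an unknown command exits 127.
	for _, c := range []string{"true", "no-such-command-xyz", "exit 3"} {
		code, err := shellExitCode(c)
		fmt.Println(c, code, err)
	}
}
```

Keep in mind that the PATH inherited by a daemon can differ from that of an interactive root shell, so a command that works by hand can still be "not found" for the agent.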

shlomi-noach commented 8 years ago

I can see two paths for this:

I suggest the former, which should be easy to do: clone, modify, build via build.sh, deploy, test, PR.

hagay3 commented 8 years ago

If I use your second suggestion, why do I need to change the code? Can't I just build the package with the updated code?

This line invokes the lvs command. For debugging, I want to show the exact command the agent is going to execute:

output, err := commandOutput(sudoCmd(fmt.Sprintf("lvs --noheading -o lv_name,vg_name,lv_path,snap_percent %s", volumeName)))

What do I need to add to the code to write the full command to the log file before it is invoked? I'm pretty sure the issue is that the command just returns empty output.
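With --debug, the agent already logs the command text (the execCmd: line in the trace above). To also capture what the shell printed, a wrapper along these lines could be used. It is a sketch: runLogged is a hypothetical stand-in for commandOutput, it assumes execution through sh -c, and it uses the standard log package rather than outbrain/golib/log:

```go
package main

import (
	"fmt"
	"log"
	"os/exec"
)

// runLogged (hypothetical) logs the exact command text, runs it through
// "sh -c", and on failure logs the combined stdout/stderr, which is where
// a shell's "not found" message would appear.
func runLogged(commandText string) ([]byte, error) {
	log.Printf("DEBUG running: %s", commandText)
	output, err := exec.Command("sh", "-c", commandText).CombinedOutput()
	if err != nil {
		log.Printf("ERROR %q failed: %v; output: %q", commandText, err, output)
	}
	return output, err
}

func main() {
	volumeName := "" // empty: list all volumes
	cmd := fmt.Sprintf("lvs --noheading -o lv_name,vg_name,lv_path,snap_percent %s", volumeName)
	if _, err := runLogged(cmd); err != nil {
		fmt.Println("lvs failed:", err)
	}
}
```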

OK, I checked again, and the log file really does report bash's exit code (which is great). So maybe adding the full path to lvs will solve it. I think it would be wise to use "locate" before setting the path, because paths differ between Linux distributions.

shlomi-noach commented 8 years ago

locate is not installed by default on either RedHat or Debian. And, I should also note, if you can't find lvs, you probably wouldn't be able to find locate either.

shlomi-noach commented 8 years ago

The quickest option for you would be to edit the path (hard-code it), then build & deploy. If this works, we'll open an issue where I will allow for a configurable path prefix.
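A sketch of the proposed configurable prefix. CommandPathPrefix is a hypothetical setting (the agent does not have it as of this thread); an empty value keeps the bare command name so the normal PATH lookup still applies:

```go
package main

import (
	"fmt"
	"path/filepath"
)

// agentConfig is hypothetical; CommandPathPrefix illustrates the proposal.
type agentConfig struct {
	CommandPathPrefix string
}

// qualify joins the configured prefix with a bare command name. With an
// empty prefix the name is returned unchanged.
func (c agentConfig) qualify(command string) string {
	if c.CommandPathPrefix == "" {
		return command
	}
	return filepath.Join(c.CommandPathPrefix, command)
}

func main() {
	cfg := agentConfig{CommandPathPrefix: "/usr/sbin"}
	fmt.Println(cfg.qualify("lvs")) // on Linux: /usr/sbin/lvs
}
```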

shlomi-noach commented 8 years ago

> If I use your second suggestion, why do I need to change the code? Can't I just build the package with the updated code?

Because the URL that will serve the customized command is not the URL orchestrator would call.

hagay3 commented 8 years ago

@shlomi-noach It's solved by using the full path.

hagay3 commented 8 years ago

@shlomi-noach ?

shlomi-noach commented 8 years ago

@hagay3 thank you, I got this. Please avoid pinging me repeatedly, and understand that my schedule has its own constraints. I'll be merging a fix.

shlomi-noach commented 8 years ago

One thing that bothers me is that while lvs is typically in /usr/sbin, other commands are found in /usr/bin. I'm looking into a generic solution.