livingsocial / rearview

Timeseries data monitoring framework

Completed job 405 (FailedStatus) #2

Closed stupidbodo closed 11 years ago

stupidbodo commented 11 years ago

Hi Guys,

Great project! I was hoping this would go mainstream, because it fits exactly what we need in the Graphite ecosystem.

I set the project up and everything went fine until the job was scheduled. Since then I have been getting the log output below repeatedly.

[info] application - Running job 405
[info] application - Completed job 405 (FailedStatus)
[info] application - ScheduleJob 405
[info] application - Scheduled 405 Monitoring 50 * * * * ?
[info] application - ScheduleJob 405
[info] application - Scheduled 405 Monitoring 50 * * * * ?
[info] application - Running job 405
[info] application - Completed job 405 (FailedStatus)
[info] application - ScheduleJob 405
[info] application - Scheduled 405 Monitoring 50 * * * * ?
[info] application - ScheduleJob 405
[info] application - Scheduled 405 Monitoring 50 * * * * ?
[info] application - Running job 405
[info] application - Completed job 405 (FailedStatus)
[info] application - ScheduleJob 405

Any idea what it could be? The Test Monitor button in the UI worked for me. I searched for useful logs about this but couldn't find any. I'd appreciate any tips!

steveakers commented 11 years ago

Are you using ./sbt start or ./sbt run? I believe ./sbt run will give you more debugging information. Also, would you mind sharing your expression code and monitor settings (including the timeseries data returned, if possible)?

stupidbodo commented 11 years ago

Hi Steve,

Thanks for the reply. I tried using ./sbt run, but it doesn't give me enough information to fix the issue. Here is the info I collected.

Data returned from graphite

cpu,1378470000,1378470300,60|0.0216634904346,0.0149983998452,0.0133298731411,0.0149984422793,0.013333688298

expression

cpuusage = @a.values.mean
if cpuusage > 0.01
  raise "CPU Usage is high"
end

metric

alias(carbon.agents.graphite-a.cpuUsage, "cpu")

The monitor looks back 5 minutes and is scheduled to run every 2 minutes.

Logs

[debug] application - Job 405 FailedStatus
[info] application - Completed job 405 (FailedStatus)
[debug] application - Job finished 405
[debug] application - Rescheduling 405
[info] application - ScheduleJob 405
[debug] application - Cancelling job 405
[info] application - Scheduled 405 Monitoring * 2 * * * ?

I installed Java, Ruby, StatsD and MySQL, running on CentOS 6.4 x64 with 2 GB RAM and 2 CPUs. The tests passed.

steveakers commented 11 years ago

What happens in the GUI? Does the monitor turn red? Have you entered an email address in the notification input on view schedule, and if so are you getting any emails?

stupidbodo commented 11 years ago

When I tested the monitor via the GUI, "CPU Usage is high" was displayed, since the condition cpuusage > 0.01 is true all the time. The monitor didn't turn red. I tried adding an email address and/or a PagerDuty key but didn't get any emails.
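For anyone following along, here is a rough way to check that claim against the Graphite sample posted above. This is only a sketch in plain Ruby, not how rearview evaluates expressions: parse_raw_line and mean are made-up helper names, and the line is assumed to be in Graphite's raw format (name,start,end,step|values).

```ruby
# Hypothetical helper: parse one line of Graphite's raw format,
# "name,start,end,step|v1,v2,...". Not part of rearview.
def parse_raw_line(line)
  header, data = line.strip.split("|", 2)
  name, from, to, step = header.split(",")
  {
    name: name,
    from: Integer(from),
    to: Integer(to),
    step: Integer(step),
    # Assumption: missing points appear as "None"; drop them before averaging.
    values: data.split(",").reject { |v| v == "None" }.map { |v| Float(v) }
  }
end

# Hypothetical helper: arithmetic mean of a non-empty array of floats.
def mean(values)
  values.sum / values.size
end

line = "cpu,1378470000,1378470300,60|0.0216634904346,0.0149983998452," \
       "0.0133298731411,0.0149984422793,0.013333688298"

series  = parse_raw_line(line)
average = mean(series[:values])

puts average.round(4)  # 0.0157
puts average > 0.01    # true, so the expression raises for this sample
```

With this sample the mean is about 0.0157, so a 0.01 threshold would trip on every run.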

steveakers commented 11 years ago

I've recreated your monitor and I see this in the logs:

[debug] application - Job finished 411
[debug] application - Rescheduling 411
[info] application - ScheduleJob 411
[debug] application - Cancelling job 411
[info] application - Scheduled 411 test 0 */2 * * * ?

The difference I see is in the schedule piece. 0 */2 * * * ? would run on the zeroth second of every 2nd minute. From your logs (* 2 * * * ?) it looks like you're running every second, but only when the minute is 2. I've tested it and that definitely blows up. Change your seconds to 0 and your minutes to */2 if you want it to run every 2 minutes.

Also, we're hoping to release the Ruby version next week. In that version we plan to remove the seconds feature as it really won't be necessary. That will permanently address this bug as well.
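To make the difference between the two schedules concrete, here is a small Ruby sketch that counts how often each one fires in an hour. It is not Quartz and not rearview code: field_matches? and fires_per_hour are made-up helpers that only look at the seconds and minutes fields and only understand "*", a plain number, and "*/N". The intended schedule is taken to be 0 */2 * * * ?.

```ruby
# Hypothetical helper: match one cron field against a value.
# Supports only "*", "N", and "*/N"; real Quartz expressions allow far more.
def field_matches?(field, value)
  case field
  when "*" then true
  when %r{\A\*/(\d+)\z} then (value % Regexp.last_match(1).to_i).zero?
  when /\A\d+\z/ then field.to_i == value
  else raise ArgumentError, "unsupported field: #{field}"
  end
end

# Hypothetical helper: count firings in one hour, using only the first two
# fields (seconds, minutes) of a six-field expression.
def fires_per_hour(expression)
  seconds, minutes = expression.split.first(2)
  (0...60).sum do |minute|
    next 0 unless field_matches?(minutes, minute)
    (0...60).count { |second| field_matches?(seconds, second) }
  end
end

puts fires_per_hour("* 2 * * * ?")    # 60: every second of minute 2
puts fires_per_hour("0 */2 * * * ?")  # 30: second 0 of every 2nd minute
```

The first schedule packs all 60 runs into a single minute each hour, while the second spreads 30 runs evenly, one every 2 minutes.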

stupidbodo commented 11 years ago

Thanks, I didn't expect that to be the problem. Yeah, the seconds feature doesn't seem necessary at all. I will definitely be trying out the Ruby version; gotta spread the word on this project.