psadmin-io / psadmin-plus

A psadmin helper script
MIT License

Setup/Leverage Exit Code so you can make it "smart" #90

Open NateWerner opened 6 years ago

NateWerner commented 6 years ago

Once the commands capture the exit code (and timeout) per #89, psa can use them to take special actions and try to recover from failures. Users of the psa command can then be confident of the end state of a request, act on it, and be sure that whatever they run after psa will actually run.

Samples and scenarios we've added to our scripts:

  1. "psadmin stop" is a graceful stop for Tuxedo, so anything hung will wait forever. Capture a timeout event on a "stop" and retry with a "psadmin kill". This guarantees that a stop/kill will always eventually complete. As a last resort we even run kill on every process tied to the Tux domain, plus ipcrm. These "hard" kills also require a "psadmin configure" so the next restart will work. This happens when the DB drops out from under the app servers/scheduler (i.e., someone forgot to stop the domain before a DB upgrade).
  2. Exit code 255. Happens when 2 domains on one config path make a tools change or OEM is involved. If stop/start has an exit code of 255, run psadmin configure, then repeat the requested command. The domain is then "auto-recovered".
  3. Exit code 137, psadmin start timed out. Rare, but it happens to us when a domain boots right after an automated DB clone. Domains can boot really slowly while the DB start is still finishing internal tasks. If it happens, we sleep for 20 seconds, then try again to recover from the error.
  4. Exit code 40, domain is already down. Can happen if 2 people stop the same domain at similar times. This is a "happy" exit code, so return 0 to the user of psa.
  5. Exit code 5, domain is already up. Can happen if 2 people start the same domain at similar times. This is a "happy" exit code, so return 0 to the user of psa.
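
To make the scenarios above concrete, here is a rough sketch of how the exit codes could be mapped to recovery actions. The names `RECOVERY_ACTIONS`, `recovery_action`, and `handle` are made up for illustration and are not part of psadmin-plus. The codes are the ones we have seen in our environment, not documented psadmin constants. `retry_cmd` and `configure` are callables standing in for the real command and for "psadmin configure".

```ruby
# Hypothetical mapping of observed exit codes to recovery actions.
# These values come from the scenarios in this issue, not from psadmin docs.
RECOVERY_ACTIONS = {
  255 => :configure_and_retry, # config state error
  137 => :sleep_and_retry,     # start timed out
  40  => :treat_as_success,    # domain already down
  5   => :treat_as_success     # domain already up
}.freeze

# Returns the action to take for a given integer exit code.
def recovery_action(exit_code)
  return :none if exit_code.zero?

  RECOVERY_ACTIONS.fetch(exit_code, :report_failure)
end

# Applies the action once and returns the exit code to hand back to the
# caller. The retry is attempted a single time; its result is returned as-is.
# `sleeper` is injectable so the pause can be skipped in tests.
def handle(exit_code, retry_cmd:, configure:, pause: 20, sleeper: ->(s) { sleep(s) })
  case recovery_action(exit_code)
  when :none, :treat_as_success then 0
  when :configure_and_retry
    configure.call
    retry_cmd.call
  when :sleep_and_retry
    sleeper.call(pause)
    retry_cmd.call
  else
    exit_code # unknown failure: pass it through untouched
  end
end
```

Keeping the mapping in one table means a new "happy" or recoverable code is a one-line change, and unknown codes still reach the caller unchanged.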

Feed all exit codes down the chain to the psa command, so the user can read the exit code and decide what step to take, such as notifying an admin email address that domain X did not start correctly. This is useful when the psa command is automated from other scripts or from a Rundeck job.
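
As a sketch of what "feeding the exit code down the chain" might look like, assume psa collects one exit code per domain into a hash. The helper `summarize` and the domain names in the comment are hypothetical. It picks the first non-zero code as the overall result and builds messages a calling script could mail to an admin.

```ruby
# results: hypothetical hash of domain name => integer exit code.
# Returns [overall_exit_code, failure_messages].
def summarize(results)
  failed = results.reject { |_domain, code| code.zero? }
  overall = failed.empty? ? 0 : failed.values.first
  messages = failed.map { |domain, code| "#{domain} exited with #{code}" }
  [overall, messages]
end

# Possible use at the very end of the psa command (not executed here):
#   overall, messages = summarize("APPDOM" => 0, "PRCSDOM" => 255)
#   messages.each { |m| warn m }
#   exit(overall)
```

Returning the first failure rather than a generic 1 keeps the original code visible to Rundeck or whatever script wraps psa.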

Something like this:

    exit_code = do_cmd(start_app_cmd)
    if exit_code.to_s == "255"
      # config state error: rebuild the domain config first
      do_configure(type, domain)
      # then try the original command once more
      exit_code = do_cmd(start_app_cmd)
    end
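
The same idea could cover the hung "stop" from scenario 1. This is only a sketch: `stop_with_escalation` is a made-up name, and `stop`, `kill`, and `configure` are callables standing in for "psadmin stop", "psadmin kill", and "psadmin configure". The 300 second default is an arbitrary assumption. It uses Ruby's standard `Timeout` module; a real version that shells out would also need to clean up the child process left behind by the timed-out stop.

```ruby
require "timeout"

# Tries a graceful stop; if it does not finish within `limit` seconds,
# escalates to kill and then reconfigures so the next start works.
# Returns the exit code of whichever command finished last (stop or kill).
def stop_with_escalation(stop:, kill:, configure:, limit: 300)
  Timeout.timeout(limit) { stop.call }
rescue Timeout::Error
  code = kill.call
  configure.call # hard kills need a configure before the next boot
  code
end
```

When the graceful stop returns in time, its exit code is passed through untouched, so codes such as 40 can still be handled separately.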

iversond commented 6 years ago

This is a great idea (along with #89)! This would make psa much more reliable.