strongloop / strong-pm

deployer for node applications
http://strong-pm.io

Strange process list on 2nd strong-pm service deployed to same server #247

Closed notbrain closed 9 years ago

notbrain commented 9 years ago

I had an existing demo.app.com purring along nicely, then worked on getting a separate instance up. I think my main issue is that my original slc ctl create and slc ctl deploy commands might have corrupted something. As an aside, I'm wondering how to find all the files that strong-pm installs and delete them, so I can start over from scratch with both services. Is there a manual removal list somewhere in the docs?

I am also running StriderCD via upstart on the same instance, which could potentially have some impact.

In any case, I'm seeing strange behavior on the 2nd service. It seems to be running fine, but the list of workers is different from the 1st service I installed. I went through service IDs 2-7 (caused by duplicate services created with slc ctl create before deploying), and the 2nd service is now on SID 8. How can I reset this to get it back to 2? No real need, just OCD, and I'm curious where these indexes are stored (especially after I've removed a service completely; it seems like a service ID can't be reused once the service is removed?).

1st service shows up as expected, 2 workers listening on 3003.

$ slc ctl status
Service ID: 1
Service Name: proteus-demo
Environment variables:
    Name                Value
    ...snipped...
Instances:
    Version  Agent version  Cluster size  Driver metadata
     4.2.0       1.6.0            2             N/A
Processes:
        ID      PID   WID  Listening Ports  Tracking objects?  CPU profiling?  Tracing?
    1.1.12586  12586   1     0.0.0.0:3003
    1.1.12589  12589   2     0.0.0.0:3003

The 2nd service on port 3002, on the other hand, showed 3 blank process lines and then 4 workers; it now shows 2 blank lines and 4 workers. Is this some sort of debug port collision? The only real PIDs are the last 2, so where did the first 4 process list lines come from?

Service ID: 8
Service Name: proteus-staging
Environment variables:
    Name                Value
    ...snipped...
Instances:
    Version  Agent version  Cluster size  Driver metadata
     4.2.0       1.6.0            2             N/A
Processes:
        ID      PID   WID  Listening Ports  Tracking objects?  CPU profiling?  Tracing?
     8.1.7345   7345   0
     8.1.7249   7249   0
     8.1.7348   7348   1     0.0.0.0:3002
     8.1.7351   7351   2     0.0.0.0:3002
    8.1.12766  12766   1     0.0.0.0:3002
    8.1.12769  12769   2     0.0.0.0:3002

A tally of node processes shows the last two, but the first 4 in the list above are ghost processes:

ubuntu     956  0.0  0.9 714360 36908 ?        Ssl  Jun18   5:11 node /home/ubuntu/strider/bin/strider
ubuntu    1348  0.0  7.2 1232860 293420 ?      Sl   Jun18   1:31 /usr/bin/nodejs --debug-port=5859 /home/ubuntu/strider/bin/strider
strong-+  6036 15.2 24.9 1936744 1010396 ?     Ssl  Jun18 853:07 /usr/bin/nodejs /usr/lib/node_modules/strongloop/node_modules/strong-pm/bin/sl-pm.js --listen 8701 --base /var/lib/strong-pm --driver direct
strong-+ 12583  0.0  0.8 686488 33964 ?        Sl   15:27   0:00 /usr/bin/nodejs /usr/lib/node_modules/strongloop/node_modules/strong-supervisor/bin/sl-run.js --cluster=CPU
strong-+ 12586  0.1  2.6 1039136 107804 ?      Sl   15:27   0:02 /usr/bin/nodejs --debug-port=5859 /usr/lib/node_modules/strongloop/node_modules/strong-supervisor/bin/sl-run.js .
strong-+ 12589  0.1  2.6 1037924 106660 ?      Sl   15:27   0:02 /usr/bin/nodejs --debug-port=5860 /usr/lib/node_modules/strongloop/node_modules/strong-supervisor/bin/sl-run.js .
strong-+ 12763  0.4  0.8 686500 32444 ?        Sl   15:51   0:00 /usr/bin/nodejs /usr/lib/node_modules/strongloop/node_modules/strong-supervisor/bin/sl-run.js --cluster=CPU
strong-+ 12766  2.1  2.7 1057496 110344 ?      Sl   15:51   0:02 /usr/bin/nodejs --debug-port=5859 /usr/lib/node_modules/strongloop/node_modules/strong-supervisor/bin/sl-run.js .
strong-+ 12769  2.1  2.7 1057488 110432 ?      Sl   15:51   0:02 /usr/bin/nodejs --debug-port=5860 /usr/lib/node_modules/strongloop/node_modules/strong-supervisor/bin/sl-run.js .

And the strong-pm.log for a slc ctl restart proteus-staging (2nd service):

Stop (hard) current Runner: child 12652 commit 8/deploy/default/staging-deploy
Stop Runner: child 12652 commit 8/deploy/default/staging-deploy
2015-06-22T22:51:43.337Z pid:12652 worker:0 WARN received SIGTERM, shutting down
2015-06-22T22:51:43.338Z pid:12652 worker:0 INFO supervisor size set to 0
2015-06-22T22:51:43.352Z pid:12652 worker:0 INFO supervisor stopped worker 2 (pid 12658)
2015-06-22T22:51:43.353Z pid:12652 worker:0 ERROR supervisor worker id 2 (pid 12658) expected exit with 2
2015-06-22T22:51:43.364Z pid:12652 worker:0 INFO supervisor stopped worker 1 (pid 12655)
2015-06-22T22:51:43.364Z pid:12652 worker:0 INFO supervisor resized to 0
2015-06-22T22:51:43.365Z pid:12652 worker:0 ERROR supervisor worker id 1 (pid 12655) expected exit with 2
2015-06-22T22:51:43.365Z pid:12652 worker:0 INFO supervisor size set to undefined
2015-06-22T22:51:43.365Z pid:12652 worker:0 INFO supervisor stopped
Start Runner: (stopped) commit 8/deploy/default/staging-deploy
2015-06-22T22:51:43.791Z pid:12763 worker:0 INFO strong-agent v1.6.0 profiling app 'proteus' pid '12763'
2015-06-22T22:51:43.799Z pid:12763 worker:0 INFO strong-agent[12763] started profiling agent
2015-06-22T22:51:43.801Z pid:12763 worker:0 INFO supervisor starting (pid 12763)
2015-06-22T22:51:43.803Z pid:12763 worker:0 INFO strong-agent strong-agent using strong-cluster-control v2.1.1
2015-06-22T22:51:43.806Z pid:12763 worker:0 INFO supervisor reporting metrics to `internal:`
2015-06-22T22:51:43.813Z pid:12763 worker:0 INFO strong-agent not profiling, agent metrics requires a valid license.
2015-06-22T22:51:43.813Z pid:12763 worker:0 Please contact sales@strongloop.com for assistance.
2015-06-22T22:51:43.815Z pid:12763 worker:0 INFO supervisor size set to 2
Request (status) of current Runner: child 12763 commit 8/deploy/default/staging-deploy
Request {"cmd":"status"} of Runner: child 12763 commit 8/deploy/default/staging-deploy
2015-06-22T22:51:43.823Z pid:12763 worker:0 INFO supervisor listening on 'runctl'
2015-06-22T22:51:43.892Z pid:12763 worker:0 INFO supervisor started worker 1 (pid 12766)
2015-06-22T22:51:44.056Z pid:12763 worker:0 INFO supervisor started worker 2 (pid 12769)
2015-06-22T22:51:44.057Z pid:12763 worker:0 INFO supervisor resized to 2
2015-06-22T22:51:44.269Z pid:12766 worker:1 INFO strong-agent v1.6.0 profiling app 'proteus' pid '12766'
2015-06-22T22:51:44.277Z pid:12766 worker:1 INFO strong-agent[12766] started profiling agent
2015-06-22T22:51:44.539Z pid:12769 worker:2 INFO strong-agent v1.6.0 profiling app 'proteus' pid '12769'
2015-06-22T22:51:44.547Z pid:12769 worker:2 INFO strong-agent[12769] started profiling agent
2015-06-22T22:51:45.728Z pid:12766 worker:1 ERROR strong-agent error: failed to instrument mysql
2015-06-22T22:51:45.774Z pid:12769 worker:2 ERROR strong-agent error: failed to instrument mysql
2015-06-22T22:51:46.295Z pid:12769 worker:2 INFO strong-agent not profiling, agent metrics requires a valid license.
2015-06-22T22:51:46.296Z pid:12769 worker:2 Please contact sales@strongloop.com for assistance.
2015-06-22T22:51:46.303Z pid:12769 worker:2 Web server listening at: http://0.0.0.0:3002/
2015-06-22T22:51:46.367Z pid:12766 worker:1 INFO strong-agent not profiling, agent metrics requires a valid license.
2015-06-22T22:51:46.367Z pid:12766 worker:1 Please contact sales@strongloop.com for assistance.
2015-06-22T22:51:46.370Z pid:12766 worker:1 Web server listening at: http://0.0.0.0:3002/
rmg commented 9 years ago

Looking at the logs, something is definitely going awry. It seems that PM has lost track of pid 12763, which should be the master process (WID 0) for the 2 workers that actually exist.

There have been a couple of patch releases (and a minor one) since strong-pm@4.2.0. Are you in a position where you can upgrade to the latest?

As for cleaning up files: if you really want to, you can manually reset everything with sudo rm -rf /var/lib/strong-pm/* (note the /* at the end; you don't want to delete /var/lib/strong-pm itself). It's probably best to run that after shutting down PM. When you restart it, it will be a completely fresh slate.
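On an Ubuntu/upstart setup like yours, a minimal reset sequence might look like the following (the job name strong-pm is an assumption; substitute whatever name your installer created):

$ sudo service strong-pm stop
$ sudo rm -rf /var/lib/strong-pm/*    # wipe the contents, keep the directory
$ sudo service strong-pm start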

If you are feeling particularly brave, you could "void the warranty", so to speak, and try modifying the strong-pm.json file manually (again, probably best done after shutting down PM).

notbrain commented 9 years ago

Thanks, I will upgrade, wipe everything, and start over. I'll document slc ctl create <svc> and see if that works as expected and avoids duplicate entries with the same service name.

Whoa... looking at strong-pm.json, it has grown to 50 MB overnight and is constantly growing, with what looks like a steady stream of log-dump command records, plus, every once in a while, all the errors stemming from the fact that we haven't purchased a license for the extra StrongLoop features. Is this another logfile-type thing, or is it also indicative of something awry?

"4360": "{\"request\":{\"cmd\":\"log-dump\",\"serviceId\":1,\"instanceId\":1},\"timestamp\":\"2015-06-05T00:08:16.233Z\",\"result\":{\"log\":\"\"},\"serviceInstanceId\":1,\"id\":4360}",
"4361": "{\"request\":{\"cmd\":\"log-dump\",\"serviceId\":1,\"instanceId\":1},\"timestamp\":\"2015-06-05T00:08:17.265Z\",\"result\":{\"log\":\"\"},\"serviceInstanceId\":1,\"id\":4361}",
"4362": "{\"request\":{\"cmd\":\"log-dump\",\"serviceId\":1,\"instanceId\":1},\"timestamp\":\"2015-06-05T00:08:18.295Z\",\"result\":{\"log\":\"\"},\"serviceInstanceId\":1,\"id\":4362}",
"4363": "{\"request\":{\"cmd\":\"log-dump\",\"serviceId\":1,\"instanceId\":1},\"timestamp\":\"2015-06-05T00:08:19.343Z\",\"result\":{\"log\":\"\"},\"serviceInstanceId\":1,\"id\":4363}",
"4364": "{\"request\":{\"cmd\":\"log-dump\",\"serviceId\":1,\"instanceId\":1},\"timestamp\":\"2015-06-05T00:08:20.376Z\",\"result\":{\"log\":\"2015-06-05T00:08:19.626Z pid:7990 worker:0 WARN received SIGHUP, restarting workers\\n2015-06-05T00:08:19.641Z pid:7990 worker:0 ERROR supervisor worker id 1 (pid 8004) expected exit with 2\\n2015-06-05T00:08:19.724Z pid:7990 worker:0 INFO supervisor started worker 3 (pid 8664)\\n2015-06-05T00:08:19.725Z pid:7990 worker:0 INFO supervisor resized to 2\\n2015-06-05T00:08:19.975Z pid:8664 worker:3 INFO strong-agent v1.6.0 profiling app 'proteus' pid '8664'\\n2015-06-05T00:08:19.979Z pid:8664 worker:3 INFO strong-agent[8664] started profiling agent\\n\"},\"serviceInstanceId\":1,\"id\":4364}",
"4365": "{\"request\":{\"cmd\":\"log-dump\",\"serviceId\":1,\"instanceId\":1},\"timestamp\":\"2015-06-05T00:08:21.411Z\",\"result\":{\"log\":\"2015-06-05T00:08:20.838Z pid:8664 worker:3 ERROR strong-agent error: failed to instrument mysql\\n2015-06-05T00:08:21.269Z pid:8664 worker:3 INFO strong-agent not profiling, agent metrics requires a valid license.\\n2015-06-05T00:08:21.269Z pid:8664 worker:3 Please contact sales@strongloop.com for assistance.\\n2015-06-05T00:08:21.272Z pid:8664 worker:3 Web server listening at: http://0.0.0.0:3003/\\n\"},\"serviceInstanceId\":1,\"id\":4365}",
"4366": "{\"request\":{\"cmd\":\"log-dump\",\"serviceId\":1,\"instanceId\":1},\"timestamp\":\"2015-06-05T00:08:22.446Z\",\"result\":{\"log\":\"2015-06-05T00:08:21.736Z pid:7990 worker:0 ERROR supervisor worker id 2 (pid 8007) expected exit with 2\\n2015-06-05T00:08:21.813Z pid:7990 worker:0 INFO supervisor started worker 4 (pid 8679)\\n2015-06-05T00:08:21.813Z pid:7990 worker:0 INFO supervisor resized to 2\\n2015-06-05T00:08:22.057Z pid:8679 worker:4 INFO strong-agent v1.6.0 profiling app 'proteus' pid '8679'\\n2015-06-05T00:08:22.061Z pid:8679 worker:4 INFO strong-agent[8679] started profiling agent\\n\"},\"serviceInstanceId\":1,\"id\":4366}",
"4367": "{\"request\":{\"cmd\":\"log-dump\",\"serviceId\":1,\"instanceId\":1},\"timestamp\":\"2015-06-05T00:08:23.482Z\",\"result\":{\"log\":\"2015-06-05T00:08:22.907Z pid:8679 worker:4 ERROR strong-agent error: failed to instrument mysql\\n2015-06-05T00:08:23.336Z pid:8679 worker:4 INFO strong-agent not profiling, agent metrics requires a valid license.\\n2015-06-05T00:08:23.337Z pid:8679 worker:4 Please contact sales@strongloop.com for assistance.\\n2015-06-05T00:08:23.340Z pid:8679 worker:4 Web server listening at: http://0.0.0.0:3003/\\n\"},\"serviceInstanceId\":1,\"id\":4367}",
"4368": "{\"request\":{\"cmd\":\"log-dump\",\"serviceId\":1,\"instanceId\":1},\"timestamp\":\"2015-06-05T00:08:24.525Z\",\"result\":{\"log\":\"\"},\"serviceInstanceId\":1,\"id\":4368}",
"4369": "{\"request\":{\"cmd\":\"log-dump\",\"serviceId\":1,\"instanceId\":1},\"timestamp\":\"2015-06-05T00:08:25.551Z\",\"result\":{\"log\":\"\"},\"serviceInstanceId\":1,\"id\":4369}",
"4370": "{\"request\":{\"cmd\":\"log-dump\",\"serviceId\":1,\"instanceId\":1},\"timestamp\":\"2015-06-05T00:08:26.577Z\",\"result\":{\"log\":\"\"},\"serviceInstanceId\":1,\"id\":4370}",
"4371": "{\"request\":{\"cmd\":\"log-dump\",\"serviceId\":1,\"instanceId\":1},\"timestamp\":\"2015-06-05T00:08:27.603Z\",\"result\":{\"log\":\"\"},\"serviceInstanceId\":1,\"id\":4371}",

I was able to hard stop everything and stop strong-pm, but tailing strong-pm.json shows that at the end it was trying to work with service IDs 2 and 6, while slc ctl status only shows 1 and 8:

...snip...
      "260528": "{\"request\":{\"cmd\":\"log-dump\",\"serviceId\":1,\"instanceId\":1},\"timestamp\":\"2015-06-12T17:38:50.910Z\",\"result\":{\"log\":\"\"},\"serviceInstanceId\":1,\"id\":260528}",
      "260529": "{\"request\":{\"cmd\":\"log-dump\",\"serviceId\":1,\"instanceId\":1},\"timestamp\":\"2015-06-12T17:38:54.520Z\",\"result\":{\"log\":\"\"},\"serviceInstanceId\":1,\"id\":260529}",
      "260530": "{\"request\":{\"cmd\":\"log-dump\",\"serviceId\":1,\"instanceId\":1},\"timestamp\":\"2015-06-12T17:38:59.294Z\",\"result\":{\"log\":\"\"},\"serviceInstanceId\":1,\"id\":260530}",
      "260531": "{\"request\":{\"cmd\":\"log-dump\",\"serviceId\":1,\"instanceId\":1},\"timestamp\":\"2015-06-12T17:39:02.901Z\",\"result\":{\"log\":\"\"},\"serviceInstanceId\":1,\"id\":260531}",
      "260532": "{\"request\":{\"cmd\":\"log-dump\",\"serviceId\":1,\"instanceId\":1},\"timestamp\":\"2015-06-12T17:39:06.342Z\",\"result\":{\"log\":\"\"},\"serviceInstanceId\":1,\"id\":260532}",
      "260533": "{\"request\":{\"cmd\":\"log-dump\",\"serviceId\":1,\"instanceId\":1},\"timestamp\":\"2015-06-12T17:39:09.580Z\",\"result\":{\"log\":\"\"},\"serviceInstanceId\":1,\"id\":260533}",
      "260534": "{\"request\":{\"cmd\":\"log-dump\",\"serviceId\":1,\"instanceId\":1},\"timestamp\":\"2015-06-12T17:39:14.505Z\",\"result\":{\"log\":\"\"},\"serviceInstanceId\":1,\"id\":260534}",
      "260535": "{\"request\":{\"cmd\":\"stop\",\"serviceId\":2,\"instanceId\":2},\"timestamp\":\"2015-06-18T21:55:12.207Z\",\"result\":{},\"serviceInstanceId\":2,\"id\":260535}",
      "260536": "{\"request\":{\"cmd\":\"stop\",\"serviceId\":6,\"instanceId\":6},\"timestamp\":\"2015-06-19T01:23:25.563Z\",\"result\":{},\"serviceInstanceId\":6,\"id\":260536}",
      "260537": "{\"request\":{\"cmd\":\"stop\",\"serviceId\":6,\"instanceId\":6},\"timestamp\":\"2015-06-19T01:25:31.521Z\",\"result\":{},\"serviceInstanceId\":6,\"id\":260537}",
      "260538": "{\"request\":{\"cmd\":\"stop\",\"serviceId\":6,\"instanceId\":6},\"timestamp\":\"2015-06-19T01:27:49.133Z\",\"result\":{},\"serviceInstanceId\":6,\"id\":260538}",
      "260539": "{\"request\":{\"cmd\":\"stop\",\"serviceId\":6,\"instanceId\":6},\"timestamp\":\"2015-06-19T01:33:10.614Z\",\"result\":{},\"serviceInstanceId\":6,\"id\":260539}",
      "260540": "{\"request\":{\"cmd\":\"restart\",\"serviceId\":8,\"instanceId\":8},\"timestamp\":\"2015-06-19T01:44:27.085Z\",\"result\":{\"message\":\"re-starting...\"},\"serviceInstanceId\":8,\"id\":260540}",
      "260541": "{\"request\":{\"cmd\":\"restart\",\"serviceId\":8,\"instanceId\":8},\"timestamp\":\"2015-06-22T22:25:39.518Z\",\"result\":{\"message\":\"re-starting...\"},\"serviceInstanceId\":8,\"id\":260541}",
      "260542": "{\"request\":{\"cmd\":\"restart\",\"serviceId\":1,\"instanceId\":1},\"timestamp\":\"2015-06-22T22:27:08.649Z\",\"result\":{\"message\":\"re-starting...\"},\"serviceInstanceId\":1,\"id\":260542}",
      "260543": "{\"request\":{\"cmd\":\"restart\",\"serviceId\":8,\"instanceId\":8},\"timestamp\":\"2015-06-22T22:28:26.205Z\",\"result\":{\"message\":\"re-starting...\"},\"serviceInstanceId\":8,\"id\":260543}",
      "260544": "{\"request\":{\"cmd\":\"restart\",\"serviceId\":8,\"instanceId\":8},\"timestamp\":\"2015-06-22T22:51:43.332Z\",\"result\":{\"message\":\"re-starting...\"},\"serviceInstanceId\":8,\"id\":260544}",
      "260545": "{\"request\":{\"cmd\":\"stop\",\"serviceId\":8,\"instanceId\":8},\"timestamp\":\"2015-06-23T17:12:53.740Z\",\"result\":{},\"serviceInstanceId\":8,\"id\":260545}",
      "260546": "{\"request\":{\"cmd\":\"stop\",\"serviceId\":1,\"instanceId\":1},\"timestamp\":\"2015-06-23T17:13:26.745Z\",\"result\":{},\"serviceInstanceId\":1,\"id\":260546}"
    }
  }
}
rmg commented 9 years ago

@notbrain do you have slc ctl log-dump -f running somewhere?

@kraman @sam-github is the InstanceAction meant to be persisted? Seems like it probably shouldn't be..

notbrain commented 9 years ago

I didn't have any slc ctl log-dump -f commands running just now. Last week I was attempting a bunch of slc ctl commands while the 2nd service's mongo connection was constantly erroring (waiting for env-set to snap into correctness), but most of those operations resulted in 90 seconds of waiting and then the "socket hang up" error. Restarting service ID 8 from that point was never able to recover and run normally. The last time I had log-dump --follow running was last week; once I saw strong-pm.log contained the same output, I stopped using log-dump.

notbrain commented 9 years ago

Is there a recommended way to install the latest strong-pm standalone alongside strongloop? Since this is a build server, I need the full slc command suite to deploy/build.

The docs on upgrading strong-pm say to upgrade strong-pm by itself, but then say to use slc to install everything, which ends up creating an upstart file that points to the strongloop/node_modules/.../sl-pm.js dependency at the previous version, making the earlier upgrade command pointless?

Is it as simple as editing the strong-pm.conf file to use /usr/bin/sl-pm instead of the strongloop dependency?

slc-installed sl-pm:

exec /usr/bin/nodejs /usr/lib/node_modules/strongloop/node_modules/strong-pm/bin/sl-pm.js --listen 8701 --base /var/lib/strong-pm --driver direct

desired sl-pm?

exec /usr/bin/nodejs /usr/bin/sl-pm --listen 8701 --base /var/lib/strong-pm --driver direct

3 different strong-pm versions floating around (output filtered through grep; ignore the tree characters):

$ npm ls -g | grep strong-pm
├─┬ strong-pm@4.3.1
  │ ├─┬ strong-pm@3.2.0
  ├─┬ strong-pm@4.2.0
rmg commented 9 years ago

Yes, I think that change should work. If it doesn't, /usr/lib/node_modules/strong-pm/bin/sl-pm.js would definitely work.

If you are installing all of strongloop on that server anyway, then there's likely no benefit to installing the standalone strong-pm. You can upgrade the bundled strong-pm dependency with npm install -g strongloop (no need to uninstall first).
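Roughly (again, the upstart job name strong-pm is an assumption):

$ sudo npm install -g strongloop
$ npm ls -g | grep strong-pm          # confirm which strong-pm version is bundled now
$ sudo service strong-pm restart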

As an aside: for the most part, the slc XXX commands are just wrappers for an sl-XXX binary provided by one of the dependencies. For something like slc build (strong-build) or slc deploy (strong-deploy), installing just that one package can be a significant saving in disk space and bandwidth if those resources are constrained and only the one command is needed.
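For example, a deploy-only box could probably get by with something like this (I'm assuming sl-deploy takes the same arguments as slc deploy, since it's the same underlying binary; check sl-deploy --help, and pm-host is just a placeholder):

$ npm install -g strong-deploy
$ sl-deploy -s proteus-staging http://pm-host:8701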

notbrain commented 9 years ago

So far so good after I deleted everything inside /var/lib/strong-pm/ and updated to 4.3.1; I was also able to use slc ctl create and then slc deploy -s svcname without creating duplicate services.

The main remaining concern is that strong-pm.json keeps growing for no apparent reason, even when I do not have any ...log-dump --follow running. Is this file's size managed by strong-pm? I continue to see strong-pm.json fill up with this stuff:

"1016": "{\"request\":{\"cmd\":\"log-dump\",\"serviceId\":2,\"instanceId\":2},\"timestamp\":\"2015-06-24T01:00:56.045Z\",\"result\":{\"log\":\"\"},\"serviceInstanceId\":2,\"id\":1016}",
"1017": "{\"request\":{\"cmd\":\"log-dump\",\"serviceId\":2,\"instanceId\":2},\"timestamp\":\"2015-06-24T01:00:57.060Z\",\"result\":{\"log\":\"\"},\"serviceInstanceId\":2,\"id\":1017}",
"1018": "{\"request\":{\"cmd\":\"log-dump\",\"serviceId\":2,\"instanceId\":2},\"timestamp\":\"2015-06-24T01:00:58.075Z\",\"result\":{\"log\":\"\"},\"serviceInstanceId\":2,\"id\":1018}",
"1019": "{\"request\":{\"cmd\":\"log-dump\",\"serviceId\":2,\"instanceId\":2},\"timestamp\":\"2015-06-24T01:00:59.093Z\",\"result\":{\"log\":\"\"},\"serviceInstanceId\":2,\"id\":1019}",
"1020": "{\"request\":{\"cmd\":\"log-dump\",\"serviceId\":2,\"instanceId\":2},\"timestamp\":\"2015-06-24T01:01:00.108Z\",\"result\":{\"log\":\"\"},\"serviceInstanceId\":2,\"id\":1020}",
"1021": "{\"request\":{\"cmd\":\"log-dump\",\"serviceId\":2,\"instanceId\":2},\"timestamp\":\"2015-06-24T01:01:01.123Z\",\"result\":{\"log\":\"\"},\"serviceInstanceId\":2,\"id\":1021}",
"1022": "{\"request\":{\"cmd\":\"log-dump\",\"serviceId\":2,\"instanceId\":2},\"timestamp\":\"2015-06-24T01:01:02.139Z\",\"result\":{\"log\":\"\"},\"serviceInstanceId\":2,\"id\":1022}",
"1023": "{\"request\":{\"cmd\":\"log-dump\",\"serviceId\":2,\"instanceId\":2},\"timestamp\":\"2015-06-24T01:01:03.157Z\",\"result\":{\"log\":\"\"},\"serviceInstanceId\":2,\"id\":1023}",
"1024": "{\"request\":{\"cmd\":\"log-dump\",\"serviceId\":2,\"instanceId\":2},\"timestamp\":\"2015-06-24T01:01:04.171Z\",\"result\":{\"log\":\"\"},\"serviceInstanceId\":2,\"id\":1024}",
"1025": "{\"request\":{\"cmd\":\"log-dump\",\"serviceId\":2,\"instanceId\":2},\"timestamp\":\"2015-06-24T01:01:05.189Z\",\"result\":{\"log\":\"\"},\"serviceInstanceId\":2,\"id\":1025}"
notbrain commented 9 years ago

Encountered another strange process list after launching an app failed because of an unset env variable. Figured I'd post here to find out the recommended way to recover from such things. It makes me nervous that processes would just keep spawning, so I am still not using strong-pm in production yet.

I have hard-stopped the app but still see this for status. What recourse do I have, beyond completely removing and re-adding the app, to recover and clear all of these out? Also, if I do not clear out /var/lib/strong-pm/, will the new app use a higher service ID, leaving service ID 2 forever around and unusable? Can I just delete /var/lib/strong-pm/svc/2 instead?

Service ID: 2
Service Name: proteus-staging
Environment variables:
    Name                Value
    ...snipped...
Instances:
    Version  Agent version  Cluster size  Driver metadata
     4.3.1       1.6.1            0             N/A
Processes:
        ID      PID    WID  Listening Ports  Tracking objects?  CPU profiling?  Tracing?
    2.1.11179  11179    0
    2.1.11183  11183    1
    2.1.11187  11187    2
    2.1.11196  11196    3
    ...
    2.1.23276  23276  1942
    2.1.23282  23282  1943
    2.1.23288  23288  1944
notbrain commented 9 years ago

If there's anything I would suggest to make this a bit less painful: allow a file of env vars to be used instead of requiring that they be set via slc ctl env-set. Is there a way I can do this? Where do the env vars get stored by strong-pm?

notbrain commented 9 years ago

slc ctl remove servicename works to remove the service, but after a subsequent slc ctl create servicename, the old service ID is no longer usable? Is this by design, or just how things ended up? Short of completely deleting /var/lib/strong-pm/*, is there a way to reuse a service ID once you've removed the app that was assigned to it?

sam-github commented 9 years ago

strong-pm.json growing

It is growing by a line per process that ever existed; we don't delete old process records at the moment. Is your disk really short of space?
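A quick way to gauge it (paths assume the default --base shown in your process listing):

$ du -h /var/lib/strong-pm/strong-pm.json
$ grep -c log-dump /var/lib/strong-pm/strong-pm.json    # rough count of saved log-dump action records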

dead pids showing up in sl-meshctl status

This is a bug, but I can't reproduce it. Can you reproduce it all the time, or just sometimes?

Edit: if you see it again, can you gist the strong-pm.json?

env-set painful

Have you considered doing slc ctl -C http://production env-set $(cat my-env.txt)?

This would be less painful than sshing into that remote host and hacking up a file somewhere to be my-env.txt, wouldn't it? If not, can you elaborate on the painfulness?
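For example, with a hypothetical my-env.txt like this (names and values made up; the :8701 matches the --listen port from your process listing):

NODE_ENV=staging
MONGO_URL=mongodb://staging-db/proteus

$ slc ctl -C http://production:8701 env-set proteus-staging $(cat my-env.txt)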

The internal DB structures of PM aren't about to be exposed; it's designed to be remotely configurable.

reusability of record IDs

We could perhaps make it possible to deploy to a service ID, and force a new DB record to be created with a previously used record ID.

Can you consider the IDs to be opaque identifiers? Why do you want to reuse one? Is it because we (ab)use them in order to derive a unique listening port?

In an upcoming release, the service's port will be overridable from the default.

sam-github commented 9 years ago

/to @kraman we may need to accelerate the GC of no-longer-useful records from the DB. That will involve making sure there are no outstanding references to them, though.

rmg commented 9 years ago

forgot to press the green button

The service ID is incremented every time a service is created, and that sequence number is tracked inside the "database" (currently $BASEDIR/strong-pm.json, which backs a loopback memory datastore).

PM itself doesn't support any sort of environment file, but strong-supervisor supports a .env file (see dotenv) in the app's root. It would require you to either commit your .env file or at least bundle it with your deployment, which may or may not be acceptable to you.
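A hypothetical .env in the app root might look like this (dotenv format, values made up):

# staging settings
NODE_ENV=staging
MONGO_URL=mongodb://staging-db/proteus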

Another option is setting multiple variables at once. If you have a .env file like above (without comments), you could use it with slc ctl env-set servicename $(cat .env). Still not quite what you're asking for, but may be close enough depending on your environment.

notbrain commented 9 years ago

@sam-github

Have you considered doing slc ctl -C http://production env-set $(cat my-env.txt)?

Thanks, will try this; I was just working on something similar, a script that would take the svcname and set the env accordingly. @rmg yes, it is not acceptable for us to commit passwords etc., so the script route is fine; $(cat env) is also something to consider.

Can you consider the IDs to be opaque identifiers? Why do you want to reuse one?

Yes, and mostly because of OCD and simply wondering where they are kept; so these are the IDs of the entries in strong-pm's private DB? That's fine by me. Being able to set the ports was an old issue for us, but now setting npm_config_port overrides the 3000+svcID scheme. I just needed fixed ports for a manual nginx setup.
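For reference, the shape of that looks like the following (service name and port taken from earlier in this thread; the explicit restart may be unnecessary if PM applies env-set on its own):

$ slc ctl env-set proteus-staging npm_config_port=3002
$ slc ctl restart proteus-staging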

Appreciate the help.

sam-github commented 9 years ago

@notbrain are you good now? can we close this?

notbrain commented 9 years ago

So far so good, will open a new one if anything shows up again.

sam-github commented 9 years ago

Glad to hear that.