requarks / wiki-v1

Legacy version (1.x) of Wiki.js
https://wiki.js.org
GNU Affero General Public License v3.0

Wiki agent dies, but docker container is still running #48

Open hemberger opened 6 years ago

hemberger commented 6 years ago

I'm getting an out-of-memory error in my logs from time to time:

wiki_host  | 2017-11-19T14:55:52.153Z - info: [AGENT] Performing pull from remote Git repository...
wiki_host  | internal/child_process.js:325
wiki_host  |     throw errnoException(err, 'spawn');
wiki_host  |     ^
wiki_host  | 
wiki_host  | Error: spawn ENOMEM
wiki_host  |     at _errnoException (util.js:1021:11)
wiki_host  |     at ChildProcess.spawn (internal/child_process.js:325:11)
wiki_host  |     at Object.exports.spawn (child_process.js:494:9)
wiki_host  |     at spawn (/var/wiki/node_modules/git-wrapper2-promise/node_modules/child-process-promise/lib/index.js:81:23)
wiki_host  |     at module.exports.Git.spawn (/var/wiki/node_modules/git-wrapper2-promise/git.js:54:10)
wiki_host  |     at module.exports.exports.pull (/var/wiki/node_modules/git-wrapper2-promise/commands.js:48:15)
wiki_host  |     at Object.resync (/var/wiki/server/libs/git.js:167:22)
wiki_host  |     at CronJob.onTick (/var/wiki/server/agent.js:108:28)
wiki_host  |     at CronJob.fireOnTick (/var/wiki/node_modules/cron/lib/cron.js:416:22)
wiki_host  |     at Timeout.callbackWrapper (/var/wiki/node_modules/cron/lib/cron.js:481:9)
wiki_host  |     at ontimeout (timers.js:471:11)
wiki_host  |     at tryOnTimeout (timers.js:306:5)
wiki_host  |     at Timer.listOnTimeout (timers.js:266:5)

Certainly figuring out what is causing the OOM would be nice, but I'm actually more interested in finding a way for it to restart itself. If the entire node server died, then my docker container would exit and a restart: always policy in my docker-compose.yml would ensure that everything is always up and running. Instead, I feel like my only option is to forcefully restart the container at regular intervals just in case it's down.

Or is there some feature of Wiki.js that I'm missing that would allow the agent to be restarted?
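
A minimal sketch of the behaviour described above, not taken from Wiki.js: a parent process that stops as soon as a forked child stops, so the container exits and a `restart: always` policy can take over. The names `superviseChild` and `parentExitCode` are hypothetical, and the sketch assumes the agent runs as a child of the main server process.

```javascript
const { fork } = require('child_process');

// Map a child's exit status to the exit code the parent should use.
// An agent that stops is treated as a failure, so this never returns 0.
function parentExitCode(code) {
  if (typeof code === 'number' && code > 0) return code;
  return 1; // clean exit (0) or death by signal (code === null)
}

// Fork `scriptPath` and call onExit(exitCode) once when the child stops.
function superviseChild(scriptPath, onExit) {
  const child = fork(scriptPath);
  let done = false;
  const finish = (code) => {
    if (done) return; // 'error' and 'exit' may both fire
    done = true;
    onExit(parentExitCode(code));
  };
  child.on('exit', (code) => finish(code));
  child.on('error', () => finish(null));
  return child;
}

// Usage sketch (the path is taken from the stack trace above):
// superviseChild('/var/wiki/server/agent.js', (code) => process.exit(code));
```

With something like this in place the main process would no longer outlive a dead agent, which is the precondition for Docker's restart policy to be of any use here.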

My docker-compose.yml wiki service is pretty bare-bones:

    wiki:
        image: requarks/wiki
        container_name: wiki_host
        environment:
            SESSION_SECRET: ${SESSION_SECRET}
            WIKI_ADMIN_EMAIL: ${WIKI_ADMIN_EMAIL}
            HOST: ${HOST}
            FACEBOOK_APP_ID: ${FACEBOOK_APP_ID}
            FACEBOOK_APP_SECRET: ${FACEBOOK_APP_SECRET}
            GITHUB_CLIENT_ID: ${GITHUB_CLIENT_ID}
            GITHUB_CLIENT_SECRET: ${GITHUB_CLIENT_SECRET}
        volumes:
            - ./config.yml:/var/wiki/config.yml:ro
            - ./github.pem:/github.pem
        links:
            - wiki-db

Thanks!

NGPixel commented 6 years ago

There are 2 issues preventing what you are trying to achieve.

Is your container limited in RAM, or is the host? This is the first time I've seen this error. I would rather investigate why the OOM error occurs in the first place than think about a "restart container" strategy.

hemberger commented 6 years ago

I suspect the host is RAM limited. It's an AWS EC2 micro instance (1 GB RAM), shared with a couple of rarely used static sites served by nginx. Two node server processes dominate the memory usage, but their usage doesn't seem excessive:

  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
 9716 root      20   0  867584 135980   8912 S  0.0 13.4   0:43.01 node server
 9765 root      20   0  757316 115320   8096 S  0.0 11.4   0:23.82 /usr/local/bin/node /var/wiki/server/agent.js

I don't think I've done anything to limit the container's RAM.

I've seen this error a handful of times now, but I haven't been able to correlate it with any particular event. If you'd like to investigate, I'm happy to help -- just let me know how!
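
One way to look for a correlation is to log memory figures at regular intervals and compare the timestamps with the next `spawn ENOMEM`. The following is only a sketch under that assumption. `memorySnapshot` is a hypothetical helper, and the figures are whatever the operating system reports to the Node process, which inside a container may not reflect a container-level limit.

```javascript
const os = require('os');

// Collect a timestamped snapshot of system and process memory, in MB.
function memorySnapshot() {
  const mb = (bytes) => Math.round(bytes / 1024 / 1024);
  return {
    time: new Date().toISOString(),
    freeMb: mb(os.freemem()),
    totalMb: mb(os.totalmem()),
    processRssMb: mb(process.memoryUsage().rss),
  };
}

// Usage sketch: emit one JSON line per minute alongside the normal logs.
// setInterval(() => console.log(JSON.stringify(memorySnapshot())), 60 * 1000);
```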

pms1969 commented 5 years ago

I have this problem as well, but in my case it's because I run Wiki.js in Kubernetes: the mongo instance gets rescheduled, and then Wiki.js can't connect to it. It tries to connect 3 times, then just hangs instead of exiting. If it exited, it would start up fine after a restart, because mongo would be back by then. I'd love to see a way for it to just fail and exit.
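
The "fail and exit" behaviour could look roughly like the sketch below. It is not the Wiki.js connection code. `connectOrFail` is a hypothetical helper, and `connect` stands for any function that returns a promise for a database connection.

```javascript
// Call connect() up to `attempts` times, waiting `delayMs` between tries.
// Resolves with the connection, or rejects with the last error seen.
async function connectOrFail(connect, attempts, delayMs) {
  let lastError;
  for (let i = 1; i <= attempts; i++) {
    try {
      return await connect();
    } catch (err) {
      lastError = err;
      if (i < attempts) {
        await new Promise((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  throw lastError;
}

// Usage sketch: exit non-zero after the last attempt so the orchestrator
// restarts the container (connectToMongo is an assumed function).
// connectOrFail(connectToMongo, 3, 2000).catch((err) => {
//   console.error('Giving up on MongoDB:', err.message);
//   process.exit(1);
// });
```

The important part is the final `catch`: the rejection is handled explicitly and turned into a non-zero exit code, rather than being left unhandled.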

pms1969 commented 5 years ago

FYI, this is the error we see:

2018-11-09T14:44:26.943Z - error: [SERVER] Failed to connect to MongoDB instance.
Unhandled rejection MongoError: failed to connect to server [mongo-mongodb:27017] on first connect [MongoError: connection 0 to mongo-mongodb:27017 timed out]
    at Pool.<anonymous> (/var/wiki/node_modules/mongodb-core/lib/topologies/server.js:336:35)
    at emitOne (events.js:116:13)
    at Pool.emit (events.js:211:7)
    at Connection.<anonymous> (/var/wiki/node_modules/mongodb-core/lib/connection/pool.js:280:12)
    at Object.onceWrapper (events.js:317:30)
    at emitTwo (events.js:126:13)
    at Connection.emit (events.js:214:7)
    at Socket.<anonymous> (/var/wiki/node_modules/mongodb-core/lib/connection/connection.js:197:10)
    at Object.onceWrapper (events.js:313:30)
    at emitNone (events.js:106:13)
    at Socket.emit (events.js:208:7)
    at Socket._onTimeout (net.js:422:8)
    at ontimeout (timers.js:498:11)
    at tryOnTimeout (timers.js:323:5)
    at Timer.listOnTimeout (timers.js:290:5)
Unhandled rejection MongoError: failed to connect to server [mongo-mongodb:27017] on first connect [MongoError: connection 0 to mongo-mongodb:27017 timed out]
    at Pool.<anonymous> (/var/wiki/node_modules/mongodb-core/lib/topologies/server.js:336:35)
    at emitOne (events.js:116:13)
    at Pool.emit (events.js:211:7)
    at Connection.<anonymous> (/var/wiki/node_modules/mongodb-core/lib/connection/pool.js:280:12)
    at Object.onceWrapper (events.js:317:30)
    at emitTwo (events.js:126:13)
    at Connection.emit (events.js:214:7)
    at Socket.<anonymous> (/var/wiki/node_modules/mongodb-core/lib/connection/connection.js:197:10)
    at Object.onceWrapper (events.js:313:30)
    at emitNone (events.js:106:13)
    at Socket.emit (events.js:208:7)
    at Socket._onTimeout (net.js:422:8)
    at ontimeout (timers.js:498:11)
    at tryOnTimeout (timers.js:323:5)
    at Timer.listOnTimeout (timers.js:290:5)
2018-11-09T14:44:28.030Z - error: [AGENT] Failed to connect to MongoDB instance.
Unhandled rejection MongoError: failed to connect to server [mongo-mongodb:27017] on first connect [MongoError: connection 0 to mongo-mongodb:27017 timed out]
    at Pool.<anonymous> (/var/wiki/node_modules/mongodb-core/lib/topologies/server.js:336:35)
    at emitOne (events.js:116:13)
    at Pool.emit (events.js:211:7)
    at Connection.<anonymous> (/var/wiki/node_modules/mongodb-core/lib/connection/pool.js:280:12)
    at Object.onceWrapper (events.js:317:30)
    at emitTwo (events.js:126:13)
    at Connection.emit (events.js:214:7)
    at Socket.<anonymous> (/var/wiki/node_modules/mongodb-core/lib/connection/connection.js:197:10)
    at Object.onceWrapper (events.js:313:30)
    at emitNone (events.js:106:13)
    at Socket.emit (events.js:208:7)
    at Socket._onTimeout (net.js:422:8)
    at ontimeout (timers.js:498:11)
    at tryOnTimeout (timers.js:323:5)

at which point it is hung for good.
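
Since the log shows the failures surfacing as unhandled rejections, one possible stopgap is a process-level handler that turns any unhandled rejection into a non-zero exit. This is a sketch, not something Wiki.js is known to ship. `exitOnUnhandledRejection` is a hypothetical name, and the `proc` and `log` parameters exist only so the function can be exercised without killing a real process.

```javascript
// Install a handler that logs an unhandled promise rejection and exits
// with status 1, so a supervisor (Docker, Kubernetes) can restart the app.
function exitOnUnhandledRejection(proc = process, log = console.error) {
  proc.on('unhandledRejection', (reason) => {
    log('Unhandled rejection, exiting:', reason);
    proc.exit(1);
  });
}

// Usage sketch: call once, early in the entry script.
// exitOnUnhandledRejection();
```

Whether exiting on every unhandled rejection is acceptable depends on the application; here it would at least replace a silent hang with a restart.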