strapi / strapi

🚀 Strapi is the leading open-source headless CMS. It’s 100% JavaScript/TypeScript, fully customizable and developer-first.
https://strapi.io
Other
62.7k stars 7.87k forks source link

Mongo +srv config, timeout #4311

Closed gregorskii closed 4 years ago

gregorskii commented 4 years ago

Describe the bug

MongoDB time out when using a clustered database provider such as Atlas when using +srv URLs. The system works correctly for a while then begins to hang with errors.

Steps to reproduce the behavior

Configure Strapi to use a +srv url with this configuration:

{
  "defaultConnection": "default",
  "connections": {
    "default": {
      "connector": "strapi-hook-mongoose",
      "settings": {
        "uri": "mongodb+srv://${process.env.DATABASE_USERNAME}:${process.env.DATABASE_PASSWORD}@${process.env.DATABASE_HOST}/${process.env.DATABASE_NAME}?retryWrites=true&w=majority",
        "port": "${process.env.DATABASE_PORT || 27017}"
      },
      "options": {
        "ssl": "${process.env.DATABASE_SSL || false}",
        "authenticationDatabase": "${process.env.DATABASE_AUTHENTICATION_DATABASE || ''}"
      }
    }
  }
}

Expected behavior

The database to connect and remain connected over long periods of time.

Errors

Seems like there is a memory leak in the server selection topology of the mongodb lib:

{
 insertId: "ID"  
 labels: {…}  
 logName: "stderr"  
 receiveTimestamp: "2019-10-21T18:16:44.845923127Z"  
 resource: {…}  
 textPayload: "(node:18) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 topologyDescriptionChanged listeners added. Use emitter.setMaxListeners() to increase limit
"  
 timestamp: "2019-10-21T18:16:39Z"  
}

Which causes an issue with making queries (of course):

textPayload: "[2019-10-21T18:19:11.878Z] error MongoTimeoutError: Server selection timed out after 30000 ms
    at Timeout.setTimeout [as _onTimeout] (/app/node_modules/mongodb/lib/core/sdam/topology.js:850:16)
    at ontimeout (timers.js:436:11)
    at tryOnTimeout (timers.js:300:5)
    at listOnTimeout (timers.js:263:5)
    at Timer.processTimers (timers.js:223:10) 

System

Additional context

Since there is an abstraction between the actual database config and Strapi's config I don't quite know how to go about debugging this. The only entrypoint I have is the config json due to bookshelf.

Some links I found Googling:

https://jira.mongodb.org/browse/NODE-2123

This could potentially be solved by setting additional passthrough values to the settings of the mongodb config using options.

I wanted to know anyones thoughts on this that may have more insight into the under the hoods before I spend a bunch of time debugging. Would greatly appreciate it.

Thanks!

gregorskii commented 4 years ago

So this is def a mongo driver issue, but I see in the mongoose hook code:

https://github.com/strapi/strapi/blob/master/packages/strapi-hook-mongoose/lib/index.js#L89

connectOptions.useUnifiedTopology = useUnifiedTopology || 'false';

'false' as a string is truthy, I believe this will enable unified topology by default.

I have deployed a version where my database config includes a direct set for useUnifiedTopology to false in the json form. Watching my logs to see if it fixes it.

{
  "defaultConnection": "default",
  "connections": {
    "default": {
      "connector": "strapi-hook-mongoose",
      "settings": {
        "uri": "mongodb+srv://${process.env.DATABASE_USERNAME}:${process.env.DATABASE_PASSWORD}@${process.env.DATABASE_HOST}/${process.env.DATABASE_NAME}?retryWrites=true&w=majority",
        "port": "${process.env.DATABASE_PORT || 27017}"
      },
      "options": {
        "ssl": "${process.env.DATABASE_SSL || false}",
        "useUnifiedTopology": "${process.env.USE_UNIFIED_TOPOLOGY || false}",
        "authenticationDatabase": "${process.env.DATABASE_AUTHENTICATION_DATABASE || ''}"
      }
    }
  }
}

Can you confirm? I think it will help others avoid some debugging in the future.

Thanks!

lauriejim commented 4 years ago

Hello @gregorskii ! I'm not sure because the ${process.env.USE_UNIFIED_TOPOLOGY || false} is in "..." so it will be a string at the end. I just followed the documentation here and all works fine. Have you a particular use case that causes this issue?

gregorskii commented 4 years ago

As far as I can tell strapi is converting the values in the json to actual Booleans, is that not true? If that’s the case is json really the best method for strapi config? Should it not be yaml?

After updating to this config I have not seen the error again. And yes I did have the specific error reported in the Mongo issue tracker above. It is hard to debug because all I get is the logs from my application and the internal logs from mongo in my deployed logging. Since the strapi config is json, and it’s consumed by strapi I cannot test it without forking strapi.

Unified topology was enabled. The line of code in strapi linked above would result in the value being true. Strings are truthy if defined.

gregorskii commented 4 years ago

Also looking at those notes it not using the mongo +srv url it’s using mongo 2.2 config.

I tried following those instructions and I could not get atlas to connect. I could try using the older connection string more. But the issues remains with the new connection strings with +srv in them (which is what atlas recommends for the version of mongo mongoose uses).

42ae commented 4 years ago

I am facing the exact same issue, with mongoDb atlas as well. It could also be related to issue #4188

gregorskii commented 4 years ago

This happened to me again:

(node:30) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 topologyDescriptionChanged listeners added. Use emitter.setMaxListeners() to increase limit

Even with the false for the new topology. I do not see the earlier error about the new Topology being used.

Seems like you can set the max listeners to 0 in Mongo but that would potentially create a memory leak.

https://github.com/Automattic/mongoose/issues/1992

@42ae have you tried using non +srv URLs with Atlas? I could not seem to get it to connect, but I might revisit that.

42ae commented 4 years ago

@gregorskii no I did not try, also it could be related to a driver issue when the mongo's proxy goes down... See https://jira.mongodb.org/browse/NODE-2123

The srv allows to fetch the DNS seedlist from atlas and therefore I don't think it's a good idea to remove it as the hostnames could change on the DNS SRV record.

I am still trying to find out why strapi / mongoose / mongo is not able to switch to the secondary node of the replica set when the primary one disconnect. Let me know if you find anything on your end ;)

gregorskii commented 4 years ago

Ya I linked that issue above. I have been watching it. I am trying to figure out if this is something internal to mongoose or if it can be fixed via the connection options and configuration.

Regarding the +srv URLs do they change the domain of the replicas often? We have another set of applications connecting to mongo that use the standard 2.0 type URLs to connect and we don't have any connection issues. The two databases are setup similarly.

42ae commented 4 years ago

On my end, the bug occurs every 6 weeks or so, so I would guess they are not changing the replica set hostnames very often but still often enough to be annoyed by this issue.

gregorskii commented 4 years ago

Ah I see. So when they change the hostnames the primary fails and the internal mongo library will not detect the new one?

For me the MaxListenersExceededWarning is happening every week or so. So I do not know if its compounding or keeping the listeners active, or I just have the listeners configured incorrectly.

gregorskii commented 4 years ago

I am swapping this to the old 2.0 connection string. I will keep you updated to see if it does this again. It seems to be there are some issues in the mongo driver.

Even with useUnifiedTopology disabled I think there is another issue with the change descriptors per:

https://jira.mongodb.org/browse/NODE-1709

I think I might be seeing a different issue that you.

42ae commented 4 years ago

I actually think I am facing the same issue, see my logs below.

// api-error.log
npm ERR! A complete log of this run can be found in:
npm ERR!     /home/ffa/.npm/_logs/2019-09-30T09_47_10_306Z-debug.log
npm ERR! code ELIFECYCLE
npm ERR! errno 1
npm ERR! ffa-strapi@0.1.0 start: `strapi start`
npm ERR! Exit status 1
npm ERR!
npm ERR! Failed at the ffa-strapi@0.1.0 start script.
npm ERR! This is probably not a problem with npm. There is likely additional logging output above.

  AssertionError [ERR_ASSERTION]: Cannot wrap non-Error object
      at new AssertionError (internal/assert.js:269:11)
      at Object.exports.assert (/home/ffa/ffa-strapi/node_modules/hoek/lib/index.js:559:11)
      at Function.boomify (/home/ffa/ffa-strapi/node_modules/boom/lib/index.js:107:14)
      at strapi.app.use (/home/ffa/ffa-strapi/node_modules/strapi/lib/middlewares/boom/index.js:80:22)
      at process._tickCallback (internal/process/next_tick.js:68:7)

(node:1900) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 topologyDescriptionChanged listeners added. Use emitter.setMaxListeners() to increase limit
// api-out.log
[2019-10-29T19:30:53.629Z] error MongoTimeoutError: Server selection timed out after 30000 ms
    at Timeout.setTimeout [as _onTimeout] (/home/ffa/ffa-strapi/node_modules/mongodb/lib/core/sdam/topology.js:850:16)
    at ontimeout (timers.js:436:11)
    at tryOnTimeout (timers.js:300:5)
    at listOnTimeout (timers.js:263:5)
    at Timer.processTimers (timers.js:223:10)
[...]
[2019-10-29T20:32:30.225Z] error MongoTimeoutError: Server selection timed out after 30000 ms
    at Timeout.setTimeout [as _onTimeout] (/home/ffa/ffa-strapi/node_modules/mongodb/lib/core/sdam/topology.js:850:16)
    at ontimeout (timers.js:436:11)
    at tryOnTimeout (timers.js:300:5)
    at listOnTimeout (timers.js:263:5)
    at Timer.processTimers (timers.js:223:10)
[...] (infinite until manual restart)
gregorskii commented 4 years ago

I am getting this issue still. I switched to the standard Mongo URI scheme a couple weeks ago. On the clock about every week my GAE node crashes and my front end gets a 503 timeout due to Apollo client not canceling the request. The CMS times out on the GQL endpoint as it times out connecting to Mongo with the error:

(node:30) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 topologyDescriptionChanged listeners added. Use emitter.setMaxListeners() to increase limit
gregorskii commented 4 years ago

Looks like mongodb fixed this in https://github.com/mongodb/node-mongodb-native/releases/tag/v3.3.4

Going to add a bug to Mongoose to update.

See https://github.com/Automattic/mongoose/issues/8334

miladdev85 commented 4 years ago

Same issue here. Do you know how to get it back working?

I have tried deleting allowed IPs from MongoDb Atlas and re-adding them to drop all connections but still have this issue.

gregorskii commented 4 years ago

Hi all mongoose has released an update to incorporate the mongodb patch. Can we get an update going for strapi.

Thanks!

derrickmehaffy commented 4 years ago

@gregorskii do feel free to submit a PR 😉

gregorskii commented 4 years ago

I don't quite understand the versioning, which branch would I submit to? How do I run the unit tests?

In theory it just needs this version:

https://github.com/strapi/strapi/blob/master/packages/strapi-hook-mongoose/package.json#L19

Bumped:

"mongoose": "5.7.11",
gregorskii commented 4 years ago

Hi is there any info on how I can do this, or whether someone on the team can up the version?

Thanks!

iicdii commented 4 years ago

@gregorskii In versions higher than v5.7.4, mongoose has problems with groups. related issue is #4322 Since in branch features/dynamic-zones there was an update to mongoose v5.8.0(243ed2939bcc5144bf88e8d0bbc255433b8c8471), I think we better to wait until next version release.

gregorskii commented 4 years ago

I see. Is there a timeline for that fix or is it just waiting on a mongoose issue?

In the meantime my site crashes every week or so if the servers are not reset. So it’s kind of a blocker :/

lauriejim commented 4 years ago

We release this feature. The mongoose version has been updated.

gregorskii commented 4 years ago

Thank you!