Closed sprak3000 closed 1 year ago
Based on other reports we've seen, the most likely explanation is that another dependency is interfering. I would look first at any kind of middleware product that intercepts HTTP requests. For instance, some versions of Google Cloud Trace are known to have this issue. Briefly, the problem (if this is the same problem) is when a third-party library monkey-patches http.request
or https.request
, but does not always correctly pass the original parameters through to the original Node function, a mistake that quite a few developers have made because those functions support several different parameter types.
One thing that is surprising to me, if that is the explanation, is that you're not seeing any errors logged— the other reports produced visible errors. But it's possible that different conflicting libraries have different failure modes.
Oh, I forgot to say: one reason that I immediately suspect a problem of this kind is that the v6.2.2 release was specifically about changing the parameters that our code passes to https.request
. This was originally motivated by a different third-party incompatibility as described here— but one where the other code was behaving more correctly than ours was. Unfortunately, fixing that in v6.2.2 revealed flaws in other products that happened to be incompatible with the correct parameters, and whenever we've had reports of any kind of problem with network requests that only showed up in 6.2.2, it's turned out to be that.
@eli-darkly
🤔 We are using a Google library. I'll start there and see if anything shakes loose. Generally, what parameters might I look for to see if they are missing?
Generally, what parameters might I look for to see if they are missing?
I can't give a very specific answer to that, because there are so many ways a monkey-patching of https.request
could have been implemented.
Basically what you would be looking for is... if you take a look at how Node defines possible parameters to https.request
here, any code that is intercepting that call has to allow for any of those possible parameter combinations, and correctly pass them through to the real Node request
. So it has to work for just (options)
, or for (options, callback)
, or for (url)
, or for (url, options)
, or for (url, callback)
, or for (url, options, callback)
; and, it has to allow for url
being either a string or a URL
, and for options
being either an object or a string or a URL
.
So for instance, if they had defined their interceptor to take only (options, callback)
or (url, callback)
, and they were then calling https.request
with those same two parameters in all cases, they would break the LaunchDarkly SDK because in some cases we need to pass (url, options, callback)
.
Also, I meant to mention that the specific Google library we know to be problematic is @google-cloud/trace-agent
— and more specifically, versions earlier than 4.0.0, as reported here. So if you're using that, I'd say that's your answer.
@eli-darkly
We weren't using that Google library, and it wasn't the issue. Still looking into it, but your explanation is exceedingly helpful. I have a better picture now of what to look for through our own code and what we bring in. Much appreciated!
We have the same issue here with Raygun http https://www.npmjs.com/package/raygun-apm
When running with node -r raygun-apm/http
it hangs and waitForInitialization
never resolves.
It works with sdk 6.2.1
but fails with any more recent version.
I'd like to emphasize that if anyone is seeing an error in the general category of "when I updated the LaunchDarkly Node SDK from 6.2.1 to 6.2.2, it started failing"— especially if that failure is "initialization hangs", and especially if there is any accompanying error message along the lines of "The 'listener' argument must be of type function. Received an instance of Object", when it tries to do an HTTP request— it is just about certain that the issue is a badly behaved third-party HTTP middleware package. The reason we're able to be so confident about this is 1. we have investigated many such reports, and 2. the changes from 6.2.1 to 6.2.2 were very localized and basically just boil down to us calling https.request
with different parameters than before— valid parameters, but the difference is likely to bring out any problems that exist in HTTP middleware. So if you are seeing such issues, we strongly advise that you take a close look at your dependencies, and for any kind of middleware/metrics/tracing package that could possibly affect HTTP requests, make sure you are using an up-to-date version and make sure there are no known incompatibilities with other packages.
In the most recent report we had like this, the problem turned out to be that the customer was using an obsolete version of the New Relic agent that used an obsolete version of agent-base
along with a version of the AppDynamics agent that did not support that version of agent-base
. We were able to help them track down that problem, but it was not a LaunchDarkly problem and the customer could have avoided a very long support process by taking a closer look at their dependencies.
I don't say that to be dismissive of the possibility that some other kind of problem could happen that really is our problem, but just to raise awareness of this general issue to save everyone time in troubleshooting.
@eli-darkly Do ya'll have a shortlist of the most likely dependency culprits? I'm having a hell of a time narrowing things down in our node API codebase...
Ah -- and of course right after I ask that question I realize we are also using an old version of agent-base
through the deprecated @google/maps
package. Removing that solved the issue... I'll see myself out... 💃🏻
Looks like this issue was resolved. Closing it.
I was initialing attempting to upgrade from 6.2.1 of the SDK up to 6.4.3 but found it would hang my service after making a successful call to LD. Gradually backed down versions to find an upgrade to 6.2.2 results in the same behavior. A successful run under 6.2.1 results in these logs:
When I run under 6.2.2, I receive the first two logs saying it is initializing the stream and the successful response but do not see the opened stream connection log ever appear. The request to my service just hangs until I cancel it. I have a different service that is using 6.4.3 of the service with no issue. Only difference I can find between them is the 6.4.3 service also uses the Redis feature store from
launchdarkly-node-server-sdk-redis
. Otherwise, I can't find any difference between them causing the discrepancy in behavior.Feels like something subtle, maybe another dependency or something interfering. Hoping someone might have some insight here.