Closed maxwellgerber closed 4 years ago
The required interface for logger objects is already spelled out in the SDK's TypeScript-based API documentation for LDLogger.
If you're not using TypeScript, then it's not checking for compliance to that interface, and you're right that we could implement some validation logic to do it programmatically. But it's hard to say how far to go with that kind of thing— e.g. just because an object has a warn
property does not mean it's a function that takes the parameters we expect it to take. Since there's already a popular mechanism for providing something resembling type-safety in JavaScript, and that is TypeScript, we've generally not tried to do runtime type checking on everything that is passed to us, but have been careful to keep our TypeScript definitions up to date. Those definitions, and the HTML docs that are generated from them, should be considered the source of truth if you're ever not sure what the supported types or properties for something are.
Now, we don't have an absolute rule of not doing safety checks: as you can see in configuration.js, we still have implemented runtime type checking to some extent, specifically in the configuration object. However, there are three problems with applying similar logic to the logger:
variationDetail()
), or b) logged a warning and chosen some well-defined strategy for graceful degradation. So, in the current configuration.js
logic, if you pass a non-numeric value where a number is expected, you won't get an exception—it'll simply ignore that value and fall back to the default value for that property (and log a warning). If we used similar logic for this, falling back to the default would mean using a Winston console logger. Deliberately throwing an exception would be unusual.warn
property that is a function) does not really guarantee that it is safe to use as a logger. You could always have a broken logger implementation whose warn
method throws an exception, and then you'd have exactly the same problem you ran into here.Especially because of number 3, I'm thinking that it might make more sense to add an error-proofing wrapper around whatever logger we are using. It is indeed a very bad thing to have an uncaught exception happen while you're trying to handle an error, so if we do all of our logging against the wrapper and it always delegates to the underlying logger with a try block, we can at least prevent that. The question would be what to do at that point— log a warning to the console?
I don't have much experience providing a library to a large number of external clients, but it certainly sounds like y'all have spent a lot of time and effort creating a POV on sane defaults and library behavior. Thank you for the incredibly detailed writeup. You make a ton of great points I hadn't considered before.
You are correct that we aren't using TypeScript. Up until this point, we've been getting away with just the SDK reference. I now see that it calls out the TypeScript generated docs in the beginning.
There are a ton of fun things that make this problem interesting. Consider the following more of a ramble than anything else.
reported the problem using some special return value if possible
It's hard to see where the return value could come from - considering setting logging isn't specific to any call but the constructor. I suppose the init
method could return a wrapped { LDClient, errors[] }
object but that would be a breaking change and a pretty ugly API (IMHO).
In general we have tried pretty hard to avoid using exceptions as an error-reporting mechanism
There are two places where the library throws errors today: when no SDK Key is passed in and when the FileDataSource is given an invalid file. Both of these are errors on startup - not during operation. To borrow the K8s terminology, throwing errors in a readiness check is preferable to failing a liveness check later on. Subjectively, I as a developer would want my code to fail fast - error out once or twice when setting up LD locally, and be rock solid in production.
it'll simply ignore that value and fall back to the default value for that property (and log a warning)
Having bad config values be replaced with sane defaults is also desirable. I've liked that same strategy with other infra tools I've used(AWS's DB Parameter Groups being the first example I can think of). However - I think loggers are executable code, not config. More along the lines of dependency injection than anything else. What would the behavior be if a caller provided a featureStore
that didn't implement all the required methods? Most of them are called on startup, I think. I imagine an error would be thrown pretty quickly.
Calling all 4 log levels on startup would produce a ton of noise. I don't think that is a good approach.
Another major issue is dropping logs. Falling back to logging to the console as a first step is dangerous because it's very easy for those logs to get lost - or be only available on the production servers and not in whatever log aggregator someone might be using.
I'm thinking that it might make more sense to add an error-proofing wrapper around whatever logger we are using.
I like this idea too - and you make an excellent point about the error-handling - but I don't see any reason we can't have both. Fail fast if the logger passed in obviously won't work, and be cautious when calling into it at all other times.
I think a log instance should be thought of as dependency injection, not config - and should be approached the same way featureStores are approached. I like the idea of having an error-handling adapter, but I don't think it solves the undefined method case very well. I also don't know what to do when the logger throws an error.
Let me know if you'd rather see the error handler approach, or the error handler + config validation approach. I'll update the PR accordingly.
Thanks!
@gkop I'd love to hear your thoughts on this too
Those are all good points. And because the first case you mentioned here, which I had forgotten about—
There are two places where the library throws errors today: when no SDK Key is passed in
—is part of ld.init()
, and the configuration validation is only called from within init()
, we would not really be defining a brand new behavior of init()
being able to throw an exception. We would just be extending the category of "parameters that are so invalid we can't proceed at all" to include a malformed logger object as well as a missing SDK key. And since the logger is not just heavily used throughout the SDK, but specifically necessary for reporting errors, that is probably reasonable.
So, I think it'd make sense to 1. throw an exception from config validation if the logger does not have all four methods, and 2. also have the config validation code wrap any custom logger that's provided by the app in a delegator that swallows exceptions. There wouldn't be much we could do about an exception at that point except to write something to console.log()
, but at least it wouldn't kill the app, and even if the console output wasn't noticeable, you might at least notice that you were not getting log output.
About other configurable components, I'm less convinced that we should add extra validation. You mentioned the feature store; there are definitely some methods there that are not called during startup, so if you completely omitted them, an error would only show up at some later time. But that gets back to not being able to prove that any implementation, even if it has methods with the right names, isn't broken. To some degree, if you choose to provide a custom component, you are taking on the responsibility of making sure it is usable and that all of its methods have been tested before you put it into production. Also, the logger is the only case where we have the specific issue of a broken component also breaking the error-reporting mechanism.
I think this looks good. I'm going to merge it into an alternate branch rather than directly onto master in case we still need to adjust anything, but it should go into the next release in some form. Thanks!
Requirements
Related issues
During LaunchDarkly's April 21 incident, our sdk (5.10.3) reached the following code:
https://github.com/launchdarkly/node-server-sdk/blob/f95665c68c37bc81d47274526ba4abac8ed0bcbe/streaming.js#L33-L34
This was the first time something in the ld library had called
logger.warn
.Our logger calls this method
logger.warning
The result:
An error being thrown in an EventEmitter's error handler causes the process to exit. Try running:
All our services restarting at the same time, across all our environments, was very exciting, but I would really like to not have it happen again.
Describe the solution you've provided
I think it makes sense for
configuration.validate
to throw an error when passed a logger that doesn't support all the log levels LD uses internally. This way, mismatched logging APIs will be caught way earlier, before code makes it into production.It would also be great to update the docs: https://github.com/launchdarkly/LaunchDarkly-Docs/pull/38
Describe alternatives you've considered
N/A
Additional context
N/A