microsoft / vscode

Visual Studio Code
https://code.visualstudio.com
MIT License
160.88k stars 28.23k forks source link

Setting service should return default value when json scheme is invalid #125422

Open lramos15 opened 3 years ago

lramos15 commented 3 years ago

To protect against invalid values the setting service should provide some sort of sanitization. Reading back a broken user setting can cause unexpected behavior. This is particularly evident in the rare cases where we update a setting in insiders and it syncs to stable causing unrecoverable exceptions.

My proposal is to sanitize user input to ensure that the expected type is what is in the user's settings.json if not then we return the default value.

/cc @bpasero As this relates to our discussion earlier. Feel free to add anything

bpasero commented 3 years ago

cc @sandy081, if possible the IConfigurationService should return the next valid value (up to the default), if the configured setting does not match the JSON schema.

In this particular case, Logan changed a setting from an array to a object and a lot of users got issues when syncing the same setting between stable and insiders.

sandy081 commented 3 years ago

Yeah, I got to know about this stable-insiders sync issue that break backward compatibility. Regarding validating the value by schema, how deep and broad you want to go as the JSON schema can go deeper (objects) and broader (multiple types, enums etc). It would be nice if we can reuse JSON language server instead of rewriting. Also how much cost it gonna add and impact performance. Is not it easy for the consumer to validate the value and go up the chain?

lramos15 commented 3 years ago

I don't think you would have to go very broad, but don't we already do JSON schema validation when we show the warnings and errors in the settings.json? The consumer could of course do the validation but it might be good to have a catch all inside the configuration service to at least ensure we are returning the default value they specify in the read if the value is incorrect. I don't think many think to do any validation. From a performance impact my guess would be not much since it's a very small data set and could be optimized easily. I go back and forth if it would be worth it, but I felt like the discussion was important to have as it would lead to more error resilient code.

Tyriar commented 3 years ago

Some concerns/thoughts:

lramos15 commented 3 years ago

@Tyriar I apologize if I misspoke during standup. The intention is to do validation on read and not modify what is synced or what is contained in the user file.

Sometimes "invalid" settings values are still supported in code, for example a setting may have changed from boolean to an enum but still accept boolean values by design.

This is true and would pose a challenge. I would assume there would be an option to read a raw unvalidated setting if you wish. I found aggressively converting everyone's setting on startup / on write to the settings.json to be much more user friendly than supporting and allowing the user to have both formats. This has the additional benefit of only needing to support an old format for a few iterations because we can guarantee all users who have updated VS Code in the past few months have the new setting.

sandy081 commented 3 years ago

Regarding settings sync I agree with @Tyriar comment.

Regarding configuration service returning the valid values always:

I don't think you would have to go very broad, but don't we already do JSON schema validation when we show the warnings and errors in the settings.json?

JSON schema validation is done by JSON language server which is an extension. Configuration service just provides the schema for JSON extension to validate. So it does not have access to JSON language server and so cannot reuse it in core.

The consumer could of course do the validation but it might be good to have a catch all inside the configuration service to at least ensure we are returning the default value they specify in the read if the value is incorrect.

@Tyriar made a good point that some times consumers accept invalid values and do migration. I would expect configuration service return what is there in the settings file by default and it is up to the consumer to check the validity of it.

From a performance impact my guess would be not much since it's a very small data set and could be optimized easily.

I think it does impact perf. If the underlying model validates the value then it impacts the start up perf. Configuration service is the very first service we instantiate and initialize as the complete workbench depends on it. Introducing validation will definitely add cost. If we do validation, we have to do it completely and in all settings.json files which is expensive for sure.

If we validate only the value that is being asked, I would think it is still expensive as workbench is heavy consumer of configuration service and is called large number of times and sometimes multiple times with same key. Also note that configuration service can return the complete settings tree if somebody does not provide any section/key in getValue. Another scenario is that editor ask for complete editor settings very often. I feel this is going to be a big change to do it efficiently.

May I know if this ripples down to extensions too? If so are not they broken now? How can they read the value that is there in the file?

So, I am not sure I would go in this direction. But I see a requirement here for those consumers who would like to validate value. For this, I would suggest to introduce another API / Service which can validate the value as per schema so that consumers can check before applying.

One thing which I have been thinking to do w.r.to following API in IConfigurationService

https://github.com/microsoft/vscode/blob/ab6a005750651e6132d54aabae2d2f2469a593aa/src/vs/platform/configuration/common/configuration.ts#L97-L108

Typing is bogus here. So I would like to remove the typing and just change the return type to any | undefined | null since IConfigurationService cannot guarantee on the value type and shape. It is up to the caller to type it or validate it etc.,

bpasero commented 3 years ago

Well actually the right thing would be to return unknown and force the caller to assert the type, but that is a big big adoption. Maybe we could offer helpers for the simple cases that are primitives.

lramos15 commented 3 years ago

Thanks for the detailed responses. I think I agree with you that this isn't worth pursuing beyond a typings / expectation change. I was only thinking in the context of validating a single setting entry and if we can't access the JSON language server our own validation logic wouldn't be worth writing.

I would say the biggest issue is what @sandy081 pointed out regarding the typings, they don't mean anything and therefore give the false sense of security that your returned value is type safe. Changing the typings (I do like @bpasero's idea of unknown but that might be a much larger work item) and adding a comment stating there no validation should be enough to at least notify the consumer that validation is required.

Tyriar commented 3 years ago

Well actually the right thing would be to return unknown and force the caller to assert the type, but that is a big big adoption. Maybe we could offer helpers for the simple cases that are primitives.

I've started doing this with terminal profiles because otherwise runtime exceptions happen. 👍 to validation helpers, using is in the function return makes this really nice:

https://github.com/Microsoft/vscode/blob/0f8c499d1014b9079f01a63c097cc2354400e9ed/src/vs/workbench/contrib/terminal/browser/terminalProfileResolverService.ts#L376-L394

Maybe we could even have a general function to validate the schema which would for the most part save needing to do what I'm doing above:

function matchesSchema(obj: unknown, schema: IJSONSchema): obj is IJSONSchema {
  // Check if primitive, recurse object props, etc.
}

EDIT: Actually the return type there wouldn't work, it would probably need to be generic and use obj is T or something.

lramos15 commented 3 years ago

Switching to unknown produces ~850 errors. They're fairly simple adoptions and fortify our codebase but it is something that would take time. If we think it's worth the time to force a validation mindset then I can start an engineering item, but it might not be worth our effort.

Something like matchesSchema<T>(obj: unknown, schema: IJSONSchema): obj is T would be interesting as well to explore, although we would have to write our own JSON validation logic in that case.

sandy081 commented 3 years ago

IMO since the API declaration is not truthful and reliable, it is good that we change it. I agree that it is a huge task but it is worth and make our code base rock solid and handles cases when settings values are invalid.

I will bring it up in our stand up and see what other team members feel about it.

lramos15 commented 3 years ago

After discussion with Kai, he also agrees this would be a worthwhile investment. Since it's a templated function we can do this iteratively by templating it as unknown i.e. this.configurationService.getValue<unknown>('foo'); that way when we flick the switch to unknown we would just remove the templating.

bpasero commented 8 months ago

cc @benibenj who is working on validation in https://github.com/microsoft/vscode/pull/197242 for editor options.

After a sync with @sandy081 here is an idea: