Closed MilleBo closed 2 years ago
Good call - it should probably be UTF-8. I’ve blocked out some Buildalyzer time this weekend so I’ll try that change and make sure none of the integration tests break.
Great, thanks.
I actually tried to change it to UTF-8 and it doesn't seem to fix the issue so maybe it's just better to keep it as default.
Even if it doesn't solve this particular problem, I still think it's better not to set the default encoding now that you pointed it out. As for this problem, is it possible the file has a totally different encoding and isn't actually ASCII or UTF-8? I'd be curious if a call to SourceText.From(File.ReadAllText(yourFile))
from a test app crashes or not. If it does that suggests the problem is actually with Roslyn.
I checked a while ago and the file was actually some kind of ANSI encoding (but all other files in the solution was UTF-8). So the only solution that actually worked was to check the encoding for each specific file and use that when reading.
So this is likely an error caused not by the presence of unexpected characters, but rather the file being encoded with something other than the specified encoding. I think we can resolve it by using something like StreamReader.CurrentEncoding
since we're loading the SourceText
from a known file.
Just to circle back, we actually do need to set that encoding - the workspace won't compile if someone tries to without an encoding set for the SourceText
. But Encoding.Default
was also wrong. The key here is that we're loading the SourceText
from a string
which is always Encoding.Unicode
so we can use that.
I just tried to build/compile a solution that contains variables with åäö and it seems to fail with errors like:
Unexpected character '�'
I then checked the document object and it seems like it put the document encoding to SBCSCodePageEncoding:
So I looked through your source code and it seems like you set Encoding.Default (line 224):
https://github.com/daveaglick/Buildalyzer/blob/master/src/Buildalyzer.Workspaces/AnalyzerResultExtensions.cs
Should we really use Encoding.Default? It seems like Microsoft recommend not to:
https://docs.microsoft.com/en-us/dotnet/api/system.text.encoding.default?redirectedfrom=MSDN&view=netframework-4.8#System_Text_Encoding_Default
And here is another discussion:
https://stackoverflow.com/questions/18866583/why-is-default-encoding-in-c-sharp-not-recommend
Maybe it should be changed to something else? Like UTF-8?