phmonte / Buildalyzer

A utility to perform design-time builds of .NET projects without having to think too hard about it.
MIT License
606 stars 95 forks source link

Encoding problems (maybe because of Encoding.Default?) #128

Closed MilleBo closed 2 years ago

MilleBo commented 4 years ago

I just tried to build/compile a solution that contains variables with åäö and it seems to fail with errors like: Unexpected character '�'

I then checked the document object and it seems like it put the document encoding to SBCSCodePageEncoding:

CanBeEmbedded: true
ChecksumAlgorithm: Sha1
Container: {Microsoft.CodeAnalysis.Text.SourceText.StaticContainer}
Encoding: {System.Text.SBCSCodePageEncoding}
Length: 8825
Lines: {Microsoft.CodeAnalysis.Text.SourceText.LineInfo}

So I looked through your source code and it seems like you set Encoding.Default (line 224):

https://github.com/daveaglick/Buildalyzer/blob/master/src/Buildalyzer.Workspaces/AnalyzerResultExtensions.cs

Should we really use Encoding.Default? It seems like Microsoft recommend not to:

https://docs.microsoft.com/en-us/dotnet/api/system.text.encoding.default?redirectedfrom=MSDN&view=netframework-4.8#System_Text_Encoding_Default

And here is another discussion:

https://stackoverflow.com/questions/18866583/why-is-default-encoding-in-c-sharp-not-recommend

Maybe it should be changed to something else? Like UTF-8?

daveaglick commented 4 years ago

Good call - it should probably be UTF-8. I’ve blocked out some Buildalyzer time this weekend so I’ll try that change and make sure none of the integration tests break.

MilleBo commented 4 years ago

Great, thanks.

MilleBo commented 4 years ago

I actually tried to change it to UTF-8 and it doesn't seem to fix the issue so maybe it's just better to keep it as default.

daveaglick commented 4 years ago

Even if it doesn't solve this particular problem, I still think it's better not to set the default encoding now that you pointed it out. As for this problem, is it possible the file has a totally different encoding and isn't actually ASCII or UTF-8? I'd be curious if a call to SourceText.From(File.ReadAllText(yourFile)) from a test app crashes or not. If it does that suggests the problem is actually with Roslyn.

MilleBo commented 4 years ago

I checked a while ago and the file was actually some kind of ANSI encoding (but all other files in the solution was UTF-8). So the only solution that actually worked was to check the encoding for each specific file and use that when reading.

daveaglick commented 4 years ago

So this is likely an error caused not by the presence of unexpected characters, but rather the file being encoded with something other than the specified encoding. I think we can resolve it by using something like StreamReader.CurrentEncoding since we're loading the SourceText from a known file.

daveaglick commented 2 years ago

Just to circle back, we actually do need to set that encoding - the workspace won't compile if someone tries to without an encoding set for the SourceText. But Encoding.Default was also wrong. The key here is that we're loading the SourceText from a string which is always Encoding.Unicode so we can use that.