Replace file encodings

NEW: FileEncoding.IsTextualData utility can effectively distinguish between binary and textual data.

Code coverage: 100% (this really makes the point that data coverage is as critical as code coverage). :)

This change pivots our textual data detection mechanism. We now attempt to decode the data as UTF-8, Windows-1252, and UTF-32. If any of these attempts produces a Unicode replacement character (U+FFFD) or an embedded NUL (after the first character, where a textual BOM might generate one), we classify the data as binary.

Otherwise it's text.
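For illustration only, here is a minimal Python sketch of the check described above. It is not the `FileEncoding.IsTextualData` implementation from this change; the names `decodes_cleanly` and `is_textual_data` are hypothetical. It also assumes one reading of the rule: each decoding attempt is rejected on a replacement character or an embedded NUL, and the data counts as text if at least one attempt survives. (Rejecting on a failure in any single encoding would classify plain ASCII as binary, since ASCII bytes do not decode cleanly as UTF-32.)

```python
REPLACEMENT_CHAR = "\ufffd"  # U+FFFD, emitted by the decoder for invalid input
CANDIDATE_ENCODINGS = ("utf-8", "cp1252", "utf-32")  # cp1252 is Windows-1252


def decodes_cleanly(data: bytes, encoding: str) -> bool:
    """Return True if `data` decodes with no replacement char or embedded NUL."""
    # errors="replace" substitutes U+FFFD for every undecodable sequence.
    text = data.decode(encoding, errors="replace")
    if REPLACEMENT_CHAR in text:
        return False
    # Skip the first character, where a BOM might have produced a NUL.
    return "\x00" not in text[1:]


def is_textual_data(data: bytes) -> bool:
    """Assumed interpretation: text if at least one candidate decoding is clean."""
    return any(decodes_cleanly(data, enc) for enc in CANDIDATE_ENCODINGS)


if __name__ == "__main__":
    print(is_textual_data(b"hello, world\n"))
    print(is_textual_data(b"\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR"))
```

Note that this sketch decodes the whole buffer up to three times, which is one place the performance cost of this approach could come from.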

This utility is currently running against a test data set of 3M files and looks generally effective. I am concerned about performance; we need to look at this closely.

microsoft / sarif-sdk

Replace file encodings #2741