microsoft / sarif-visualstudio-extension

SARIF Microsoft Visual Studio Viewer Extension
MIT License
46 stars 19 forks source link

Viewer enumerating too much of the file system (not interpreting Artifact Location Base URI) #419

Open GabeDeBacker opened 3 years ago

GabeDeBacker commented 3 years ago

The SARIF viewer is only looking for the "baseUriId" in the "physicalLocation" property and is not correctly reading the "baseUriId" given in an artifact location index.

As a result, when looking for a relative path in a result, it begins searching the solution directory for the relative path using the TryResolveFilePathFromSolution method. The TryResolveFilePathFromSolution method also has an issue where it takes a substring of the solution path looking for the last index of the path separator.

So...

A solution path of "c:\a\b\c" ends up scanning "c:\a\b" which is actually one up directory up from the solution directory.

This can result in an very large enumeration looking for the file path.

eddynaka commented 3 years ago

@yongyan-gh , can you verify?

michaelcfanning commented 3 years ago

Ouch. @eddynaka, @yongyan-gh.

VS should make a comprehensive attempt to complete resolve a URI for any result. This comprehensive resolution ideally would be delivered by an API that ships in the SDK. A problem here is that this API would require access to the complete log file, meaning that we'd need to precompute the resolved path for every result as we add each error list item.

For VS, we might want to defer that computation but this could entail persisting additional information to each error list item (such as original uri base ids).

Anyway, I haven't dipped into this area in a while and someone needs to look around. We need a fix for this issue, and we need an API in the SARIF SDK if one does not exist. Finally, Gabe has pointed out that there is the possibility of some confusion/inconsistency in the SARIF format itself (for example, if a base uri hanging directly off a physical location differs from the base uri indicated in the matching artifacts table). We need validations for this inconsistency.

For completeness, Eddy, any ambiguity in resolving the 'actual' URI may need to be resolved in an erratum in the SARIF spec, if that text is unclear.

michaelcfanning commented 3 years ago

OK, so based on a change Gabe sent me, our error list model retains a copy of the Run object. That means that we should be able to call an SDK helper that definitively resolves the URI, if it can be resolved to an absolute location. i.e., please fix this issue in the SDK. You can look at other API extensions that attempt to retrieve the associated rule/reporting descriptor from a result as a model. This look-up is also complex and requires a run object instance to complete.

yongyan-gh commented 3 years ago

Sure will look into SARIF SDK functionality to resolve a URI.

@GabeDeBacker regarding issue of TryResolveFilePathFromSolution, it take exact path of .sln file as root path to search. I cannot repo the issue, can you pls share repo steps/sample result can repo if you have?

Maybe in below line, the variable name is confusing, its actually a full path of .sln file e.g. "C:\SarifLog\SensitiveTerms\solution1\TermsTests1.sln", the function tries to remove the file name to get directory path. Will rename it properly and update this using System.IO.Path.GetDirectoryName instead.

https://github.com/microsoft/sarif-visualstudio-extension/blob/2c2ed72309c20dc17888ea202d02517c59beeac9/src/Sarif.Viewer.VisualStudio/CodeAnalysisResultManager.cs#L706

yongyan-gh commented 3 years ago

@GabeDeBacker just saw your PR also addressed the .sln path issue. Thanks looking at the PR