Can you please share the size and number of blocks of your assembly? How large is the difference between a normal run and an instrumented run? Also, can you please share which version of dotnet-coverage you are using?
Size and blocks of assembly: How do I measure that?
Difference between runs: Well, when not instrumented, we don't hit the timeout when starting a Windows service, so under 1 minute. With the instrumented one, we have to manually increase the timeout on our VMs to 2 minutes for the services to start (see the note below on raising that timeout).
dotnet-coverage version: We are using the Microsoft.CodeCoverage.Console provided on the windows-latest GitHub runner to instrument the C# DLLs with an Id.
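(The thread does not say which timeout is being raised. Assuming it is the Service Control Manager's default 30-second start limit, one way to raise it to 2 minutes is the ServicesPipeTimeout registry value; this is a sketch, not the team's actual step.)

```powershell
# Hypothetical sketch: raise the SCM service-start timeout to 120 seconds.
# ServicesPipeTimeout is in milliseconds and requires a reboot to take effect.
New-ItemProperty -Path 'HKLM:\SYSTEM\CurrentControlSet\Control' `
                 -Name 'ServicesPipeTimeout' -Value 120000 `
                 -PropertyType DWord -Force | Out-Null
```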
You can check the version using Microsoft.CodeCoverage.Console.exe --version.
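For reference, a minimal sketch of a workflow step that prints the version on the windows-latest runner (the install path and search pattern are assumptions; they depend on the Visual Studio edition on the image):

```powershell
# Sketch only: locate Microsoft.CodeCoverage.Console.exe under the VS 2022 install and print its version.
$tool = Get-ChildItem 'C:\Program Files\Microsoft Visual Studio\2022' `
            -Filter 'Microsoft.CodeCoverage.Console.exe' -Recurse -ErrorAction SilentlyContinue |
        Select-Object -First 1 -ExpandProperty FullName
& $tool --version
```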
Can you please verify whether the Windows service and the collect process are running as the same user and with the same elevation?
What are the contents of the coverage file once coverage collection is complete? Does it contain the modules executed by the service? Is it empty (zero modules)? Or are the modules present but with zero coverage?
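To check the first point (same user and elevation), something like this sketch could be used; the service name is a placeholder:

```powershell
# Account the Windows service runs under (service name is a placeholder).
Get-CimInstance Win32_Service -Filter "Name='MyService'" |
    Select-Object Name, StartName, State

# User and elevation of the current shell (run this from the shell that starts the collector).
whoami
([Security.Principal.WindowsPrincipal][Security.Principal.WindowsIdentity]::GetCurrent()
).IsInRole([Security.Principal.WindowsBuiltInRole]::Administrator)
```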
I just modified our workflow to output this value and will let you know. I am pretty sure the version is 17.11, though, because it is the one shipped with VS 2022. Windows service elevation: the service is running as Local System, but the coverage is launched as a local user on the VM. Coverage content: the coverage content is as expected and contains no zero-coverage modules.
Can you try running the service with the local user as well? If yes, can you check whether there are any performance gains from this?
We did and it doesn't change anything.
@KavenBreton-eaton could you please describe your commands and their order? I mean, in what order do you execute instrument, collect, and the start of the service, and what parameters do you use? My suspicion is that the instrumented code can't connect to the collector process during initialization, so it is timing out and retrying.
Using the windows-latest GitHub runner, we instrument every file that has an associated PDB file using the instrument command. We do so in a PowerShell script called from the workflow; it simply iterates over the files in the build folder and instruments them one after the other. We then create our installer from those instrumented files.
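(A minimal sketch of such a loop, assuming the documented instrument verb and a --session-id parameter; the tool path, build folder, and session id are placeholders, not the actual pipeline values.)

```powershell
# Sketch only: instrument every DLL in the build output that has a matching PDB.
$tool      = 'Microsoft.CodeCoverage.Console.exe'
$buildDir  = '.\artifacts\bin'
$sessionId = 'my-coverage-session'

Get-ChildItem $buildDir -Filter *.dll -Recurse |
    Where-Object { Test-Path ([IO.Path]::ChangeExtension($_.FullName, '.pdb')) } |
    ForEach-Object {
        # Instrument one assembly at a time (output path options omitted here).
        & $tool instrument $_.FullName --session-id $sessionId
    }
```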
Note that the processes are slower even without starting the collection. The collection itself is also started via a PowerShell script on a Windows VM, using the collect command and the same .runsettings.
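(For context, the collection side of that script presumably looks something like the sketch below. It assumes the collect verb can be started without a child command and left running until the session is shut down; the tool path, session id, settings file, and output path are placeholders.)

```powershell
# Sketch only: start a collection session the instrumented binaries can report to.
$tool      = 'Microsoft.CodeCoverage.Console.exe'
$sessionId = 'my-coverage-session'

& $tool collect `
    --session-id $sessionId `
    --settings   .\coverage.runsettings `
    --output     .\service.coverage
```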
Please make sure you instrument only the files you need. For example, you should skip all System.*.dll files; instrumenting them could make things slower.
Also, is the collection started before the service itself?
We are already instrumenting only the files built as part of our solution. As for the collection start, as I said earlier, the slow execution we are seeing is present even without any collection.
Is there anything else we can provide to help diagnose the issue?
When an instrumented module is loaded, it tries to connect to the collection process using pipes. So when the collection is not present, or not started yet, you can see the behavior you describe: slow initialization/start-up. You need to make sure the collection is up and running before your process starts. Is it possible to see some YAML files of your pipeline, or to get a repro we can try locally?
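(In other words, the ordering that avoids the connection retries would look roughly like this sketch; the tool path, service name, session id, file names, and the crude wait are placeholders/assumptions.)

```powershell
# Sketch of the ordering: bring the collector up first, then the instrumented service.
$tool      = 'Microsoft.CodeCoverage.Console.exe'
$sessionId = 'my-coverage-session'

Start-Process -FilePath $tool -ArgumentList @(
    'collect',
    '--session-id', $sessionId,
    '--settings',   '.\coverage.runsettings',
    '--output',     '.\service.coverage'
)
Start-Sleep -Seconds 5            # crude wait for the collector to come up

Start-Service -Name 'MyService'   # instrumented modules can now connect right away

# ... exercise the service ...

Stop-Service -Name 'MyService'
& $tool shutdown $sessionId       # close the session and write the coverage file
```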
You are right. When we start the collection, it goes much faster. Would there be a way to make it faster without starting it? Otherwise, we will just start a dummy collection during that time.
Can we get an answer on whether there is a possible solution, other than always having the collection started even when we don't need it?
Sorry for the late response. We will have a look at this scenario, but unfortunately it might not happen anytime soon. If the dummy collection works fine for you, I would suggest going ahead with that until we come up with a solution.
We observed that the exact same binaries execute a lot slower after they have been instrumented statically. At first, we thought it was because our log levels were all set to Verbose, but setting them to Errors didn't do any good. Any thoughts on what is slowing everything down to the point that a Windows service can take more than a minute to start?
Here is our runsettings file