hexawyz / NetUnicodeInfo

Unicode Character Inspector & Library providing a subset of the Unicode data for .NET clients.
https://www.nuget.org/packages/UnicodeInformation/
MIT License

First call takes over 20 seconds on .NET 5 and above #8

Closed bzaar closed 1 year ago

bzaar commented 1 year ago

Repro:

using System;
using System.Unicode;
using System.Diagnostics;

public class Program
{
    public static void Main()
    {
        var stopwatch = Stopwatch.StartNew();
        UnicodeInfo.GetCharInfo('a');
        Console.WriteLine(stopwatch.ElapsedMilliseconds);
    }
}

.csproj:

<Project Sdk="Microsoft.NET.Sdk">

  <PropertyGroup>
    <TargetFramework>net48</TargetFramework>
    <AppendTargetFrameworkToOutputPath>false</AppendTargetFrameworkToOutputPath>
    <OutputType>Exe</OutputType>
  </PropertyGroup>

  <ItemGroup>
    <PackageReference Include="UnicodeInformation" Version="2.6.0" />
  </ItemGroup>

</Project>

The code above prints about 300 ms on my machine, but when I change the <TargetFramework> to net5.0, the result varies from 7000 to 28000 ms.

hexawyz commented 1 year ago

This seems a bit weird, but it could be related to changes in the GZip implementation in recent .NET versions. The tests already run on .NET 5, and I didn't observe such a long startup delay. (I checked: about 1.5 s on a Core i5-10600 for the first test, which is already quite high, but far from what you are observing.)

Did you make sure to run your test in Release mode and outside of VS?

Silverdimond commented 1 year ago

Running a repro in Release mode still shows a long delay:

using System.Diagnostics;
using System.Unicode;

var stopwatch = Stopwatch.StartNew();
UnicodeInfo.GetCharInfo('a');
Console.WriteLine(stopwatch.ElapsedMilliseconds);

.csproj:

<Project Sdk="Microsoft.NET.Sdk">

  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>net7.0</TargetFramework>
    <ImplicitUsings>enable</ImplicitUsings>
    <Nullable>enable</Nullable>
  </PropertyGroup>

  <ItemGroup>
    <PackageReference Include="UnicodeInformation" Version="2.6.0" />
  </ItemGroup>

</Project>

Around 11-12 seconds on net7.0 (7.0.5), Windows 11 22H2 (22621.1555), Core i7-12700K. The medium on which the repro's binaries are stored does not seem to affect the results (SSD and HDD give comparable results).

hexawyz commented 1 year ago

I was able to reproduce it using a laptop computer, but the performance variations between CPUs are really weird.

To fix this, I tried reading the database through an intermediate MemoryStream. It seems to speed up startup very significantly, at the cost of increased memory usage. This seems to confirm a problem with the new DeflateStream implementation. 😕
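For readers who want to see the general shape of this workaround, here is a minimal sketch. It is not the code from the linked commit: the class name `BufferedInflateSketch` and the helper `InflateToMemory` are hypothetical, and the sketch assumes the data is a raw Deflate stream. The idea is to decompress everything in one bulk `CopyTo` and then do the many small reads against the in-memory copy instead of the `DeflateStream` itself.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class BufferedInflateSketch
{
    // Hypothetical helper: decompresses the whole Deflate payload into memory
    // in one bulk copy, so later small reads never touch DeflateStream.
    public static MemoryStream InflateToMemory(Stream compressed)
    {
        var buffer = new MemoryStream();
        using (var deflate = new DeflateStream(compressed, CompressionMode.Decompress, leaveOpen: true))
        {
            deflate.CopyTo(buffer);
        }
        buffer.Position = 0; // rewind so the caller reads from the start
        return buffer;
    }

    public static void Main()
    {
        // Build a small compressed payload to stand in for the embedded database.
        byte[] payload = { 1, 2, 3, 4, 5 };
        var compressed = new MemoryStream();
        using (var deflate = new DeflateStream(compressed, CompressionMode.Compress, leaveOpen: true))
        {
            deflate.Write(payload, 0, payload.Length);
        }
        compressed.Position = 0;

        // Parse from the uncompressed in-memory copy.
        using (var data = InflateToMemory(compressed))
        using (var reader = new BinaryReader(data))
        {
            Console.WriteLine(data.Length);       // 5
            Console.WriteLine(reader.ReadByte()); // first byte of the payload
        }
    }
}
```

The trade-off matches the one described above: the whole uncompressed payload lives in memory while it is being parsed, in exchange for avoiding many small reads against the decompressor.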

The long-term goal is to stop using GZip for data compression and use static (uncompressed) data structures instead, but this fix should do for now: https://github.com/GoldenCrystal/NetUnicodeInfo/commit/6a7f8cb624b1e4c4101dcaaa30ae21a57f3122b9