hexawyz / NetUnicodeInfo

Unicode Character Inspector & Library providing a subset of the Unicode data for .NET clients.
https://www.nuget.org/packages/UnicodeInformation/
MIT License

.NET Unicode Information Library

Summary

This project consists of a library that provides access to some of the data contained in the Unicode Character Database.

Version of Unicode supported

Unicode 13.0 and Emoji 13.0

Breaking changes from versions 1.x to 2.x

UnicodeRadicalStrokeCount.StrokeCount is now of type System.SByte instead of type System.Byte.
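
For callers, the practical effect of this change is that code storing the value in a System.Byte no longer compiles without an explicit conversion. The sketch below uses a plain sbyte local as a stand-in for a StrokeCount value, so it runs without the library. The helper name ToNonNegativeStrokeCount is hypothetical and the value -1 is only illustrative. It shows one possible way to adapt 1.x-era code, not an official migration path.

```csharp
using System;

namespace Example
{
    internal static class StrokeCountMigration
    {
        // Hypothetical helper: clamps a signed stroke count to zero or above,
        // so that it fits code that still expects a System.Byte.
        internal static byte ToNonNegativeStrokeCount(sbyte strokeCount)
        {
            return strokeCount < 0 ? (byte)0 : (byte)strokeCount;
        }

        private static void Main()
        {
            // Stand-in for a UnicodeRadicalStrokeCount.StrokeCount value (illustrative only).
            sbyte strokeCount = -1;

            // Widening to int is implicit and preserves the sign.
            int widened = strokeCount;

            // byte legacy = strokeCount; // Does not compile: no implicit sbyte-to-byte conversion.
            byte legacy = ToNonNegativeStrokeCount(strokeCount);

            Console.WriteLine(widened);
            Console.WriteLine(legacy);
        }
    }
}
```

Whether clamping is appropriate depends on what the calling code does with the value. Code that can accept a signed value is probably better off switching its own variables to sbyte or int.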

Using the library

Reference the NuGet package

Grab the latest version of the package on NuGet: https://www.nuget.org/packages/UnicodeInformation/. Once the library is installed in your project, you will find everything you need in the System.Unicode namespace.

Basic information

Everything provided by the library lives in the System.Unicode namespace. The XML documentation should be complete enough for you to navigate the API without getting lost.

In its current state, the project is written in C# 7.3, compiles with Roslyn, and targets both .NET Standard 2.0 and .NET Standard 1.1. The UnicodeInformation library includes a (large) subset of the official Unicode Character Database, stored in a custom file format.

Example usage

The following program displays information about a few characters:

using System;
using System.Text;
using System.Unicode;

namespace Example
{
    internal static class Program
    {
        private static void Main()
        {
            Console.OutputEncoding = Encoding.Unicode;
            PrintCodePointInfo('A');
            PrintCodePointInfo('∞');
            PrintCodePointInfo(0x1F600);
        }

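        private static void PrintCodePointInfo(int codePoint)
        {
            // Look up the data known to the library for this code point.
            var charInfo = UnicodeInfo.GetCharInfo(codePoint);
            // Text used to display the character itself.
            Console.WriteLine(UnicodeInfo.GetDisplayText(charInfo));
            // Code point in the usual U+XXXX notation (at least four hex digits).
            Console.WriteLine("U+" + codePoint.ToString("X4"));
            // Fall back to OldName when Name is null.
            Console.WriteLine(charInfo.Name ?? charInfo.OldName);
            Console.WriteLine(charInfo.Category);
        }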
    }
}
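
The program above passes code points as int values. When the starting point is a string instead, characters outside the Basic Multilingual Plane (such as U+1F600) are stored as two UTF-16 code units, so they have to be recombined before a lookup. The following sketch uses only the .NET base class library. The class and method names (CodePointEnumeration, GetCodePoints) are illustrative and are not part of UnicodeInformation.

```csharp
using System;
using System.Collections.Generic;

namespace Example
{
    internal static class CodePointEnumeration
    {
        // Yields each code point of a UTF-16 string, combining surrogate pairs.
        // Unpaired surrogates are returned as-is rather than throwing.
        internal static IEnumerable<int> GetCodePoints(string text)
        {
            for (int i = 0; i < text.Length; i++)
            {
                if (char.IsSurrogatePair(text, i))
                {
                    yield return char.ConvertToUtf32(text, i);
                    i++; // Skip the low surrogate that was just consumed.
                }
                else
                {
                    yield return text[i];
                }
            }
        }

        private static void Main()
        {
            // "A", INFINITY and GRINNING FACE, written with escapes to avoid source encoding issues.
            foreach (int codePoint in GetCodePoints("A\u221E\U0001F600"))
            {
                Console.WriteLine("U+" + codePoint.ToString("X4"));
            }
            // Prints U+0041, U+221E and U+1F600 on separate lines.
        }
    }
}
```

Each value produced this way could be passed to a method such as PrintCodePointInfo from the example program.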

Explanations:

Included Properties

From UCD

NB: The UCD property ISO_Comment will never be included, because it is empty in all recent Unicode versions.

From Unicode Emoji

From Unihan

Regenerating the data

The project UnicodeInformation.Builder takes care of generating a file named ucd.dat. This file contains Unicode data compressed with .NET's deflate implementation, and it should be included in UnicodeInformation.dll at compilation.
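
To give an idea of what deflate compression looks like in .NET, here is a minimal round-trip sketch based on System.IO.Compression.DeflateStream. It does not reproduce the actual layout of ucd.dat, which is a custom format not described here. The class name, method names and sample payload are made up for illustration.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

namespace Example
{
    internal static class DeflateRoundTrip
    {
        // Compresses a buffer with deflate and returns the compressed bytes.
        internal static byte[] Compress(byte[] data)
        {
            using (var output = new MemoryStream())
            {
                // The DeflateStream must be disposed before the buffer is read,
                // otherwise the last compressed block may not have been flushed.
                using (var deflate = new DeflateStream(output, CompressionLevel.Optimal, leaveOpen: true))
                {
                    deflate.Write(data, 0, data.Length);
                }
                return output.ToArray();
            }
        }

        // Reads a deflate-compressed buffer back into its original bytes.
        internal static byte[] Decompress(byte[] compressed)
        {
            using (var input = new MemoryStream(compressed))
            using (var deflate = new DeflateStream(input, CompressionMode.Decompress))
            using (var output = new MemoryStream())
            {
                deflate.CopyTo(output);
                return output.ToArray();
            }
        }

        private static void Main()
        {
            // Made-up payload standing in for generated Unicode data.
            byte[] original = Encoding.UTF8.GetBytes("LATIN CAPITAL LETTER A;LATIN CAPITAL LETTER B");
            byte[] restored = Decompress(Compress(original));
            Console.WriteLine(Encoding.UTF8.GetString(restored));
        }
    }
}
```

In a library, the compressed stream would typically come from an embedded resource rather than from a byte array. How UnicodeInformation actually opens and parses its data file is not covered by this sketch.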