hadashiA / VYaml

The extra fast, low memory footprint YAML library for C#, focused on .NET and Unity.
MIT License
serialization unity yaml

VYaml


VYaml is a pure C# YAML 1.2 implementation that is extra fast and has a low memory footprint, with a focus on .NET and Unity.

VYaml is fast because it handles UTF-8 byte sequences directly, using the newer C# API set (System.Buffers.*, etc.). During parsing, scalar values are pooled, and no allocation occurs until Scalar.ToString() is called. This keeps both the memory footprint and the performance overhead very low, even in environments such as Unity.
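To illustrate the general idea of working on UTF-8 bytes directly, here is a minimal sketch that uses only the .NET base class library (ReadOnlySpan<byte> and System.Buffers.Text.Utf8Parser). It is not VYaml's internal code, and the helper name TryReadInt is hypothetical.

```csharp
using System;
using System.Buffers.Text;
using System.Text;

// Hypothetical helper (not part of VYaml): parses an integer straight from
// UTF-8 bytes, without first decoding them into a UTF-16 string.
static bool TryReadInt(ReadOnlySpan<byte> utf8, out int value)
{
    // Utf8Parser reports how many bytes it consumed; require the whole span.
    return Utf8Parser.TryParse(utf8, out value, out int consumed)
           && consumed == utf8.Length;
}

byte[] source = Encoding.UTF8.GetBytes("count: 12345");

// Slice out the value part of the line. Slicing a span does not copy or allocate.
ReadOnlySpan<byte> scalar = source.AsSpan(7);

if (TryReadInt(scalar, out int count))
{
    Console.WriteLine(count); // 12345
}

// A string is materialized only when text is actually needed.
string text = Encoding.UTF8.GetString(scalar);
Console.WriteLine(text); // 12345
```

The point of the sketch is the ordering: numeric values are read from the byte span as-is, and a string allocation happens only at the final, explicit conversion.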

[Figures: benchmark results on .NET and on Unity]

Compared with YamlDotNet (the most popular YAML library for C#), VYaml is roughly 6x faster and makes about 1/50 of the heap allocations in some cases.

Currently supported features

Most recent roadmap

Installation

NuGet

Requires netstandard2.1 or later.

You can install the following NuGet package: https://www.nuget.org/packages/VYaml

dotnet add package VYaml

Unity

Requires Unity 2021.3 or later.

Install via git url

If you are using Unity 2022.2 or newer, you can install VYaml through the Unity Package Manager using the following git URL:

https://github.com/hadashiA/VYaml.git?path=VYaml.Unity/Assets/VYaml#0.27.1

[!IMPORTANT]
If you are using Unity 2022.1 or older, the git url cannot be used as is because the source generator versions are different. Instead, install with VYaml.2022_1_or_lower.unitypackage from the Releases page.

Usage

Serialize / Deserialize

Define a struct or class to be serialized and annotate it with the [YamlObject] attribute and the partial keyword.

using VYaml.Annotations;

[YamlObject]
public partial class Sample
{
    // By default, public fields and properties are serializable.
    public string A; // public field
    public string B { get; set; } // public property
    public string C { get; private set; } // public property (private setter)
    public string D { get; init; } // public property (init-only setter)

    // Use `[YamlIgnore]` to exclude a public member from serialization.
    [YamlIgnore]
    public string AB => A + B;
}

Why is partial necessary?

var utf8Yaml = YamlSerializer.Serialize(new Sample
{
    A = "hello",
    B = "foo",
    C = "bar",
    D = "hoge",
});

Result:

a: hello
b: foo
c: bar
d: hoge

By default, the Serialize<T> method returns the result as UTF-8 bytes. This is because data written to files or other data stores is commonly stored as UTF-8 text.

If you wish to receive the result as a C# string, do the following. Note that this has the overhead of conversion to UTF-16.

var yamlString = YamlSerializer.SerializeToString(...);

You can also convert yaml to C#.

using var stream = File.OpenRead("/path/to/yaml");
var sample = await YamlSerializer.DeserializeAsync<Sample>(stream);

// Or
// var yamlUtf8Bytes = System.Text.Encoding.UTF8.GetBytes("<yaml string....>");
// var sample = YamlSerializer.Deserialize<Sample>(yamlUtf8Bytes);
sample.A // #=> "hello"
sample.B // #=> "foo"
sample.C // #=> "bar"
sample.D // #=> "hoge"

Built-in supported types

These types can be serialized by default:

TODO: We plan to add more.

Deserialize as dynamic

You can also deserialize into primitive object types implicitly by specifying dynamic as the type argument.

var yaml = YamlSerializer.Deserialize<dynamic>(yamlUtf8Bytes);
yaml["a"] // #=> "hello"
yaml["b"] // #=> "aaa"
yaml["c"] // #=> "hoge"
yaml["d"] // #=> "ddd"

Deserialize multiple documents

YAML allows multiple pieces of data in one file by separating them with ---. Each piece is called a "Document". If you want to load multiple documents, you can use YamlSerializer.DeserializeMultipleDocuments<T>(...).

For example:

---
Time: 2001-11-23 15:01:42 -5
User: ed
Warning:
  This is an error message
  for the log file
---
Time: 2001-11-23 15:02:31 -5
User: ed
Warning:
  A slightly different error
  message.
---
Date: 2001-11-23 15:03:17 -5
User: ed
Fatal:
  Unknown variable "bar"
Stack:
- file: TopClass.py
  line: 23
  code: |
    x = MoreObject("345\n")
- file: MoreClass.py
  line: 58
  code: |-
    foo = bar

var documents = YamlSerializer.DeserializeMultipleDocuments<dynamic>(yaml);
documents[0]["Warning"] // #=> "This is an error message for the log file"
documents[1]["Warning"] // #=> "A slightly different error message."
documents[2]["Fatal"]   // #=> "Unknown variable \"bar\""

Naming convention

:exclamation: By default, VYaml maps C# property names in lower camel case (e.g. propertyName) format to yaml keys.

If you want to customize this behaviour, pass an argument to the [YamlObject] attribute.

[YamlObject(NamingConvention.SnakeCase)]
public partial class Sample
{
    public int FooBar { get; init; }
}

This serializes as:

foo_bar: 100
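As a rough illustration of what the two conventions mentioned so far do to a member name, here is a minimal sketch using only the base class library. The helpers ToLowerCamelCase and ToSnakeCase are hypothetical and are not VYaml's implementation; they only handle simple PascalCase names and do nothing special for acronyms or digits.

```csharp
using System;
using System.Text;

// Hypothetical helper (not VYaml's implementation): lowercases the first letter.
static string ToLowerCamelCase(string name)
{
    if (string.IsNullOrEmpty(name)) return name;
    return char.ToLowerInvariant(name[0]) + name.Substring(1);
}

// Hypothetical helper (not VYaml's implementation): splits on uppercase letters.
static string ToSnakeCase(string name)
{
    var sb = new StringBuilder(name.Length + 4);
    for (int i = 0; i < name.Length; i++)
    {
        char c = name[i];
        if (char.IsUpper(c))
        {
            // Insert a separator before every uppercase letter except the first character.
            if (i > 0) sb.Append('_');
            sb.Append(char.ToLowerInvariant(c));
        }
        else
        {
            sb.Append(c);
        }
    }
    return sb.ToString();
}

Console.WriteLine(ToLowerCamelCase("FooBar")); // fooBar
Console.WriteLine(ToSnakeCase("FooBar"));      // foo_bar
```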

List of possible values:

Also, you can change the key name of each member with [YamlMember("name")].

[YamlObject]
public partial class Sample
{
    [YamlMember("foo-bar-alias")]
    public int FooBar { get; init; }
}

This serializes as:

foo-bar-alias: 100

Custom constructor

VYaml supports both parameterized and parameterless constructors. The selection of the constructor follows these rules.

:note: If using a parameterized constructor, all parameter names must match corresponding member names (case-insensitive).

[YamlObject]
public partial class Person
{
    public int Age { get; } 
    public string Name { get; }

    // You can use a parameterized constructor. Parameter names must match the corresponding member names (case-insensitive).
    public Person(int age, string name)
    {
        Age = age;
        Name = name;
    }
}

[YamlObject]
public partial class Person
{
    public int Age { get; set; }
    public string Name { get; set; }

    public Person()
    {
        // ...
    }

    // If there are multiple constructors, then [YamlConstructor] should be used
    [YamlConstructor]
    public Person(int age, string name)
    {
        this.Age = age;
        this.Name = name;
    }
}

[YamlObject]
public partial class Person
{
    public int Age { get; } // from constructor
    public string Name { get; } // from constructor
    public string Profile { get; set; } // from setter

    // Members that are not taken as constructor arguments are populated through their setters.
    public Person(int age, string name)
    {
        this.Age = age;
        this.Name = name;
    }
}

Enum

By default, Enum is serialized in camelCase with a leading lowercase letter, as is the key name of the object. For example:

enum Foo
{
    Item1,
    Item2,
    Item3,
}

YamlSerializer.Serialize(Foo.Item1); // #=> "item1"

It respects [EnumMember] and [DataMember].

enum Foo
{
    [EnumMember(Value = "item1-alias")]
    Item1,

    [EnumMember(Value = "item2-alias")]
    Item2,

    [EnumMember(Value = "item3-alias")]
    Item3,
}

YamlSerializer.Serialize(Foo.Item1); // #=> "item1-alias"

The naming convention can also be specified by using the [YamlObject] attribute.

[YamlObject(NamingConvention.SnakeCase)]
enum Foo
{
    ItemOne,
    ItemTwo,
    ItemThree,
}

YamlSerializer.Serialize(Foo.ItemOne); // #=> "item_one"

Polymorphism (Union)

VYaml supports deserializing objects as an interface or abstract class type. In VYaml this feature is called Union. Only interfaces and abstract classes may be annotated with [YamlObjectUnion] attributes. Each union tag must be unique.

[YamlObject]
[YamlObjectUnion("!foo", typeof(FooClass))]
[YamlObjectUnion("!bar", typeof(BarClass))]
public partial interface IUnionSample
{
}

[YamlObject]
public partial class FooClass : IUnionSample
{
    public int A { get; set; }
}

[YamlObject]
public partial class BarClass : IUnionSample
{
    public string? B { get; set; }
}

// We can deserialize as interface type.
var obj = YamlSerializer.Deserialize<IUnionSample>(System.Text.Encoding.UTF8.GetBytes("!foo { a: 100 }"));

obj.GetType(); // #=> FooClass

In the above example, !foo and !bar are what the YAML specification calls tags. YAML can mark arbitrary data in this way, and VYaml's Union takes advantage of this.

You can also serialize:

YamlSerializer.Serialize<IUnionSample>(new FooClass { A = 100 });

Result:

!foo
a: 100

Customize serialization behaviour

To serialize or deserialize, VYaml needs an IYamlFormatter<T> corresponding to the C# type.
By default, StandardResolver identifies the appropriate IYamlFormatter.

You can customize this behavior as follows:

var options = new YamlSerializerOptions
{
    Resolver = CompositeResolver.Create(
        new IYamlFormatter[]
        {
            new YourCustomFormatter1(), // You can add additional formatter
        },
        new IYamlFormatterResolver[]
        {
            new YourCustomResolver(),  // You can add additional resolver
            StandardResolver.Instance, // Fallback to default behavior at the end.
        })
};

YamlSerializer.Deserialize<T>(yaml, options);
YamlSerializer.Serialize(value, options);

Low-Level API

Parser

YamlParser struct provides access to the complete meta-information of yaml.

Basic example:

var parser = YamlParser.FromBytes(utf8Bytes);

// YAML may contain more than one `Document`.
// Here we skip to just before the content of the first document.
parser.SkipAfter(ParseEventType.DocumentStart);

// Scanning...
while (parser.Read())
{
    // If the current syntax is Scalar, 
    if (parser.CurrentEventType == ParseEventType.Scalar)
    {
        var intValue = parser.GetScalarAsInt32();
        var stringValue = parser.GetScalarAsString();
        // ...

        if (parser.TryGetCurrentTag(out var tag))
        {
            // Check for the tag...
        }

        if (parser.TryGetCurrentAnchor(out var anchor))
        {
            // Check for the anchor...
        }        
    }

    // If the current syntax is Sequence (Like a list in yaml)
    else if (parser.CurrentEventType == ParseEventType.SequenceStart)
    {
        // We can check for the tag...
        // We can check for the anchor...

        parser.Read(); // Skip SequenceStart

        // Read to end of sequence
        while (!parser.End && parser.CurrentEventType != ParseEventType.SequenceEnd)
        {
             // A sequence element may be a scalar or other...
             if (parser.CurrentEventType == ParseEventType.Scalar)
             {
                 // ...
             }
             // ...
             // ...
             else
             {
                 // We can skip current element. (It could be a scalar, or alias, sequence, mapping...)
                 parser.SkipCurrentNode();
             }
        }
        parser.Read(); // Skip SequenceEnd.
    }

    // If the current syntax is Mapping (like a Dictionary in yaml)
    else if (parser.CurrentEventType == ParseEventType.MappingStart)
    {
        // We can check for the tag...
        // We can check for the anchor...

        parser.Read(); // Skip MappingStart

        // Read to end of mapping
        while (!parser.End && parser.CurrentEventType != ParseEventType.MappingEnd)
        {
             // After Mapping start, key and value appear alternately.

             var key = parser.ReadScalarAsString();  // if key is scalar
             var value = parser.ReadScalarAsString(); // if value is scalar

             // Or we can skip current key/value. (It could be a scalar, or alias, sequence, mapping...)
             // parser.SkipCurrentNode(); // skip key
             // parser.SkipCurrentNode(); // skip value
        }
        parser.Read(); // Skip MappingEnd.
    }

    // Alias
    else if (parser.CurrentEventType == ParseEventType.Alias)
    {
        // If an Alias is used, the previous anchors must be held somewhere.
        // In the high-level Deserialize API, `YamlDeserializationContext` does exactly this.
    }
}

See test code for more information. The above test covers various patterns for the order of ParsingEvent.
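The Alias branch in the parser example above says that previously seen anchors have to be kept somewhere. A minimal sketch of such a table, using a plain dictionary, could look like the following. The names anchors, RegisterAnchor, and ResolveAlias are hypothetical; this is not how YamlDeserializationContext is implemented.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical anchor table (not VYaml's YamlDeserializationContext).
var anchors = new Dictionary<string, object>();

// When a node carrying an anchor (e.g. `&base`) has been read, remember its value.
// A later anchor with the same name replaces the earlier one.
static void RegisterAnchor(Dictionary<string, object> table, string name, object value)
    => table[name] = value;

// When an alias (e.g. `*base`) appears, look the remembered value up again.
static object ResolveAlias(Dictionary<string, object> table, string name)
    => table.TryGetValue(name, out var value)
        ? value
        : throw new InvalidOperationException("Unknown alias: *" + name);

RegisterAnchor(anchors, "base", "shared value");
Console.WriteLine(ResolveAlias(anchors, "base")); // shared value
```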

Emitter

The Utf8YamlEmitter struct provides a way to write YAML-formatted strings.

Basic usage:

var buffer = new ArrayBufferWriter<byte>();
var emitter = new Utf8YamlEmitter(buffer); // It needs buffer implemented `IBufferWriter<byte>`

emitter.BeginMapping(); // Mapping is a collection like Dictionary in YAML
{
    emitter.WriteString("key1");
    emitter.WriteString("value-1");

    emitter.WriteString("key2");
    emitter.WriteInt32(222);

    emitter.WriteString("key3");
    emitter.WriteFloat(3.333f);
}
emitter.EndMapping();
// If you want to expand a string in memory, you can do this.
System.Text.Encoding.UTF8.GetString(buffer.WrittenSpan);

key1: value-1
key2: 222
key3: 3.333
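For readers unfamiliar with IBufferWriter<byte>, the following sketch shows the write pattern such a buffer expects (GetSpan, write, Advance), using only the base class library. WriteUtf8 is a hypothetical helper for illustration and is not part of VYaml.

```csharp
using System;
using System.Buffers;
using System.Text;

// Hypothetical helper (not part of VYaml): appends UTF-8 text to any IBufferWriter<byte>.
static void WriteUtf8(IBufferWriter<byte> writer, string text)
{
    int byteCount = Encoding.UTF8.GetByteCount(text);
    Span<byte> destination = writer.GetSpan(byteCount); // at least byteCount bytes long
    int written = Encoding.UTF8.GetBytes(text.AsSpan(), destination);
    writer.Advance(written); // commit the bytes that were actually written
}

var buffer = new ArrayBufferWriter<byte>();
WriteUtf8(buffer, "key1: value-1\n");
WriteUtf8(buffer, "key2: 222\n");

Console.WriteLine(buffer.WrittenCount); // 24
Console.Write(Encoding.UTF8.GetString(buffer.WrittenSpan));
```

Because the writer owns the backing storage, the producer never needs an intermediate string or a separately allocated byte array per write.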

Emit string in various formats

By default, WriteString() automatically determines the format of a scalar.

Multi-line strings are automatically formatted as literal scalars:

emitter.WriteString("Hello,\nWorld!\n");

|
  Hello,
  World!

Strings containing special characters are automatically quoted:

emitter.WriteString("&aaaaa ");

"&aaaaa "

Or you can specify the style explicitly:

emitter.WriteString("aaaaaaa", ScalarStyle.Literal);

|-
  aaaaaaa

Emit sequences and other structures

e.g:

emitter.BeginSequence();
{
    emitter.BeginSequence(SequenceStyle.Flow);
    {
        emitter.WriteInt32(100);
        emitter.WriteString("&hoge");
        emitter.WriteString("bra");
    }
    emitter.EndSequence();

    emitter.BeginMapping();
    {
        emitter.WriteString("key1");
        emitter.WriteString("item1");

        emitter.WriteString("key2");
        emitter.BeginSequence();
        {
            emitter.WriteString("nested-item1");
            emitter.WriteString("nested-item2");
            emitter.BeginMapping();
            {
                emitter.WriteString("nested-key1");
                emitter.WriteInt32(100);
            }
            emitter.EndMapping();
        }
        emitter.EndSequence();
    }
    emitter.EndMapping();
}
emitter.EndSequence();

- [100, "&hoge", bra]
- key1: item1
  key2:
  - nested-item1
  - nested-item2
  - nested-key1: 100

YAML 1.2 spec support status

Implicit primitive type conversion of scalar

The following is the default implicit type interpretation.

Basically, it follows YAML Core Schema. https://yaml.org/spec/1.2.2/#103-core-schema

| Support | Regular expression | Resolved to type |
|:--|:--|:--|
| :white_check_mark: | null \| Null \| NULL \| ~ | null |
| :white_check_mark: | /* Empty */ | null |
| :white_check_mark: | true \| True \| TRUE \| false \| False \| FALSE | boolean |
| :white_check_mark: | [-+]? [0-9]+ | int (Base 10) |
| :white_check_mark: | 0o [0-7]+ | int (Base 8) |
| :white_check_mark: | 0x [0-9a-fA-F]+ | int (Base 16) |
| :white_check_mark: | [-+]? ( \. [0-9]+ \| [0-9]+ ( \. [0-9]* )? ) ( [eE] [-+]? [0-9]+ )? | float |
| :white_check_mark: | [-+]? ( \.inf \| \.Inf \| \.INF ) | float (Infinity) |
| :white_check_mark: | \.nan \| \.NaN \| \.NAN | float (Not a number) |
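The patterns in the table above can be tried out with System.Text.RegularExpressions. The sketch below is a hypothetical classifier, not VYaml's resolver (which works on UTF-8 bytes rather than strings). It only reports the name of the resolved type, and it falls back to "string" for anything that matches no pattern, as the YAML Core Schema does.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: true if the whole input matches the pattern.
static bool FullMatch(string input, string pattern)
    => Regex.IsMatch(input, @"\A(?:" + pattern + @")\z");

// Hypothetical classifier (not VYaml's implementation) following the table above.
static string ResolveCoreSchemaType(string scalar)
{
    if (scalar.Length == 0) return "null";
    if (FullMatch(scalar, @"null|Null|NULL|~")) return "null";
    if (FullMatch(scalar, @"true|True|TRUE|false|False|FALSE")) return "boolean";
    if (FullMatch(scalar, @"[-+]?[0-9]+")) return "int";        // base 10
    if (FullMatch(scalar, @"0o[0-7]+")) return "int";           // base 8
    if (FullMatch(scalar, @"0x[0-9a-fA-F]+")) return "int";     // base 16
    if (FullMatch(scalar, @"[-+]?(\.[0-9]+|[0-9]+(\.[0-9]*)?)([eE][-+]?[0-9]+)?")) return "float";
    if (FullMatch(scalar, @"[-+]?(\.inf|\.Inf|\.INF)")) return "float"; // infinity
    if (FullMatch(scalar, @"\.nan|\.NaN|\.NAN")) return "float";        // not a number
    return "string";
}

foreach (var s in new[] { "~", "True", "-12", "0o14", "0x1F", "1e3", ".5", "-.inf", ".NaN", "hello" })
{
    Console.WriteLine(s + " => " + ResolveCoreSchemaType(s));
}
```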

https://yaml.org/spec/1.2.2/

The following are the test results for the examples from the YAML spec page.

Credits

VYaml is inspired by:

Author

@hadashiA

License

MIT