jefffhaynes / BinarySerializer

A declarative serialization framework for controlling formatting of data at the byte and bit level using field bindings, converters, and code.
MIT License
292 stars 62 forks source link

Field is set to the wrong value after serialize to file #171

Open sn4k3 opened 3 years ago

sn4k3 commented 3 years ago

I'm having a problem where i set a field to be 6 and on file is written with 5 Here's the defenition:

            /// <summary>
            /// Gets the size of the printer model
            /// </summary>
            [FieldOrder(3)]
            [FieldEndianness(Endianness.Big)]
            public uint PrinterModelSize { get; set; } = 6;

            /// <summary>
            /// Gets the printer model
            /// </summary>
            [FieldOrder(4)]
            [FieldLength(nameof(PrinterModelSize))]
            [SerializeAs(SerializedType.TerminatedString)]
            public string PrinterModel
            {
                get => _printerModel;
                set
                {
                    _printerModel = value;
                    PrinterModelSize = string.IsNullOrEmpty(value) ? 0 : (uint)value.Length+1;
                }
            }

I place a breakpoint on both PrinterModelSize = string.IsNullOrEmpty(value) ? 0 : (uint)value.Length+1; and public uint PrinterModelSize { get; set; } = 6; Both shows 6 with a value of "CL-89" which is correct since it have a null terminated char (00) When the file got written that field turn into 5. Does it redefine the value when serializing due the link with [FieldLength(nameof(PrinterModelSize))] ? If so is there a way to overcome this?

image

EDIT 1: I added to attribute [FieldLength(nameof(PrinterModelSize), BindingMode = BindingMode.OneWay)], that way the size become 6 but string still not null terminated (no zero appended)

image

jefffhaynes commented 3 years ago

The serializer does not support non-trivial getters and setters so the body of your setter will not be used during deserialization. Maybe that's the problem?

sn4k3 commented 3 years ago

But the TerminatedString flag should append a 00 byte at serialization time right?

So far i have this working by using a byte array instead:

            /// <summary>
            /// Gets the size of the printer model
            /// </summary>
            [FieldOrder(3)]
            [FieldEndianness(Endianness.Big)]
            public uint PrinterModelSize { get; set; } = 6;

            [FieldOrder(4)]
            [FieldLength(nameof(PrinterModelSize))]
            public byte[] PrinterModelArray { get; set; }  = { 0x43, 0x4C, 0x2D, 0x38, 0x39, 0x0 };

            [Ignore]
            public string PrinterModel
            {
                get => Encoding.ASCII.GetString(PrinterModelArray).TrimEnd(char.MinValue);
                set
                {
                    PrinterModelArray = Encoding.ASCII.GetBytes(value + char.MinValue);
                    PrinterModelSize = (uint) PrinterModelArray.Length;
                }
            }
hoshinokanade commented 3 years ago

But the TerminatedString flag should append a 00 byte at serialization time right?

So far i have this working by using a byte array instead:

            /// <summary>
            /// Gets the size of the printer model
            /// </summary>
            [FieldOrder(3)]
            [FieldEndianness(Endianness.Big)]
            public uint PrinterModelSize { get; set; } = 6;

            [FieldOrder(4)]
            [FieldLength(nameof(PrinterModelSize))]
            public byte[] PrinterModelArray { get; set; }  = { 0x43, 0x4C, 0x2D, 0x38, 0x39, 0x0 };

            [Ignore]
            public string PrinterModel
            {
                get => Encoding.ASCII.GetString(PrinterModelArray).TrimEnd(char.MinValue);
                set
                {
                    PrinterModelArray = Encoding.ASCII.GetBytes(value + char.MinValue);
                    PrinterModelSize = (uint) PrinterModelArray.Length;
                }
            }

I have a similar usage case that a string is null-terminated at the same time prefixed with FieldLength. I looked up at the source code it seems when FieldLength is there, then null-termination won't apply. Similarly I have resorted to using a custom class.

sn4k3 commented 3 years ago

Similarly I have resorted to using a custom class.

Can you share your custom class to fix this?

hoshinokanade commented 3 years ago

Mine is not a lot better than yours indeed.

public class MyString
{
    [FieldOrder(0)]
    public byte SerializedLength { get; set;}

    [FieldOrder(1)]
    [FieldLength(nameof(SerializedLength))]
    public string SerializedValue { get; set; }

    [Ignore]
    public string Value
    {
        get => SerializedValue[0..^1];
        set => SerializedValue = value + char.MinValue;
    }
}

One concern is that you would always want to interact via Value property instead of SerializedValue (while it is necessary to keep that public, unfortunately). This shares the same downside with your class.

I didn't do benchmark on which is more performant. The main reason of why I didn't widely use it in my codebase is that this requires to new the MyString class everywhere. I say I can easily make use of code generator to assist me defining my binary classes but I yet to come up effective solution to seamlessly use ordinary string.

sn4k3 commented 3 years ago

Ah, i tought you created some sort of custom flag attribute to deal with this strings, for example: [FieldStringNullTerminated] Nevertheless you definition look simpler than my 👍

hoshinokanade commented 3 years ago

All these adds up to more level of indirections in order to more seamlessly to use primitives:

namespace MyTest
{
    public class MyTestClass
    {
        [FieldOrder(0)]
        public MyString Name { get; set; }

        [FieldOrder(1)]
        public MyList<MyString> Foods { get; set; }

        public static MyTestClass Make(string name, IReadOnlyList<string> foods)
        {
            // factory class to make my life easier
        }

        public void Deconstruct(out string name, out IReadOnlyList<string> foods)
        {
            // destructuring to save me from conversions
        }
    }
}

Usage:

        public async void TestSerialization()
        {
            var stream = new MemoryStream();
            var serializer = new BinarySerializer();
            var obj = MyTestClass.Make // with primitives
            (
                name: "ABC",
                foods: new List<string>
                {
                     "A" ,
                     "B" ,
                     "C" ,
                }
            );

            await serializer.SerializeAsync(stream, obj);
            var output = stream.ToArray();
            stream.Seek(0, SeekOrigin.Begin);

            var deserializedObj = await serializer.DeserializeAsync<MyTestClass>(stream);
            var (name, foods) = deserializedObj; // string, IReadOnlyList<string>, great!
        }

Along with that it is also less likely for the remaining part of code to accidentally use the SerializedValue in the MyString class above. Note that this method doesn't scale well when you have, say, tens of fields!

jefffhaynes commented 3 years ago

Sorry, just getting back to this. Have either of you tried a ValueConverter to solve this?

sn4k3 commented 2 years ago

Sorry, just getting back to this. Have either of you tried a ValueConverter to solve this?

Today i step again with a field of this type, so i tried to use a ValueConverter on FieldLength, it output the right length to the file field but the string would not respect the size and not output a extra 00 byte in the end of the string. Maybe i'm doing this wrong but i can't find a way to manipulate the string value on serialization time

Class:


    public class NullTerminatedLengthConverter : IValueConverter
    {
        // Read
        public object Convert(object value, object converterParameter, BinarySerializationContext context)
        {
            return value;
        }

        // Write
        public object ConvertBack(object value, object converterParameter, BinarySerializationContext context)
        {
            return System.Convert.ToUInt32(value) + 1;
        }
    }

        public sealed class SlicerInfoV3
        {
            [FieldOrder(0)] [FieldEndianness(Endianness.Big)] public uint SoftwareNameSize { get; set; } = (uint)About.SoftwareWithVersion.Length + 1;

            [FieldOrder(1)]
            [FieldLength(nameof(SoftwareNameSize), ConverterType = typeof(NullTerminatedLengthConverter))]
            [SerializeAs(SerializedType.TerminatedString)]
            public string SoftwareName { get; set; } = About.SoftwareWithVersion;

            [FieldOrder(2)] [FieldEndianness(Endianness.Big)] public uint MaterialNameSize { get; set; }

            [FieldOrder(3)] 
            [FieldLength(nameof(MaterialNameSize), ConverterType = typeof(NullTerminatedLengthConverter))] 
            [SerializeAs(SerializedType.TerminatedString)]
            public string MaterialName { get; set; }

            [FieldOrder(4)] public uint Unknown1 { get; set; }
            [FieldOrder(5)] public uint Unknown2 { get; set; }
            [FieldOrder(6)] public uint Unknown3 { get; set; }
            [FieldOrder(7)] public uint Unknown4 { get; set; }
            [FieldOrder(8)] public byte Unknown5 { get; set; } = 1;
            [FieldOrder(9)] public byte LightPWM { get; set; } = byte.MaxValue;
            [FieldOrder(10)] public ushort Unknown6 { get; set; } = 2;
            [FieldOrder(11)] public PageBreak PageBreak { get; set; } = new();
        }

010Editor_2022-01-04_04-40-00

Note that the SoftwareNameSize is correct now (16), and text is 15 length but there is a missing 0x00 byte on the end of the string. So i get 00 00 00 1A but i should get 00 00 00 00 1A just after the string

We need a simple way to keep this null terminated strings intact on both read and write time... Something like a [SerializeAs(SerializedType.TerminatedStringAppendZero)] ?

bevanweiss commented 1 year ago

Have you tried this as below?

            /// <summary>
            /// Gets the size of the printer model
            /// </summary>
            [FieldOrder(3)]
            [FieldEndianness(Endianness.Big)]
            public uint PrinterModelSize;

            /// <summary>
            /// Gets the printer model
            /// </summary>
            [FieldOrder(4)]
            [FieldLength(nameof(PrinterModelSize))]
            [SerializeAs(SerializedType.TerminatedString)]
            public string? PrinterModel;

I think the nullability of the PrinterModel is important here, to provide the PrinterModelSize=0 option. Otherwise if it's a terminated string and it exists at all, then this is a minimum of 1 byte right (the '\0'). The only way to get rid of it is to say that the PrinterModel string may not exist at all, when the PrinterModelSize is 0.

It's possible that this won't actually work. But I'd argue in this situation it would be a bug in the BinarySerializer. Since a null field MUST have a length of 0, nothing else makes sense for it (unless one wants to get truly abstract, and argue that ALL values are valid... since the length of something that doesn't exist is arbitrary).

sn4k3 commented 1 year ago

@bevanweiss the main problem is that SerializedType.TerminatedString is not appending a \0 at end of string while serializing. Example:

test\0 (length=5) Deserialize ok to "test" But when serialize back it is written as: test (length=4) without appending a null termination '\0'

I solved that by implementing a custom class NullTerminatedUintStringBigEndian which also uses nullable value

bevanweiss commented 1 year ago

Ahh yes, I think I do recall this when I was looking into the Graph logic https://github.com/jefffhaynes/BinarySerializer/blob/fa8181145d89c19d5f3d5070edd793407f6b1917/BinarySerializer/Graph/TypeGraph/TypeNode.cs#LL389-L403

I wasn't a fan of how it did this. I personally think if it's a TerminatedString it should remain as this, but if a length constraint is applied then it should be NULL padded ('\0') to this length. If the string length (plus termination char) is greater than the length, it should be truncated with the last character being the termination. i.e. if configured as:

[FieldLength(4)]
[SerializeAs(SerializeType.TerminatedString]]
public string LongString = "This string here";

it should result in a Serialized output of 'T', 'h', 'i', '\0'

Likewise if it's

[FieldLength(4)]
[SerializeAs(SerializeType.TerminatedString]]
public string LongString = "T";

it should result in a Serialized output of 'T', '\0', '\0', '\0'

@jefffhaynes Is there a particular reason for the current behaviour (where FieldLength removed the TerminatedString type)? My thoughts are that the TerminatedString characteristic should remain. But it should just gain a 'size' parameter which is used for truncation / padding. Would you accept a Pull Request to: a. Add tests for this b. Implement the revised behaviour mentioned? (this would change the current serialize/deserialize behaviour of potentially existing logic however... so unsure if there's any backward compatibility flag that might be desired).

jefffhaynes commented 1 year ago

I think I agree with your reasoning here. Probably just an oversight on my part when I first wrote it. Changing it would be a breaking change but probably not a significant one? 🤞

bevanweiss commented 1 year ago

I think I agree with your reasoning here. Probably just an oversight on my part when I first wrote it. Changing it would be a breaking change but probably not a significant one? 🤞

I'll put together a PR for this (maybe this week.. maybe into next week). I think there would be a pretty straight forward way to prevent it being a breaking change by applying some additional optional argument on the FieldLength attribute (RetainStringTermination?) that would default to false (current behaviour).