jefffhaynes / BinarySerializer

A declarative serialization framework for controlling formatting of data at the byte and bit level using field bindings, converters, and code.
MIT License

FieldCount with a computed ignored property is filling the array with an invalid element count #216

Open sn4k3 opened 1 year ago

sn4k3 commented 1 year ago
public class Preview
{
    /// <summary>
    /// Gets the image width, in pixels.
    /// </summary>
    [FieldOrder(0)] public uint ResolutionX { get; set; } = 224;

    /// <summary>
    /// Gets the operation mark 'x'
    /// </summary>
    [FieldOrder(1)] [FieldLength(4)] [SerializeAs(SerializedType.TerminatedString)] public string Mark { get; set; } = "x";

    /// <summary>
    /// Gets the image height, in pixels.
    /// </summary>
    [FieldOrder(2)] public uint ResolutionY { get; set; } = 168;

    [Ignore] public uint DataSize => ResolutionX * ResolutionY * 2;

    [FieldOrder(3)] [FieldCount(nameof(DataSize))] public byte[] Data { get; set; } = Array.Empty<byte>();
}

Deserializing that results in:

[screenshot: debugger view of the deserialized Preview object]

Note the outrageous Data[10730659] element count instead of the expected 75264.
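
For reference, the expected count follows directly from the computed property and the default values above (a quick check, not part of the original report):

// Expected element count from the [Ignore] property above:
// DataSize = ResolutionX * ResolutionY * 2
uint expectedCount = 224 * 168 * 2; // = 75264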

bevanweiss commented 1 year ago

Can you please put this together into a short test case that shows the actual source data (the serialized form) and is runnable and reproducible by someone else?

There are inconsistencies with your screenshot: your code shows a class with one set of members, yet the debug view of supposedly the same class shows a different set.

So it's hard to tell whether you are providing the wrong source data, or whether there actually is something going on with the BinarySerializer library.

You should be able to easily put together a Test Case in line with this: https://github.com/jefffhaynes/BinarySerializer/blob/732c2c412253490645c48350ebb4b4c7f4e97a49/BinarySerializer.Test/SerializeAs/SerializeAsTest.cs#LL29C9-L29C9

With your source data and your expected output data, it should fail. Then anyone looking to investigate could just run your test, see the failure, trace it into the BinarySerializer code, and fix it.

sn4k3 commented 1 year ago

I stripped the class down to remove noise and make the actual problem easy to see. I have never gotten a binding to an [Ignore] property to work correctly. I posted about it here in the past, but ended up using a workaround. This time I am also using a workaround (I manually read the byte array after the preview is deserialized), but the underlying issue remains. The strange part is that the library is not new, so I wonder how other people have never had a problem with this kind of binding usage?

Here is a real and complete example:

using BinarySerialization;

namespace IgnoreBind;

public sealed class Preview
{
    /// <summary>
    /// Gets the section name placeholder
    /// </summary>
    [FieldOrder(0)][FieldLength(12)][SerializeAs(SerializedType.TerminatedString)] public string TableName { get; set; }

    /// <summary>
    /// Gets the length of this section
    /// </summary>
    [FieldOrder(1)] public uint TableLength { get; set; }

    /// <summary>
    /// Gets the image width, in pixels.
    /// </summary>
    [FieldOrder(2)] public uint ResolutionX { get; set; } = 224;

    /// <summary>
    /// Gets the operation mark 'x'
    /// </summary>
    [FieldOrder(3)][FieldLength(4)][SerializeAs(SerializedType.TerminatedString)] public string Mark { get; set; } = "x";

    /// <summary>
    /// Gets the image height, in pixels.
    /// </summary>
    [FieldOrder(4)] public uint ResolutionY { get; set; } = 168;

    [Ignore] public uint DataSize => ResolutionX * ResolutionY * 2;

    [FieldOrder(5)][FieldCount(nameof(DataSize))] public byte[] Data { get; set; } = null!;

    public override string ToString()
    {
        return $"{nameof(TableName)}: {TableName}, {nameof(TableLength)}: {TableLength}, {nameof(ResolutionX)}: {ResolutionX}, {nameof(Mark)}: {Mark}, {nameof(ResolutionY)}: {ResolutionY}, {nameof(DataSize)}: {DataSize}, {nameof(Data)}: {Data.Length}";
    }
}

sealed class Program
{
    static void Main(string[] args)
    {
        var serializer = new BinarySerializer { Endianness = Endianness.Little };
        using var inputFile = new FileStream("testfile.px6s", FileMode.Open, FileAccess.Read);
        inputFile.Seek(28, SeekOrigin.Begin); // Go to preview address value

        var address = serializer.Deserialize<uint>(inputFile);
        inputFile.Seek(address, SeekOrigin.Begin); // Go to preview address

        var preview = serializer.Deserialize<Preview>(inputFile);

        Console.WriteLine(preview);

        if (preview.Data.Length != preview.DataSize)
        {
            Console.WriteLine($"ERROR: Expecting data to be {preview.DataSize} bytes, but got {preview.Data.Length} bytes!");
        }
    }
}

[screenshot: console output from the sample, showing the ERROR about the unexpected Data length]

Project sample: IgnoreBind.zip with test file
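
For reference, the manual workaround mentioned at the start of this comment (reading the byte array after the preview is deserialized) would look roughly like the sketch below, continuing from the Main method above. It assumes Data is excluded from serialization (for example, marked [Ignore]) so the faulty FieldCount binding is never exercised; this exact code is not part of the attached sample.

// Sketch of the manual workaround: let the serializer read everything up to Data,
// then read DataSize bytes from the stream ourselves.
var preview = serializer.Deserialize<Preview>(inputFile);

var buffer = new byte[preview.DataSize]; // ResolutionX * ResolutionY * 2
var bytesRead = inputFile.Read(buffer, 0, buffer.Length);
if (bytesRead != buffer.Length)
    throw new EndOfStreamException($"Expected {buffer.Length} bytes of preview data, got {bytesRead}.");
preview.Data = buffer;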

bevanweiss commented 1 year ago

Works for me...

namespace BinarySerialization.Test.Issues.Issue216
{
    public class Preview
    {
        /// <summary>
        /// Gets the image width, in pixels.
        /// </summary>
        [FieldOrder(0)]
        public uint ResolutionX { get; set; }

        /// <summary>
        /// Gets the image height, in pixels.
        /// </summary>
        [FieldOrder(2)]
        public uint ResolutionY { get; set; }

        [Ignore]
        public uint DataSize => ResolutionX * ResolutionY * 2;

        [FieldOrder(3)]
        [FieldCount(nameof(DataSize), BindingMode = BindingMode.OneWay)]
        public byte[] Data { get; set; }
    }
}

using Microsoft.VisualStudio.TestTools.UnitTesting;

namespace BinarySerialization.Test.Issues.Issue216
{
    [TestClass]
    public class Issue216Tests : TestBase
    {
        [TestMethod]
        public void TestIssue216()
        {
            var expected = new Preview
            {
                ResolutionX = 2,
                ResolutionY = 3,
                Data = new byte[] { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 },
            };

            var actual = Roundtrip(expected, new byte[] { 2, 0, 0, 0, 3, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 });
            Assert.AreEqual(expected.ResolutionX, actual.ResolutionX);
            Assert.AreEqual(expected.ResolutionY, actual.ResolutionY);
            Assert.AreEqual(expected.Data.Length, actual.Data.Length);
            for (int i = 0; i < expected.Data.Length; i++)
                Assert.AreEqual(expected.Data[i], actual.Data[i]);
        }
    }
}

[screenshot: test run showing TestIssue216 passing]

sn4k3 commented 1 year ago

There is a bad habit of blindly trusting tests we write ourselves, which means there is a chance the test itself is written incorrectly. I think that is the case here: your test is not valid. You are creating a MemoryStream that is exactly the size of the object under test and that ends at its Data. If the library wrongly computes the Data length from the ignored field but then clamps it to the remaining stream size, it is normal for it to appear to work as expected... To be valid and complete, the test must include one or more fields after Data; otherwise you leave a hole in the test, which can produce unexpected results and fool you with a fake pass. IMO these tests should never be exactly the size of the object; instead the object should sit in the middle of a larger stream, so that the bounds and other unexpected problems are validated.

Anyway, we don't need a test; let's look at the evidence:

File total size  = 90610 bytes
Returned Data    = 90418 bytes
90610 - 90418    = 192 (Difference)
Preview starts at address 164; 164 + 12 + 4 * 4 = 192 (this is where Data starts)
192 + 90418 = 90610 (The total file size!)

This means Data is getting a FieldCount of (total file size - current position). And that's why your test passes: there is no more data after the point where the actual problem occurs, so the size of Data happens to be correct.

To prove my theory, I trimmed the file so that it ends at the preview data. Now it works as expected (because the file ends at preview.Data):

[screenshot: console output after trimming the file, showing Data at the expected size]

I don't know what is happening inside the library, but it clearly does not like this use case with an [Ignore] field. I reported this before (more than two years ago), so it is not a problem introduced by a recent update. The stranger part is that "no one" else has stepped into this problem, because it is a very simple and common use of a binding: you need to compute a length that is not directly encoded in the stream.

bevanweiss commented 1 year ago

If you think that the unit test I've written does not capture your problem, you are free to rewrite it so that it does.

sn4k3 commented 1 year ago

In order to see your test fail, just add [FieldOrder(4)] public uint AnotherField { get; set; } after Data. I even expect that to throw an exception, since AnotherField will then attempt to read past the end of the stream.

    public class Preview
    {
        /// <summary>
        /// Gets the image width, in pixels.
        /// </summary>
        [FieldOrder(0)]
        public uint ResolutionX { get; set; }

        /// <summary>
        /// Gets the image height, in pixels.
        /// </summary>
        [FieldOrder(2)]
        public uint ResolutionY { get; set; }

        [Ignore]
        public uint DataSize => ResolutionX * ResolutionY * 2;

        [FieldOrder(3)]
        [FieldCount(nameof(DataSize), BindingMode = BindingMode.OneWay)]
        public byte[] Data { get; set; }

        [FieldOrder(4)] public uint AnotherField { get; set; }
    }
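
For completeness, here is a sketch (using the same Roundtrip helper and MSTest setup as the test above) of how the earlier round-trip test could be extended; the value 42 for AnotherField and its trailing bytes are illustrative. Per the analysis above, this version is expected to fail, or throw, during deserialization:

    [TestMethod]
    public void TestIssue216WithTrailingField()
    {
        var expected = new Preview
        {
            ResolutionX = 2,
            ResolutionY = 3,
            Data = new byte[] { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 },
            AnotherField = 42,
        };

        // Same layout as the earlier test, plus 4 trailing bytes for AnotherField (little endian).
        var actual = Roundtrip(expected, new byte[]
        {
            2, 0, 0, 0,                             // ResolutionX
            3, 0, 0, 0,                             // ResolutionY
            1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,  // Data (DataSize = 2 * 3 * 2 = 12)
            42, 0, 0, 0                             // AnotherField
        });

        Assert.AreEqual(expected.Data.Length, actual.Data.Length);
        Assert.AreEqual(expected.AnotherField, actual.AnotherField);
    }
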
bevanweiss commented 1 year ago

@sn4k3 are you able to check the PR that I created that should address the more complex getter situation you were using on the [Ignore] field? I can't think of any drawback to it at this stage (because of the null check). I actually think that it should be 'more aggressive' and should actually execute the ValueGetter always in this situation (not just when the current value is null). But I'd like to wait for @jefffhaynes to comment in regards to this. He potentially has a cleaner way to address this anyway.

jefffhaynes commented 1 year ago

Going to try to look at this tomorrow, thanks.

jefffhaynes commented 1 year ago

My inclination is to say that this is "as designed". The way the serializer works is that it creates a parallel object tree and serializes that rather than the value itself. There is currently no attempt to preserve internal logic from the original object, as this would create a deep rabbit hole. While some very basic things could be possible, it feels like a slippery slope with an ill-defined bottom (e.g. what about stateful things? whose state?).

These are the moments when you realize that we are simply using the language as scaffolding for serialization and ignoring the parts that may not make as much sense. To that end, I've often thought that BinarySerializer should probably be based on JSON or some such thing, to remove the temptation to go beyond basic declaration. However, the ability to use POCOs has proven too valuable for me to give up.

I'm open to a discussion on this if anyone has any clever ideas, but I do think the limits and scope of such a capability would need to be well-defined and obvious to the consumer. I'll admit that it may not currently be obvious, as it may seem intuitive that basic calculated fields would be supported.

sn4k3 commented 1 year ago

I think there should be an easy way to compute a value and use it together with the serializer. Creating a converter or implementing a custom serializer just to achieve what a simple computed property can do is somewhat painful.

The worst part is that the library allows this (no throw), leading to unexpected results. I have been trying it for years, ever since I started using the library, and it took me a very long time to realize the problem. I also looked through the issues, because there is no way I am the only one who wants to use this (it is the most common and logical way to express it), and found this issue, where it looked like it worked for that person. Then I thought: "Am I crazy? Why doesn't it work for me?! Is my code wrong? What am I doing wrong?" But none of my attempts worked, which led me to open the issue two years ago and now again with more details.

I understand that using fields outside the serializer's scope can lead to "holes", but then we must make sure this is well documented and that a proper error is thrown, to avoid what happened to me. For example, when the serializer finds a binding to an [Ignore] field, it could throw an exception saying that [Ignore] fields are not allowed to be used in its scope.

Still, I think we need an easy way to achieve this, so if ignored fields are not the way, what about adding a new attribute (like [FieldComputed] or [FieldVirtual]) whose only purpose is to calculate a value? That way the developer marks the field on purpose and acknowledges that it is read-only, neither written to nor read from the file, and only serves the purpose of calculating a value required by another attribute. We could also extend that attribute and allow the field to be set, if a public setter is available.

Example:

public class Preview
{
    /// <summary>
    /// Gets the image width, in pixels.
    /// </summary>
    [FieldOrder(0)] public uint ResolutionX { get; set; } = 224;

    /// <summary>
    /// Gets the operation mark 'x'
    /// </summary>
    [FieldOrder(1)] [FieldLength(4)] [SerializeAs(SerializedType.TerminatedString)] public string Mark { get; set; } = "x";

    /// <summary>
    /// Gets the image height, in pixels.
    /// </summary>
    [FieldOrder(2)] public uint ResolutionY { get; set; } = 168;

    [FieldVirtual] public uint DataSize => ResolutionX * ResolutionY * 2;

    [FieldOrder(3)] [FieldCount(nameof(DataSize))] public byte[] Data { get; set; } = Array.Empty<byte>();
}
bevanweiss commented 1 year ago

@jefffhaynes I'm unsure whether there's an easy way to determine if a getter/setter is 'too complex', in which case an exception could be thrown to surface the issue. Alternatively, the serialiser/deserialiser could return a tuple with the value and a list of warnings from the serialisation/deserialisation (things that aren't bad enough to warrant no result, but that might suggest the operation wasn't exactly as intended).

I do think this [Ignore] calculated field, along with its serialised equivalent, makes sense as used in Issue216. I don't see an alternative short of custom deserialization, or expanding the TypeConverters to accept multiple parameters (which might itself be an idea… like a lambda capture… but that has similar issues around when it should be executed/cached and how internal state should be handled).

Maybe the big problems can just be left for now, and simply executing the ValueGetter where present would suffice. It doesn't break any test cases, so the behaviour guaranteed by the tests is not impacted, and it does seem to match the 'least surprise' ethos: C# allows such a getter, and the library 'allows' it to be used as an input to the binding, so the least surprising thing is for the binding to use the value from the getter when it is evaluated.

I disagree with adding extra terminology. [Ignore] is for when a value isn't to be serialised/deserialised, and that is the intent in the example. Q.E.D. No need for more terms.

jefffhaynes commented 1 year ago

OK, I agree we should do something. Let me think about it.