rpgmaker / NetJSON

Faster than Any Binary? Benchmark: http://theburningmonk.com/2014/08/json-serializers-benchmarks-updated-2/
MIT License
230 stars 29 forks source link

Fixed #184 #187

Closed wmjordan closed 5 years ago

wmjordan commented 6 years ago

Avoided ptr+index addressing; Fixed invalid strings (ended with ); Fixed #184

rpgmaker commented 6 years ago

Thanks, I will review it this afternoon

wmjordan commented 6 years ago

Checking offset in DecodeJSONString was unnecessary since four-digit hex number will never reach 0x10000.

rpgmaker commented 6 years ago

I will try to merge it this weekend. Thanks for the contribution

rpgmaker commented 6 years ago

Looking at the code a second time. Would it possible to use index for navigating rather than increment the pointer itself? Reason is most of the code is already based on index navigation which allows for things such as looking backward on a string and etc.

Understand that all the test are currently passing. But i would like to prevent any possible bug that might come from the other piece of code doing something like below

// Increment pointer ptr++ //Update Index index = (ptr - sPtr) // Other code navigating based on index which will result in reading null character

Example screenshot:

image

wmjordan commented 6 years ago

The ptr[index] way and incremented pointer, I think, theoretically can potentially get out of the bound of the string. Using incremented pointer may gain a possible performance boost by substitute add calculations with simpler increments, and therefore allow hardware to prefetch data. To look backward, we can still use ptr[-offset]. Every time we increment ptr, we check against '\0', except the following part I did not quite understand what it was for.

if (current == settings._quoteChar) {
    hasQuote = true;
} else if (current == 'n') {
    ptr += 3;
    index += (int)(ptr - startPtr);
    return null;
}

Concerning the possible problems, we may actually need more unit tests.

rpgmaker commented 6 years ago

if (current == settings._quoteChar) { hasQuote = true; } else if (current == 'n') { ptr += 3; index += (int)(ptr - startPtr); return null; }

the first part of the "if" is for checking when we encounter a quote since there could a json such as this {"Name": "Test"}

While the second is for null checks

rpgmaker commented 6 years ago

The reason the check for quoteChar also exists is because i support both single and double quotes.

wmjordan commented 6 years ago

Understood. The second part of the if are for null or some else tokens that begin with 'n', e.g. nabc, nxyz.

rpgmaker commented 6 years ago

The second case will always be a null else you have an invalid json and at the point you would have already got an exception with InvalidJSONException. At least that is the expectation.

example {"Name": null}

invalid (Will fail) {"Name": nuun}

wmjordan commented 6 years ago

I got the following in the C# interactive window

> NetJSON.NetJSON.Deserialize<Tuple<string, string>>("{\"Item1\": n,\"Item2\": \"abc\"}")
[(, })]
> NetJSON.NetJSON.Deserialize<Tuple<string, string>>("{\"Item1\": nb,\"Item2\": \"abc\"}")
[(, abc)]
> NetJSON.NetJSON.Deserialize<Tuple<string, string>>("{\"Item1\": null,Item2\": \"abc\"}")
[(, })]
> NetJSON.NetJSON.Deserialize<Tuple<string, string>>("{\"Item1\": null, Item2\": \"abc\"}")
[(, })]
> NetJSON.NetJSON.Deserialize<Tuple<string, string>>("{\"Item1\": null,  Item2\": \"abc\"}")
[(, })]
> NetJSON.NetJSON.Deserialize<Tuple<string, string>>("{\"Item1\": nul,Item2\": \"abc\"}")
[(, })]
> NetJSON.NetJSON.Deserialize<Tuple<string, string>>("{\"Item1\": nul,\"Item2\": \"abc\"}")
[(, abc)]
rpgmaker commented 6 years ago

Nice. They did not throw any error. The property names does not have all the quotes in place.

rpgmaker commented 6 years ago

Any update on the changes?

wmjordan commented 6 years ago

Currently no update. :)

wmjordan commented 6 years ago

Reason is most of the code is already based on index navigation which allows for things such as looking backward on a string and etc.

To look backward with pointer, we can use *(ptr - n), where n is a positive number and the code retrieves the nth character before ptr.

rpgmaker commented 6 years ago

The original pointer is never incremented. Thanks for update 😀

rpgmaker commented 6 years ago

Any updates yet? I could help convert your fix to use index instead if needed.

Thanks,

wmjordan commented 6 years ago

I did not meet with any problem at this moment. Maybe it is safe to merge?

rpgmaker commented 6 years ago

Let me do some testing then I can merge it

wmjordan commented 6 years ago

Using direct incrementing pointer addressing, rather than "ptr+index", can help improve performance. The original pointer which is passed into DecodeJSONString would not be affected by what we have done inside the method.

rpgmaker commented 6 years ago

Arithmetic operation on pointer is likely to be fast assuming it is not optimized by the compiler. I will see if I can test it tomorrow and then merge I suppose.

Thanks,

wmjordan commented 6 years ago

About the performance of pointer addressing, I previously made a series of benchmarks on various methods of memory comparison with Benchmark.net. I'd love to share my code with you here. Here's one of the benchmark results I got with my computer. I was surprised that array addressing was not slower than pointer addressing at all.

                            Method |     Mean |     Error |    StdDev |
---------------------------------- |---------:|----------:|----------:|
           TestWithArrayAddressing | 19.45 ns | 0.1515 ns | 0.1183 ns |
                   TestWithPInvoke | 21.68 ns | 0.0904 ns | 0.0706 ns |
             TestWithUnsafeCompare | 28.09 ns | 0.5315 ns | 0.4712 ns |
 TestWithUnsafeIncrementPtrCompare | 20.81 ns | 0.1285 ns | 0.1073 ns |
      TestWithUnsafeLongPtrCompare | 11.09 ns | 0.2632 ns | 0.3939 ns |
public class CompareArrayTests
{
    static readonly byte[] _A = new byte[] { 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 9, 8, 7, 6, 5, 4, 3, 2, 1 };
    static readonly byte[] _B = new byte[] { 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 9, 8, 7, 6, 5, 4, 3, 2, 1 };

    [Benchmark]
    public void TestWithArrayAddressing() {
        AreEqualArray(_A, _B);
    }
    [Benchmark]
    public void TestWithPInvoke() {
        AreEqualPInvoke(_A, _B);
    }
    [Benchmark]
    public void TestWithUnsafeCompare() {
        AreEqualPtr(_A, _B);
    }
    [Benchmark]
    public void TestWithUnsafeIncrementPtrCompare() {
        AreEqualIncrementPtr(_A, _B);
    }
    [Benchmark]
    public void TestWithUnsafeLongPtrCompare() {
        AreEqualLongPtr(_A, _B);
    }

    static bool AreEqualArray(byte[] source, byte[] target) {
        if (ReferenceEquals(source, target)) {
            return true;
        }
        if (source == null || target == null) {
            return false;
        }
        var l = source.Length;
        if (l != target.Length) {
            return false;
        }
        for (int i = 0; i < source.Length && i < target.Length; i++) {
            if (source[i] != target[i]) {
                return false;
            }
        }
        return true;
    }
    static bool AreEqualPInvoke(byte[] source, byte[] target) {
        if (ReferenceEquals(source, target)) {
            return true;
        }
        if (source == null || target == null) {
            return false;
        }
        var l = source.Length;
        if (l != target.Length) {
            return false;
        }
        return memcmp(source, target, l) == 0;
    }
    static unsafe bool AreEqualPtr(byte[] source, byte[] target) {
        if (ReferenceEquals(source, target)) {
            return true;
        }
        if (source == null || target == null) {
            return false;
        }
        var l = source.Length;
        if (l != target.Length) {
            return false;
        }

        fixed (byte* a = source)
        fixed (byte* b = target) {
            while (--l >= 0) {
                if (*(a + l) != *(b + l)) {
                    return false;
                }
            }
        }
        return true;
    }
    static unsafe bool AreEqualIncrementPtr(byte[] source, byte[] target) {
        if (ReferenceEquals(source, target)) {
            return true;
        }
        if (source == null || target == null) {
            return false;
        }
        var l = source.Length;
        if (l != target.Length) {
            return false;
        }

        fixed (byte* a = source)
        fixed (byte* b = target) {
            var pa = a;
            var pb = b;
            var end = a + l;
            do {
                if (*pa != *pb) {
                    return false;
                }
                pb++;
            } while ((++pa) < end);
        }
        return true;
    }
    static unsafe bool AreEqualLongPtr(byte[] source, byte[] target) {
        if (ReferenceEquals(source, target)) {
            return true;
        }
        if (source == null || target == null) {
            return false;
        }
        var l = source.Length;
        if (l != target.Length) {
            return false;
        }

        fixed (byte* a = source)
        fixed (byte* b = target) {
            var pa = (long*)a;
            var pb = (long*)b;
            var end = l >> 3;

            while (--end >= 0) {
                if (*pa != *pb) {
                    return false;
                }
                pa++; pb++;
            }
            var ba = (byte*)pa;
            var bb = (byte*)pb;
            if ((l & 4) != 0) {
                if (*(int*)pa != *(int*)pb) {
                    return false;
                }
                ba += 4;
                bb += 4;
            }

            if ((l & 2) != 0) {
                if (*(short*)ba != *(short*)bb) {
                    return false;
                }
                ba += 2;
                bb += 2;
            }

            if ((l & 1) != 0 && *ba != *bb) {
                return false;
            }
            return true;
        }
    }

    [DllImport("msvcrt.dll", CallingConvention = CallingConvention.Cdecl)]
    private static extern int memcmp(byte[] b1, byte[] b2, long count);
}
rpgmaker commented 6 years ago

Interesting result. There is skip security check that you can add to the dllimport to speed it up a little. I will look for the attribute

rpgmaker commented 6 years ago

It is called. [SuppressUnmanagedCodeSecurity]

wmjordan commented 6 years ago

Thank you very much for the hint. Here was my result after applying the [SuppressUnmanagedCodeSecurity] attribute, the P/Invoke method was accelerated.

                            Method |     Mean |     Error |    StdDev |
---------------------------------- |---------:|----------:|----------:|
           TestWithArrayAddressing | 19.60 ns | 0.1137 ns | 0.0949 ns |
                   TestWithPInvoke | 16.81 ns | 0.0896 ns | 0.0748 ns |
             TestWithUnsafeCompare | 28.25 ns | 0.6102 ns | 0.5095 ns |
 TestWithUnsafeIncrementPtrCompare | 20.20 ns | 0.2037 ns | 0.1701 ns |
      TestWithUnsafeLongPtrCompare | 10.92 ns | 0.0761 ns | 0.0636 ns |
rpgmaker commented 6 years ago

Cool. Glad it made a difference. Even so, there is still some cost for using dllimport. Another way to improve this is by using Marshall unsafe function delegate. It should gain even more. https://blogs.msdn.microsoft.com/jonathanswift/2006/10/03/dynamically-calling-an-unmanaged-dll-from-net-c/

wmjordan commented 6 years ago

I tried that. But it brought undesired delay. I used the following code to setup the delegate and substituted the P/Invoke with that delegate.

static readonly MemCmp _MemCmp = Marshal.GetDelegateForFunctionPointer<MemCmp>
    (GetProcAddress(LoadLibrary("msvcrt.dll"), "memcmp"));
delegate int MemCmp(byte[] a, byte[] b, long length);

Here was the result. Not so good :)

                            Method |     Mean |     Error |    StdDev |
---------------------------------- |---------:|----------:|----------:|
           TestWithArrayAddressing | 19.97 ns | 0.0940 ns | 0.0734 ns |
                   TestWithPInvoke | 41.44 ns | 0.4313 ns | 0.3367 ns |
             TestWithUnsafeCompare | 28.53 ns | 0.2906 ns | 0.2426 ns |
 TestWithUnsafeIncrementPtrCompare | 20.07 ns | 0.3984 ns | 0.3327 ns |
      TestWithUnsafeLongPtrCompare | 11.00 ns | 0.0845 ns | 0.0660 ns |
rpgmaker commented 6 years ago

Did you add this attribute [UnmanagedFunctionPointer(CallingConvention.Cdecl)] to the delegate definition?

wmjordan commented 6 years ago

I'd forgotten that. I added that attribute to the delegate and performed the benchmark again. The result was almost the same. I thought the link you provided was not for DLL interop acceleration, but for the flexibility for dynamically calling functions where the DLL names or function names could not be determined at compilation.

rpgmaker commented 6 years ago

Good to know. Thanks

wmjordan commented 5 years ago

Rebase only.

wmjordan commented 5 years ago

My repository was somewhat behind yours. Thus I got it synced.

rpgmaker commented 5 years ago

I see. I guess It make sense.

wmjordan commented 5 years ago

It has been quite a few months. I guessed that the original issue should had been fixed.

rpgmaker commented 5 years ago

Yeah. The invalid string had been resolved if that is what you are referring to.

Thanks,