Closed wmjordan closed 5 years ago
Thanks, I will review it this afternoon
Checking offset
in DecodeJSONString
was unnecessary since four-digit hex number will never reach 0x10000.
I will try to merge it this weekend. Thanks for the contribution
Looking at the code a second time. Would it possible to use index for navigating rather than increment the pointer itself? Reason is most of the code is already based on index navigation which allows for things such as looking backward on a string and etc.
Understand that all the test are currently passing. But i would like to prevent any possible bug that might come from the other piece of code doing something like below
// Increment pointer ptr++ //Update Index index = (ptr - sPtr) // Other code navigating based on index which will result in reading null character
Example screenshot:
The ptr[index]
way and incremented pointer, I think, theoretically can potentially get out of the bound of the string.
Using incremented pointer may gain a possible performance boost by substitute add calculations with simpler increments, and therefore allow hardware to prefetch data. To look backward, we can still use ptr[-offset]
.
Every time we increment ptr
, we check against '\0'
, except the following part I did not quite understand what it was for.
if (current == settings._quoteChar) {
hasQuote = true;
} else if (current == 'n') {
ptr += 3;
index += (int)(ptr - startPtr);
return null;
}
Concerning the possible problems, we may actually need more unit tests.
if (current == settings._quoteChar) { hasQuote = true; } else if (current == 'n') { ptr += 3; index += (int)(ptr - startPtr); return null; }
the first part of the "if" is for checking when we encounter a quote since there could a json such as this {"Name": "Test"}
While the second is for null checks
The reason the check for quoteChar also exists is because i support both single and double quotes.
Understood.
The second part of the if
are for null
or some else tokens that begin with 'n', e.g. nabc, nxyz.
The second case will always be a null else you have an invalid json and at the point you would have already got an exception with InvalidJSONException. At least that is the expectation.
example {"Name": null}
invalid (Will fail) {"Name": nuun}
I got the following in the C# interactive window
> NetJSON.NetJSON.Deserialize<Tuple<string, string>>("{\"Item1\": n,\"Item2\": \"abc\"}")
[(, })]
> NetJSON.NetJSON.Deserialize<Tuple<string, string>>("{\"Item1\": nb,\"Item2\": \"abc\"}")
[(, abc)]
> NetJSON.NetJSON.Deserialize<Tuple<string, string>>("{\"Item1\": null,Item2\": \"abc\"}")
[(, })]
> NetJSON.NetJSON.Deserialize<Tuple<string, string>>("{\"Item1\": null, Item2\": \"abc\"}")
[(, })]
> NetJSON.NetJSON.Deserialize<Tuple<string, string>>("{\"Item1\": null, Item2\": \"abc\"}")
[(, })]
> NetJSON.NetJSON.Deserialize<Tuple<string, string>>("{\"Item1\": nul,Item2\": \"abc\"}")
[(, })]
> NetJSON.NetJSON.Deserialize<Tuple<string, string>>("{\"Item1\": nul,\"Item2\": \"abc\"}")
[(, abc)]
Nice. They did not throw any error. The property names does not have all the quotes in place.
Any update on the changes?
Currently no update. :)
Reason is most of the code is already based on index navigation which allows for things such as looking backward on a string and etc.
To look backward with pointer, we can use *(ptr - n)
, where n
is a positive number and the code retrieves the nth character before ptr
.
The original pointer is never incremented. Thanks for update 😀
Any updates yet? I could help convert your fix to use index instead if needed.
Thanks,
I did not meet with any problem at this moment. Maybe it is safe to merge?
Let me do some testing then I can merge it
Using direct incrementing pointer addressing, rather than "ptr+index", can help improve performance.
The original pointer which is passed into DecodeJSONString
would not be affected by what we have done inside the method.
Arithmetic operation on pointer is likely to be fast assuming it is not optimized by the compiler. I will see if I can test it tomorrow and then merge I suppose.
Thanks,
About the performance of pointer addressing, I previously made a series of benchmarks on various methods of memory comparison with Benchmark.net. I'd love to share my code with you here. Here's one of the benchmark results I got with my computer. I was surprised that array addressing was not slower than pointer addressing at all.
Method | Mean | Error | StdDev |
---------------------------------- |---------:|----------:|----------:|
TestWithArrayAddressing | 19.45 ns | 0.1515 ns | 0.1183 ns |
TestWithPInvoke | 21.68 ns | 0.0904 ns | 0.0706 ns |
TestWithUnsafeCompare | 28.09 ns | 0.5315 ns | 0.4712 ns |
TestWithUnsafeIncrementPtrCompare | 20.81 ns | 0.1285 ns | 0.1073 ns |
TestWithUnsafeLongPtrCompare | 11.09 ns | 0.2632 ns | 0.3939 ns |
public class CompareArrayTests
{
static readonly byte[] _A = new byte[] { 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 9, 8, 7, 6, 5, 4, 3, 2, 1 };
static readonly byte[] _B = new byte[] { 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 9, 8, 7, 6, 5, 4, 3, 2, 1 };
[Benchmark]
public void TestWithArrayAddressing() {
AreEqualArray(_A, _B);
}
[Benchmark]
public void TestWithPInvoke() {
AreEqualPInvoke(_A, _B);
}
[Benchmark]
public void TestWithUnsafeCompare() {
AreEqualPtr(_A, _B);
}
[Benchmark]
public void TestWithUnsafeIncrementPtrCompare() {
AreEqualIncrementPtr(_A, _B);
}
[Benchmark]
public void TestWithUnsafeLongPtrCompare() {
AreEqualLongPtr(_A, _B);
}
static bool AreEqualArray(byte[] source, byte[] target) {
if (ReferenceEquals(source, target)) {
return true;
}
if (source == null || target == null) {
return false;
}
var l = source.Length;
if (l != target.Length) {
return false;
}
for (int i = 0; i < source.Length && i < target.Length; i++) {
if (source[i] != target[i]) {
return false;
}
}
return true;
}
static bool AreEqualPInvoke(byte[] source, byte[] target) {
if (ReferenceEquals(source, target)) {
return true;
}
if (source == null || target == null) {
return false;
}
var l = source.Length;
if (l != target.Length) {
return false;
}
return memcmp(source, target, l) == 0;
}
static unsafe bool AreEqualPtr(byte[] source, byte[] target) {
if (ReferenceEquals(source, target)) {
return true;
}
if (source == null || target == null) {
return false;
}
var l = source.Length;
if (l != target.Length) {
return false;
}
fixed (byte* a = source)
fixed (byte* b = target) {
while (--l >= 0) {
if (*(a + l) != *(b + l)) {
return false;
}
}
}
return true;
}
static unsafe bool AreEqualIncrementPtr(byte[] source, byte[] target) {
if (ReferenceEquals(source, target)) {
return true;
}
if (source == null || target == null) {
return false;
}
var l = source.Length;
if (l != target.Length) {
return false;
}
fixed (byte* a = source)
fixed (byte* b = target) {
var pa = a;
var pb = b;
var end = a + l;
do {
if (*pa != *pb) {
return false;
}
pb++;
} while ((++pa) < end);
}
return true;
}
static unsafe bool AreEqualLongPtr(byte[] source, byte[] target) {
if (ReferenceEquals(source, target)) {
return true;
}
if (source == null || target == null) {
return false;
}
var l = source.Length;
if (l != target.Length) {
return false;
}
fixed (byte* a = source)
fixed (byte* b = target) {
var pa = (long*)a;
var pb = (long*)b;
var end = l >> 3;
while (--end >= 0) {
if (*pa != *pb) {
return false;
}
pa++; pb++;
}
var ba = (byte*)pa;
var bb = (byte*)pb;
if ((l & 4) != 0) {
if (*(int*)pa != *(int*)pb) {
return false;
}
ba += 4;
bb += 4;
}
if ((l & 2) != 0) {
if (*(short*)ba != *(short*)bb) {
return false;
}
ba += 2;
bb += 2;
}
if ((l & 1) != 0 && *ba != *bb) {
return false;
}
return true;
}
}
[DllImport("msvcrt.dll", CallingConvention = CallingConvention.Cdecl)]
private static extern int memcmp(byte[] b1, byte[] b2, long count);
}
Interesting result. There is skip security check that you can add to the dllimport to speed it up a little. I will look for the attribute
It is called. [SuppressUnmanagedCodeSecurity]
Thank you very much for the hint. Here was my result after applying the [SuppressUnmanagedCodeSecurity] attribute, the P/Invoke method was accelerated.
Method | Mean | Error | StdDev |
---------------------------------- |---------:|----------:|----------:|
TestWithArrayAddressing | 19.60 ns | 0.1137 ns | 0.0949 ns |
TestWithPInvoke | 16.81 ns | 0.0896 ns | 0.0748 ns |
TestWithUnsafeCompare | 28.25 ns | 0.6102 ns | 0.5095 ns |
TestWithUnsafeIncrementPtrCompare | 20.20 ns | 0.2037 ns | 0.1701 ns |
TestWithUnsafeLongPtrCompare | 10.92 ns | 0.0761 ns | 0.0636 ns |
Cool. Glad it made a difference. Even so, there is still some cost for using dllimport. Another way to improve this is by using Marshall unsafe function delegate. It should gain even more. https://blogs.msdn.microsoft.com/jonathanswift/2006/10/03/dynamically-calling-an-unmanaged-dll-from-net-c/
I tried that. But it brought undesired delay. I used the following code to setup the delegate and substituted the P/Invoke with that delegate.
static readonly MemCmp _MemCmp = Marshal.GetDelegateForFunctionPointer<MemCmp>
(GetProcAddress(LoadLibrary("msvcrt.dll"), "memcmp"));
delegate int MemCmp(byte[] a, byte[] b, long length);
Here was the result. Not so good :)
Method | Mean | Error | StdDev |
---------------------------------- |---------:|----------:|----------:|
TestWithArrayAddressing | 19.97 ns | 0.0940 ns | 0.0734 ns |
TestWithPInvoke | 41.44 ns | 0.4313 ns | 0.3367 ns |
TestWithUnsafeCompare | 28.53 ns | 0.2906 ns | 0.2426 ns |
TestWithUnsafeIncrementPtrCompare | 20.07 ns | 0.3984 ns | 0.3327 ns |
TestWithUnsafeLongPtrCompare | 11.00 ns | 0.0845 ns | 0.0660 ns |
Did you add this attribute [UnmanagedFunctionPointer(CallingConvention.Cdecl)] to the delegate definition?
I'd forgotten that. I added that attribute to the delegate and performed the benchmark again. The result was almost the same. I thought the link you provided was not for DLL interop acceleration, but for the flexibility for dynamically calling functions where the DLL names or function names could not be determined at compilation.
Good to know. Thanks
Rebase only.
My repository was somewhat behind yours. Thus I got it synced.
I see. I guess It make sense.
It has been quite a few months. I guessed that the original issue should had been fixed.
Yeah. The invalid string had been resolved if that is what you are referring to.
Thanks,
Avoided ptr+index addressing; Fixed invalid strings (ended with ); Fixed #184