rpgmaker / NetJSON

Faster than Any Binary? Benchmark: http://theburningmonk.com/2014/08/json-serializers-benchmarks-updated-2/
MIT License
230 stars 29 forks source link

The implementation of ToCamelCase can be optimized #230

Open wmjordan opened 4 years ago

wmjordan commented 4 years ago

I am studying the source code today and found that ToCamelCase could be optimized.

The current implementation stack allocates a new char array, works on it and create a string from that char array.

You can use String.Copy to copy an instance of a string, and directly pin the string with a fixed statement and work on that copy. This saves one allocation and quite a few char assignments.

rpgmaker commented 4 years ago

Thanks for the idea. Would be nice to see a before and after numbers with an example .

wmjordan commented 4 years ago

The implementation of that function sounds a little bit weird to me.

ToCamelCase("ABC_CD") => "aBCCD" ToCamelCase("ABC_cD") => "aBCCD"

~Why does it remove the underscore and uppercase the following character?~

EDIT: Oh, it seems that it is supposed to deal with the following situation:

ToCamelCase("Abc_De") => "abcDe" ToCamelCase("Abc de") => "abcDe"

rpgmaker commented 4 years ago

Cool. Np

rpgmaker commented 3 years ago

Are intending to still create a pull request for this optimization? Thanks

wmjordan commented 3 years ago

It is too long time ago that I have forgotten this.

wmjordan commented 3 years ago

Here's my implementation of ToCamelCase. I am not gonna handle the underscores or spaces.

If the name has two or more characters, and the first 2 or 3 characters are uppercase, it preserves the case, for instance "UI", "XML", "HTTP", etc., otherwise, it changes the first character to lowercase.

static string ToCamelCase(string name) {
    return Char.IsLower(name[0]) ? name
        : name.Length == 1 ? name.ToLowerInvariant()
        : name.Length == 2 && Char.IsUpper(name[1]) ? name
        : name.Length > 2 && Char.IsUpper(name[1]) && Char.IsUpper(name[2]) ? name
        : ToLower(String.Copy(name));

    unsafe string ToLower(string t) {
        fixed(char* p = t) {
            *p = Char.ToLower(t[0]);
        }
        return t;
    }
}
rpgmaker commented 3 years ago

Thanks for the solution. I can see how it can integrate with space and underscore.

wmjordan commented 3 years ago

Perhaps we can check whether the string contains underscore or space, if so, jump to the previous slow path.

wmjordan commented 3 years ago

I just rechecked the source code, and saw the logic of property name writing in WritePropertiesFor. The ToCamelCase is only called in this place.

name = propertyName
If setting useCamelCase
 name = ToCamelCase(name)
else
 just use the name variable

We can furthermore optimize this to pre-calculate the camel-cased string, and store it to the dynamic assembly, then load the const string according to the corresponding setting like the following pseudo-code.

name = setting useCamelCase ? loadPrecalculatedCamelCaseName : propertyName

Since the camel-cased names are calculated only once in the above solution, there is little impact to the performance. Therefore, your existing function and mine in previous post won't make too much difference.

Here is the code to replace the original lines in NetJSON.cs.

var useCamelCaseLabel = il.DefineLabel();
var endOfCaseLabel = il.DefineLabel();

il.Emit(OpCodes.Ldloc, camelCasing);
il.Emit(OpCodes.Brtrue_S, useCamelCaseLabel); // use short jump to save some bytes

il.Emit(OpCodes.Ldstr, name); // use original name
il.Emit(OpCodes.Stloc, nameLocal);
il.Emit(OpCodes.Br_S, endOfCaseLabel);

il.MarkLabel(useCamelCaseLabel);
il.Emit(OpCodes.Ldstr, ToCamelCase(name)); // precalculate the camelCase name
il.Emit(OpCodes.Stloc, nameLocal);

il.MarkLabel(endOfCaseLabel);
rpgmaker commented 3 years ago

The only difference will be startup time for the first call there after it will be same. Thanks for the input so far. Could you still make this into a pull request? And could you run a simple test showing the difference in start up vs current?

Thanks,

wmjordan commented 3 years ago

The only difference will be startup time for the first call there after it will be same.

It is not. The existing code calls ToCamelCase each time when writing any property if the setting enables using camel case. Consequently it consumes more CPU time and generates more garbage for GC to collect.

My previous post turns the camel-cased property names into IL const strings, so the ToCamelCase will only be called when the IL is generated. We can furthermore optimize the code to compare the result of ToCamelCase and the name and eliminate the check of camelCase setting if the names are the same (input name is already camel-cased).

var useCamelCaseLabel = il.DefineLabel();
var endOfCaseLabel = il.DefineLabel();
var camelName = ToCamelCase(name);

if (camelName != name) {
  il.Emit(OpCodes.Ldloc, camelCasing);
  il.Emit(OpCodes.Brtrue_S, useCamelCaseLabel); // use short jump to save some bytes

  il.Emit(OpCodes.Ldstr, name); // use original name
  il.Emit(OpCodes.Stloc, nameLocal);
  il.Emit(OpCodes.Br_S, endOfCaseLabel);

  il.MarkLabel(useCamelCaseLabel);
  il.Emit(OpCodes.Ldstr, camelName); // load the camelCase name from a const string, instead of calling ToCamelCase
  il.Emit(OpCodes.Stloc, nameLocal);

  il.MarkLabel(endOfCaseLabel);
}
else {
  il.Emit(OpCodes.Ldstr, name); // use original name
  il.Emit(OpCodes.Stloc, nameLocal);
}

Could you still make this into a pull request?

Please replace the existing code with the above code. And remove this line. The rest stuff, for instance, the ToCamelCase function can be left there unchanged.

rpgmaker commented 3 years ago

It will if cache the result of camel case using either of our implementation was what I referring to.

Without the caching, you are right it will cost more than just the startup.

wmjordan commented 3 years ago

In the existing code, Emit(OpCodes.Ldstr, name) will cache the property name anyway.

If we are to enable runtime renaming of property names, we can create a type with string fields which holds property names and we read from those fields instead, when writing out property names.

rpgmaker commented 3 years ago

I will see what I can do regarding the caching logic. Was busy in the past week.

rpgmaker commented 3 years ago

I will try to get to this soon. Been busy with other stuff.

rpgmaker commented 1 year ago

@wmjordan , do we still need to make this change?

wmjordan commented 1 year ago

I am sorry to be too busy with my own projects too :P

Jump to this issue and use the code to pre-calculate field names if you want to further improve the performance.

wmjordan commented 1 year ago

Once the field names are pre-calculated, the performance impact of ToLowerCase will be very minor.

rpgmaker commented 1 year ago

Awesome. Thanks for the clarification.

rpgmaker commented 7 months ago

@wmjordan, do we still want to make this changes?

wmjordan commented 7 months ago

Hi, I am too busy with my stuff. I have lost track with this project already. The code had been presented above. You just need to replace the line with copy and paste.

rpgmaker commented 7 months ago

Np. I will take care of it. Let me know if I can contribute anything to your projects. Thanks again for the tip.