Open wmjordan opened 4 years ago
Thanks for the idea. Would be nice to see a before and after numbers with an example .
The implementation of that function sounds a little bit weird to me.
ToCamelCase("ABC_CD") => "aBCCD" ToCamelCase("ABC_cD") => "aBCCD"
~Why does it remove the underscore and uppercase the following character?~
EDIT: Oh, it seems that it is supposed to deal with the following situation:
ToCamelCase("Abc_De") => "abcDe" ToCamelCase("Abc de") => "abcDe"
Cool. Np
Are intending to still create a pull request for this optimization? Thanks
It is too long time ago that I have forgotten this.
Here's my implementation of ToCamelCase
. I am not gonna handle the underscores or spaces.
If the name
has two or more characters, and the first 2 or 3 characters are uppercase, it preserves the case, for instance "UI", "XML", "HTTP", etc., otherwise, it changes the first character to lowercase.
static string ToCamelCase(string name) {
return Char.IsLower(name[0]) ? name
: name.Length == 1 ? name.ToLowerInvariant()
: name.Length == 2 && Char.IsUpper(name[1]) ? name
: name.Length > 2 && Char.IsUpper(name[1]) && Char.IsUpper(name[2]) ? name
: ToLower(String.Copy(name));
unsafe string ToLower(string t) {
fixed(char* p = t) {
*p = Char.ToLower(t[0]);
}
return t;
}
}
Thanks for the solution. I can see how it can integrate with space and underscore.
Perhaps we can check whether the string contains underscore or space, if so, jump to the previous slow path.
I just rechecked the source code, and saw the logic of property name writing in WritePropertiesFor
. The ToCamelCase
is only called in this place.
name = propertyName
If setting useCamelCase
name = ToCamelCase(name)
else
just use the name variable
We can furthermore optimize this to pre-calculate the camel-cased string, and store it to the dynamic assembly, then load the const string according to the corresponding setting like the following pseudo-code.
name = setting useCamelCase ? loadPrecalculatedCamelCaseName : propertyName
Since the camel-cased names are calculated only once in the above solution, there is little impact to the performance. Therefore, your existing function and mine in previous post won't make too much difference.
Here is the code to replace the original lines in NetJSON.cs.
var useCamelCaseLabel = il.DefineLabel();
var endOfCaseLabel = il.DefineLabel();
il.Emit(OpCodes.Ldloc, camelCasing);
il.Emit(OpCodes.Brtrue_S, useCamelCaseLabel); // use short jump to save some bytes
il.Emit(OpCodes.Ldstr, name); // use original name
il.Emit(OpCodes.Stloc, nameLocal);
il.Emit(OpCodes.Br_S, endOfCaseLabel);
il.MarkLabel(useCamelCaseLabel);
il.Emit(OpCodes.Ldstr, ToCamelCase(name)); // precalculate the camelCase name
il.Emit(OpCodes.Stloc, nameLocal);
il.MarkLabel(endOfCaseLabel);
The only difference will be startup time for the first call there after it will be same. Thanks for the input so far. Could you still make this into a pull request? And could you run a simple test showing the difference in start up vs current?
Thanks,
The only difference will be startup time for the first call there after it will be same.
It is not. The existing code calls ToCamelCase
each time when writing any property if the setting enables using camel case. Consequently it consumes more CPU time and generates more garbage for GC to collect.
My previous post turns the camel-cased property names into IL const strings, so the ToCamelCase
will only be called when the IL is generated. We can furthermore optimize the code to compare the result of ToCamelCase
and the name
and eliminate the check of camelCase setting if the names are the same (input name
is already camel-cased).
var useCamelCaseLabel = il.DefineLabel();
var endOfCaseLabel = il.DefineLabel();
var camelName = ToCamelCase(name);
if (camelName != name) {
il.Emit(OpCodes.Ldloc, camelCasing);
il.Emit(OpCodes.Brtrue_S, useCamelCaseLabel); // use short jump to save some bytes
il.Emit(OpCodes.Ldstr, name); // use original name
il.Emit(OpCodes.Stloc, nameLocal);
il.Emit(OpCodes.Br_S, endOfCaseLabel);
il.MarkLabel(useCamelCaseLabel);
il.Emit(OpCodes.Ldstr, camelName); // load the camelCase name from a const string, instead of calling ToCamelCase
il.Emit(OpCodes.Stloc, nameLocal);
il.MarkLabel(endOfCaseLabel);
}
else {
il.Emit(OpCodes.Ldstr, name); // use original name
il.Emit(OpCodes.Stloc, nameLocal);
}
Could you still make this into a pull request?
Please replace the existing code with the above code. And remove this line. The rest stuff, for instance, the ToCamelCase
function can be left there unchanged.
It will if cache the result of camel case using either of our implementation was what I referring to.
Without the caching, you are right it will cost more than just the startup.
In the existing code, Emit(OpCodes.Ldstr, name)
will cache the property name anyway.
If we are to enable runtime renaming of property names, we can create a type with string
fields which holds property names and we read from those fields instead, when writing out property names.
I will see what I can do regarding the caching logic. Was busy in the past week.
I will try to get to this soon. Been busy with other stuff.
@wmjordan , do we still need to make this change?
I am sorry to be too busy with my own projects too :P
Jump to this issue and use the code to pre-calculate field names if you want to further improve the performance.
Once the field names are pre-calculated, the performance impact of ToLowerCase
will be very minor.
Awesome. Thanks for the clarification.
@wmjordan, do we still want to make this changes?
Hi, I am too busy with my stuff. I have lost track with this project already. The code had been presented above. You just need to replace the line with copy and paste.
Np. I will take care of it. Let me know if I can contribute anything to your projects. Thanks again for the tip.
I am studying the source code today and found that ToCamelCase could be optimized.
The current implementation stack allocates a new
char
array, works on it and create astring
from thatchar
array.You can use
String.Copy
to copy an instance of a string, and directly pin the string with afixed
statement and work on that copy. This saves one allocation and quite a fewchar
assignments.