samchon / typia

Super-fast/easy runtime validators and serializers via transformation
https://typia.io/
MIT License
4.62k stars 159 forks

updated fast-json-stringify #736

Closed Uzlopak closed 1 year ago

Uzlopak commented 1 year ago

Can you please update the benchmarks?

Btw. I updated the serializer for asString and I think you are using a similar serializer for strings. Maybe you want to adapt it for more performance. https://github.com/fastify/fast-json-stringify/commit/17bb4c2430c60a44079ef572766a8150ab70aefc#diff-8e3d45dd0e9ec504195499d7fafe3efc08756c349fce602ef6538c593aa563d8

Also, the benchmarks are a little bit unfair. For example, you benchmark plain stringification, but fast-json-stringify not only stringifies but also performs some assertions, such as checking for required fields.

samchon commented 1 year ago

https://dev.to/samchon/good-bye-typescript-is-ancestor-of-typia-20000x-faster-validator-49fi

Around the same time, I had made another library named typescript-json.

It performs AoT (Ahead of Time) compilation like typescript-is, though not for runtime validation but for JSON schema generation. To boost JSON serialization, typescript-json utilized fast-json-stringify with the automatically generated JSON schema.

For reference, the purpose of typescript-json was to support nestia (below), which generates Swagger documents from pure TypeScript types.

Thanks for the notification. As typia started as a wrapper of fast-json-stringify (at that time, the package name was typescript-json), I'm still using the string serializer of fast-json-stringify. I'll update the string serializer to follow fast-json-stringify. Also, I'll run the benchmark again with the latest version of fast-json-stringify.

By the way, typia has four stringify functions, and assertStringify is similar to fast-json-stringify. The benchmark results show all of those typia stringify functions (https://typia.io/docs/json/stringify/#performance), but as normal users don't know the difference, I can understand why you feel it is unfair. On this topic, how about adding a comment under the graph that fast-json-stringify is similar to typia.assertStringify()? If you want, you can edit https://github.com/samchon/typia/blob/master/website/pages/docs/json/stringify.mdx#L497-L507, and send a PR for that.
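
To make the fairness point concrete, here is a minimal sketch of the difference between a stringify-only function and an assert-then-stringify function. The `Member` type and both helpers are hypothetical and written by hand for illustration; they are not typia's generated code nor fast-json-stringify's implementation.

```typescript
// Hypothetical type for illustration only.
interface Member {
  id: number;
  name: string;
}

// Stringify only: trusts the static type and performs no runtime check.
function stringifyOnly(input: Member): string {
  return `{"id":${input.id},"name":${JSON.stringify(input.name)}}`;
}

// Assert + stringify: checks the required fields at runtime before
// serializing, so every call pays for the extra validation work.
function assertThenStringify(input: unknown): string {
  if (typeof input !== "object" || input === null)
    throw new TypeError("expected an object");
  const candidate = input as Partial<Member>;
  if (typeof candidate.id !== "number")
    throw new TypeError("required field 'id' must be a number");
  if (typeof candidate.name !== "string")
    throw new TypeError("required field 'name' must be a string");
  return stringifyOnly({ id: candidate.id, name: candidate.name });
}
```

A benchmark that compares the first kind of function against a library doing the second kind of work measures different amounts of work, which is why a note under the graph would help.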

Uzlopak commented 1 year ago

Sure, I will provide a PR.

Please check whether the patch to the asString serializer gives a perf boost. If you see a performance regression, then I would need to investigate.

samchon commented 1 year ago

You can benchmark the typia.stringify<T>() function with the commands below.

# PREVIOUS SMALL-STRING
git clone https://github.com/samchon/typia -b features/original typia@original
cd typia@original
npm install
npm run benchmark

# ADVANCED SMALL-STRING
cd ..
git clone https://github.com/samchon/typia -b features/stringify typia@advanced
cd typia@advanced
npm install
npm run benchmark

By the way, as my benchmark program for the stringify function handles composite types, it would be much better to make a new dedicated benchmark for only your asStringSmall() function.

samchon commented 1 year ago

https://github.com/samchon/typia/blob/features/stringify/test/issues/736.ts

I made a dedicated benchmark, and you can run it through:

git clone https://github.com/samchon/typia -b features/stringify typia@stringify
cd typia@stringify
npm install
npm run issue 736
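
For readers who do not want to clone the branch, a dedicated string benchmark of this kind might look roughly like the sketch below. It is not the content of test/issues/736.ts; the `measure` helper, its parameters, and the operations-per-second unit are assumptions made for illustration.

```typescript
// Hypothetical micro-benchmark helper; not taken from the typia repository.
// Calls `serialize(input)` repeatedly for about `durationMs` milliseconds
// and returns the approximate number of operations per second.
function measure(
  serialize: (str: string) => string,
  input: string,
  durationMs: number = 200,
): number {
  let count = 0;
  let sink = 0; // accumulate results so the calls cannot be optimized away
  let elapsed = 0;
  const start = Date.now();
  do {
    for (let i = 0; i < 1000; i++) sink += serialize(input).length;
    count += 1000;
    elapsed = Date.now() - start;
  } while (elapsed < durationMs);
  if (sink < 0) throw new Error("unreachable");
  return (count / Math.max(elapsed, 1)) * 1000;
}

// Usage: compare the native serializer against any candidate function.
const nativeOps: number = measure((str) => JSON.stringify(str), "hello", 50);
```

Fixing the input string (for example at length 5 or 41) isolates the string serializer from the cost of traversing composite types.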

samchon commented 1 year ago

@Uzlopak I fixed the string length at 41, and the new algorithm became slower than before.

previous 6867.585931699803
advanced 6861.237658316788

With the length fixed at 5, the new algorithm is still slower.

previous 33501.367175249296
advanced 33126.283614757245

Uzlopak commented 1 year ago

Yes, the number 42 is super arbitrary. I asked @mcollina why 42, but it seems it was a random number.

We should investigate where the break-even point actually lies.

Uzlopak commented 1 year ago

Didn't you write that in a special case it was double the perf?

samchon commented 1 year ago

@Uzlopak Oops, I missed the STR_ESCAPE pattern check.

After adjusting that, the new algorithm became faster at length 41. However, it became slower at length 5.

The 2x speedup was due to my mistake of skipping the length > 41 check.

# LENGTH 41
previous 5193.547705809435
advanced 5982.83813974943

# LENGTH 5
previous 23195.423057822234
advanced 20127.113157546624

Uzlopak commented 1 year ago

Your code is different from ours.

Your code does: if length > 42, call JSON.stringify; then do the STR_ESCAPE check and wrap in double quotes; then fall back to the simple case.

It has to be: if length < 42, do the simple case; then do the STR_ESCAPE check and wrap in double quotes; then fall back to JSON.stringify.
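
The ordering described above can be sketched as follows. This is a sketch under assumptions, not a copy of fast-json-stringify: the `STR_ESCAPE` pattern shown here (control characters, double quote, backslash, surrogates) and the body of `asStringSmall` are illustrative stand-ins.

```typescript
// Assumed escape pattern: control characters, '"', '\' and surrogates.
const STR_ESCAPE = /[\u0000-\u001f\u0022\u005c\ud800-\udfff]/;

// Simple case for short strings: scan once and escape '"' and '\' by hand.
function asStringSmall(str: string): string {
  let result = "";
  let last = 0;
  for (let i = 0; i < str.length; i++) {
    const point = str.charCodeAt(i);
    if (point === 0x22 || point === 0x5c) {
      // Copy the pending chunk and insert a backslash before this character.
      result += str.slice(last, i) + "\\";
      last = i;
    } else if (point < 32 || (point >= 0xd800 && point <= 0xdfff)) {
      // Control characters and surrogates are left to the native serializer.
      return JSON.stringify(str);
    }
  }
  return '"' + result + str.slice(last) + '"';
}

function asString(str: string): string {
  // 1. Cheapest check first: short strings take the manual loop.
  if (str.length < 42) return asStringSmall(str);
  // 2. Long strings with nothing to escape only need wrapping in quotes.
  if (STR_ESCAPE.test(str) === false) return '"' + str + '"';
  // 3. Everything else falls back to JSON.stringify.
  return JSON.stringify(str);
}
```

Whatever the order of the branches, the output must stay identical to `JSON.stringify(str)`; only the cost of reaching it changes.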

samchon commented 1 year ago

@Uzlopak Oops, I made a lot of mistakes. I repeated it carefully, and this may be the final result.

Sorry for the repeated mistakes. It was confusing because I was doing something else at the same time.

I also tried changing the order of the if conditions, but it made no difference.

# LENGTH 41
previous 6739.206407670302
advanced 19771.155382046658

# LENGTH 5
previous 30564.183105977772
advanced 32337.116211885812

Uzlopak commented 1 year ago

Can you investigate if 42 as string length is optimal?

samchon commented 1 year ago

Well, as the regex condition has been newly added, I should consider which strings to use.

Do you have any idea about it?

Uzlopak commented 1 year ago

Make a string matching the regex /[a-z0-9]+/, and put a double quote (") at the end.
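
A minimal sketch of generating such inputs at a given length is shown below. The helper names `plain` and `withTrailingQuote` are hypothetical, and the deterministic character selection is an arbitrary choice so that runs are repeatable.

```typescript
const ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789";

// Builds a deterministic string of `length` characters matching /[a-z0-9]+/.
function plain(length: number): string {
  let out = "";
  for (let i = 0; i < length; i++)
    out += ALPHABET.charAt((i * 7) % ALPHABET.length);
  return out;
}

// Same total length, but the last character is a double quote, so the
// serializer is forced onto its escaping path only at the very end.
function withTrailingQuote(length: number): string {
  return plain(length - 1) + '"';
}
```

Running both variants across a range of lengths (for example 10 to 1,000) shows how the cost of the escape check scales with string size.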

samchon commented 1 year ago

Normal characters

typia@4.1.15 issue node test/issue 736-normal

Limit Native Optimized Gap
#10 13,231 35,170 165.82 %
#20 10,373 20,038 93.17 %
#30 8,009 13,359 66.81 %
#40 5,733 8,997 56.93 %
#50 5,005 7,741 54.65 %
#60 5,260 7,851 49.28 %
#70 4,536 5,658 24.76 %
#80 3,637 4,928 35.49 %
#90 3,364 4,207 25.06 %
#100 2,504 3,829 52.92 %
#200 1,913 1,932 1.02 %
#300 1,312 1,359 3.6 %
#400 1,052 1,006 -4.41 %
#500 826 857 3.74 %
#600 781 884 13.17 %
#700 761 780 2.49 %
#800 669 552 -17.5 %
#900 521 478 -8.19 %
#1,000 449 462 2.96 %

Surrounded by " characters

typia@4.1.15 issue node test/issue 736-special

Limit Native Optimized Gap
#10 11,640 14,012 20.38 %
#20 9,499 14,581 53.5 %
#30 6,159 10,490 70.31 %
#40 6,104 9,964 63.24 %
#50 5,640 8,743 55.03 %
#60 5,454 7,924 45.29 %
#70 4,794 6,160 28.48 %
#80 3,978 4,740 19.15 %
#90 3,949 5,202 31.71 %
#100 3,170 4,363 37.65 %
#200 1,983 2,673 34.78 %
#300 1,323 1,569 18.64 %
#400 1,062 1,355 27.56 %
#500 907 1,151 26.99 %
#600 814 963 18.29 %
#700 694 847 22.17 %
#800 642 746 16.12 %
#900 592 652 10.15 %
#1,000 509 575 13.12 %

Only regex pattern

typia@4.1.15 issue node test/issue 736-regex

Limit Native Optimized Gap
#10 12,645 28,811 127.84 %
#20 10,155 26,702 162.93 %
#30 8,256 22,639 174.21 %
#40 6,658 23,138 247.53 %
#50 5,996 19,395 223.45 %
#60 5,562 15,214 173.54 %
#70 4,349 13,053 200.17 %
#80 4,028 11,180 177.55 %
#90 3,683 9,873 168.04 %
#100 3,027 9,763 222.49 %
#200 1,904 5,624 195.35 %
#300 1,467 4,105 179.73 %
#400 1,102 3,325 201.69 %
#500 886 2,721 207.22 %
#600 773 2,318 199.76 %
#700 687 2,056 199.52 %
#800 610 1,751 187.34 %
#900 570 1,239 117.42 %
#1,000 483 1,752 262.94 %

samchon commented 1 year ago

You can also run that command and determine which length to use.

In my opinion, the length 42 is reasonable because the gain seems to start diminishing around 50.

samchon commented 1 year ago

Comparing the regex and special cases, the current code seems reasonable.

The gain from the manual serialization logic starts to diminish at a length of about 40 to 50.

Also, even when the target string is longer than 42, the regex pattern check greatly reduces the serialization time.

Uzlopak commented 1 year ago

Currently running the benchmarks:

https://github.com/fastify/fast-json-stringify/pull/637

samchon commented 1 year ago

I'm just confused by the regex-only case. It is even faster than the optimized case when no special character exists.

When a special character exists, the advanced manual stringify logic is faster, so I'm considering the implementation below.

What do you think about the code below, @Uzlopak?

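export const $string = (str: string): string => {
    // no character needs escaping: just wrap in double quotes
    if (STR_ESCAPE.test(str) === false)
        return `"${str}"`;

    // long string that needs escaping: use the native serializer
    if (str.length > 41)
        return JSON.stringify(str);

    // ...OPTIMIZED LOGIC
}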

samchon commented 1 year ago

Currently running the benchmarks:

fastify/fast-json-stringify#637

Great improvement on short strings, but performance on short strings with a double quote decreased.

Uzlopak commented 1 year ago

In my opinion:

The regex is theoretically always slower than processing every character in a for loop. Doing the regex first means that it becomes the general bottleneck, even though the optimized logic also handles the same Unicode characters, double quotes, etc.

The length check seems to be the cheapest operation of all. That's why I would do that one first.

Uzlopak commented 1 year ago

Benchmarking your proposal now:

https://github.com/fastify/fast-json-stringify/pull/637

Uzlopak commented 1 year ago

It's always a tradeoff. The expectation should actually be that short strings without escape characters are the majority and strings with escape characters are the exception. Also, short strings are more common than huge strings.

So personally I think

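export const $string = (str: string): string => {
    // no character needs escaping: just wrap in double quotes
    if (STR_ESCAPE.test(str) === false)
        return `"${str}"`;

    // long string that needs escaping: use the native serializer
    if (str.length > 41)
        return JSON.stringify(str);

    // ...OPTIMIZED LOGIC
}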

is the better tradeoff regarding the benchmarks.

But somehow the benchmarks are counterintuitive.
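
For anyone who wants to try the regex-first ordering locally, here is a self-contained sketch. The `ESCAPE_PATTERN` regex and the `escapeShort` loop are assumptions standing in for STR_ESCAPE and the elided optimized logic; they are not the code from typia or fast-json-stringify.

```typescript
// Assumed stand-in for STR_ESCAPE: control characters, '"', '\', surrogates.
const ESCAPE_PATTERN = /[\u0000-\u001f\u0022\u005c\ud800-\udfff]/;

// Hypothetical stand-in for the elided optimized logic: escapes '"' and '\'
// by hand and defers anything more exotic to JSON.stringify.
function escapeShort(str: string): string {
  let result = "";
  let last = 0;
  for (let i = 0; i < str.length; i++) {
    const point = str.charCodeAt(i);
    if (point === 0x22 || point === 0x5c) {
      result += str.slice(last, i) + "\\";
      last = i;
    } else if (point < 32 || (point >= 0xd800 && point <= 0xdfff)) {
      return JSON.stringify(str);
    }
  }
  return '"' + result + str.slice(last) + '"';
}

// Regex first, then the length check, then the manual loop.
function stringRegexFirst(str: string): string {
  if (ESCAPE_PATTERN.test(str) === false) return `"${str}"`;
  if (str.length > 41) return JSON.stringify(str);
  return escapeShort(str);
}
```

With this ordering, strings with nothing to escape never reach the manual loop regardless of length, while short strings that do need escaping pay for both the regex test and the loop. That matches the tradeoff discussed above.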