hydrobyte / McJSON

A Delphi / Lazarus / C++Builder simple and small class for fast JSON parsing.
MIT License
58 stars 20 forks source link

Check function speed problem #19

Closed totyaxy closed 1 year ago

totyaxy commented 1 year ago

Hi!

There is an approx. My 11 MB json file, in the following structure (~20k block):

"<number>": {
       "key1": "value1",
       "key2": "value2",
       "key3": "value3",
       "key4": "value4",
     },
"<number+1>": {
       "key1": "value1",
       "key2": "value2",
       "key3": "value3",
       "key4": "value4",
     },

The Check function takes seconds. The interesting thing is that if I slightly put a main key (object) in front of it, the Check function will be much faster, which I find completely pointless.

So the structure will be like this:

"JsonData": {
"<number>": {
       "key1": "value1",
       "key2": "value2",
       "key3": "value3",
       "key4": "value4",
     },
"<number+1>": {
       "key1": "value1",
       "key2": "value2",
       "key3": "value3",
       "key4": "value4",
     },

If I mess up the structure somewhere, it finds the error in the same way, so I think checking the json structure of the first version can take so long because of some error.

hydrobyte commented 1 year ago

Hi,

See that the Check procedure has a aSpeedUp parameter. When it is false, it will check for key duplication. Due the internal use of TList, this task takes time to complete.

How are you using/calling 'Check'?

totyaxy commented 1 year ago
var
  JsonStr: string = '';
  Json: TMcJsonItem = nil;
  CheckResult: boolean = False;

begin
  JsonStr.LoadFromFile(ftJsonDataBase.Name);

  EpikTimer.Start;

  Json := TMcJsonItem.Create;
  try
    CheckResult := Json.Check(JsonStr, false/true);

(Default compile mode)

  1. without "unnecessary" (JsonData) object key (see above): Check(, false) : 10,697792sec Check(, true) : 0,728713sec

  2. with "unnecessary" (JsonData) object key (see above): Check(, false) : 0,712336sec Check(, true) : 0,712724sec

Seeing the values, I still believe that something is wrong. It is possible that the key duplication search applies only to the highest key*** (here in the second example: JsonData object key).

Edit: *** I've tried it since then, it really is

hydrobyte commented 1 year ago

Hi,

You are right, the fSpeedUp boolean flag is not propagating into child items.

I have plans to create a TMcJson class, derived from TMcJsonItem and then create just a fRoot reference inside all TMcJsonItem objects.

The root object will have special properties like fSpeedUp that any child item can read.

This way I'll fix the lack of propagation and yet reduce memory allocation.

Stay tuned.

totyaxy commented 1 year ago

Hi!

I didn't even notice that the Check function performs an additional check, thanks for the info. Duplication checking is not important for me at the moment (especially since it is slow), but #18, which is also related to the Check function, would be more important for me. I can't modify MCJson to fit all supported development environments.

By the way, for the fastest possible duplication check under fpc, I usually use this: https://lazarus-ccr.sourceforge.io/docs/lazutils/lookupstringlist/tlookupstringlist.html

hydrobyte commented 1 year ago

Hi,

18 is on my schedule here.

Also, this fRoot thing must ask to refactor any TMcJsonItem declaration to TMcJson. Is this a problem to you right now?

And, a TLookupStringList would be nice in Delphi too. I was looking the TlkBalancedTree class from ulkJson. I'll try something in this subject too.

totyaxy commented 1 year ago

Hi!

Any refactory no problem for me, but what about other users? Until now, you have also paid attention to the backward compatibility of the modifications, but I understand that the library must be fundamentally modified for further development.

I would rather think about supporting only the shortener format, but that would require completely rewriting all of everyone's old code...

hydrobyte commented 1 year ago

This TMcJson is important to fix how child items read global properties like fSpeedUp. It is a simple but important refactoring task (just rename). All other functionality will keep the same.

hydrobyte commented 1 year ago

Hi,

Please, check the new Check and CheckException methods. Meanwhile, no more SpeedUp property as it was converted to parameters (see aSpeed) and the TMcJson idea is aborted.

I'll try to improve Check in order to respond to issue #18.

totyaxy commented 1 year ago

Thank you for your work! I will answer in #18.

totyaxy commented 1 year ago

Of course, I checked, now the subkey also detects duplication, thank you!