lvgl / lv_i18n

Internationalization (i18n) for LVGL
MIT License

Speed up lv_i18n_get_text_core by clever optimization. #53

Open bubeck opened 1 year ago

bubeck commented 1 year ago

I am willing to improve https://github.com/lvgl/lv_i18n/blob/f871e05805a6847201454034bcc3362c0167f8ad/src/lv_i18n.template.c#L175 for speed. Before issuing a PR I would like to validate my idea here and ask whether you are willing to accept or discuss it.

Currently __lv_i18n_get_text_core is O(n), which means lookup takes longer as the list of strings grows. In my case this leads to unacceptably long run times to find a translation. My change makes it O(1), which is much faster.

The basic idea is illustrated by the following example:

// German translation of "Hello World" and "Good Bye":
static char *de_DE_singulars[] = { 
  "Hallo Welt",     // Index 0
  "Tschuess"        // Index 1
};

printf ("%s\n", _("Hello World"));

This gets translated at compile time to:

  char *_1 = lv_i18n_get_text (1);
  __builtin_puts (_1);

where lv_i18n_get_text(1) then simply does:

char *
lv_i18n_get_text (int index)
{
  if (index == 0)
    return NULL;
  return de_DE_singulars[index - 1];
}

The magic part is how the compiler gets from ("Hello World") to the index "1" (the position of this string in the table plus one; 0 is reserved for "not found"). ("Good Bye") would give "2". Here is the solution:

#define IDX(str1) (!strcmp(str1, "Hello World") ? 1 : (!strcmp(str1, "Good Bye") ? 2 : 0))
#define _(str) (IDX(str) == 0 ? str : lv_i18n_get_text(IDX(str)))
  printf ("%s\n", _("Hello World"));

The C preprocessor expands this to:

  printf ("%s\n", ((!strcmp("Hello World", "Hello World") ? 1 : (!strcmp("Hello World", "Good Bye") ? 2 : 0)) == 0 ? "Hello World" : lv_i18n_get_text((!strcmp("Hello World", "Hello World") ? 1 : (!strcmp("Hello World", "Good Bye") ? 2 : 0)))));

which then gets compiled (!) to:

  _1 = lv_i18n_get_text (1);
  __builtin_puts (_1);

The important thing is that IDX is optimized by the compiler at compile time down to the index, without any strcmp being executed at run time. This works because all strings are constant and the compiler has built-in knowledge of strcmp, so it evaluates the constant expressions at compile time to the corresponding index.

This optimization is certainly compiler dependent. GCC has done it since at least version 4, so it has been available for around 8 years. The same is true for clang and probably most other compilers. It does not need "-O" or anything else: it is done with standard compiler options in any case, even with "-g" and with "-O0".

Here is the complete example.

#include <stdio.h>
#include <string.h>

static char *de_DE_singulars[] = { 
  "Hallo Welt", 
  "Tschuess" 
};

char *
lv_i18n_get_text (int index)
{
  printf("lv_i18n(%d)\n", index);
  if (index == 0)
    return NULL;
  return de_DE_singulars[index - 1];
}

#define IDX(str1) (!strcmp(str1, "Hello World") ? 1 : (!strcmp(str1, "Good Bye") ? 2 : 0))
#define _(str) (IDX(str) == 0 ? str : lv_i18n_get_text(IDX(str)))

int
main ()
{
  int index;
  printf ("%d\n", IDX ("Hello World"));
  printf ("%d\n", IDX ("Good Bye"));
  printf ("%d\n", IDX ("Unknown"));
  printf ("%s\n", _("Hello World"));
  printf ("%s\n", _("Unknown"));

  printf("Division by zero will follow (because of dumb compiler)\n");
  index = 10 / IDX("Unknown");
  return 0;
}

This gets translated to the following (GCC intermediate-representation dump):

main ()
{
  int index;
  int D.2357;
  char * D.2356;
  char * _1;
  int _3;

  <bb 2>:
  printf ("%d\n", 1);
  printf ("%d\n", 2);
  printf ("%d\n", 0);
  _1 = lv_i18n_get_text (1);
  __builtin_puts (_1);
  __builtin_puts ("Unknown");
  __builtin_puts (&"Division by zero will follow (because of dumb compiler)"[0]);
  index_2 = 10 / 0;
  _3 = 0;

<L0>:
  return _3;

}

My change would add a flag like "--optimize" to "lv_i18n compile". When given, the code above would be produced instead of the current one; in particular, IDX would be generated with all translationKeys.keys. Without the flag, everything stays unchanged. Developers can then decide whether their toolchain has the necessary optimization and use "--optimize" or not.

What do you think? If you agree with this idea, I will implement it and open a PR.

kisvegabor commented 1 year ago

Sounds like an interesting idea!

Just to be sure I understand correctly:

  1. If there are 1000 "tags", then IDX will be a 999-level nested ? : expression, right? I'm not sure it's a problem, just good to be aware of it.
  2. It doesn't apply to plurals. Also not a problem, as there are probably far fewer plural translations.
bubeck commented 1 year ago

Sounds like an interesting idea!

Thanks! I also like it. ;)

1. If there are 1000 "tags", then `IDX` will be a 999-level nested `? :` expression, right? I'm not sure it's a problem, just good to be aware of it.

Yes, that's right. The important fact here is that IDX gets evaluated by the compiler at compile time to the right index (e.g. 18), so there is no runtime penalty for finding it. I tried it on clang and gcc with 3000 tags and it worked as well. The only small problem is that compile time increases slightly, but really not by much. I already made a dirty version for our own project and it works like a charm: https://github.com/fhorinek/BB.

2. It doesn't apply to plurals. Also not a problem, as there are probably far fewer plural translations.

The idea can also be applied to plurals, no problem. The PoC only covers singulars, but the idea is valid for plurals as well.
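For illustration, the plural case could use the same compile-time-foldable dispatch with a two-dimensional table. This is only a sketch: the table layout, the `PLURAL_IDX`/`_p`/`get_text_plural` names, and the hard-coded German "n == 1" rule are all hypothetical, not lv_i18n's actual API.

```c
#include <string.h>

/* Illustrative layout: one row per msg_id, one column per plural form
 * (German only needs "one" and "other"). */
static const char *de_DE_plurals[][2] = {
    { "%d Apfel", "%d Aepfel" },   /* row 0: "%d apple" */
    { "%d Birne", "%d Birnen" },   /* row 1: "%d pear"  */
};

/* Hypothetical runtime helper: the row index is resolved at compile time,
 * only the plural rule (n == 1 for German) is evaluated at run time. */
static const char *get_text_plural(int index, int n)
{
    if (index == 0)
        return NULL;
    return de_DE_plurals[index - 1][n == 1 ? 0 : 1];
}

/* Same compile-time-foldable dispatch as for singulars:
 * 0 means "unknown key", so the macro falls back to the key itself. */
#define PLURAL_IDX(s) (!strcmp(s, "%d apple") ? 1 : (!strcmp(s, "%d pear") ? 2 : 0))
#define _p(s, n) (PLURAL_IDX(s) == 0 ? (s) : get_text_plural(PLURAL_IDX(s), (n)))
```

With this sketch, `printf(_p("%d apple", 3), 3)` would print the "other" form, and the strcmp chain in `PLURAL_IDX` folds away exactly as in the singular case.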

You can find a rather dirty, hacked-together version for our own project under https://github.com/fhorinek/BB/tree/master/Utilities/lv_i18n/node_modules/lv_i18n. I will improve it to make it cleaner and generally usable.

I am willing to implement it as a PR, but only if the idea has a chance of being included. Therefore I am asking here for your feedback on whether I should continue.

kisvegabor commented 1 year ago

Sounds good to me.

Just one thing: applying a logarithmic (binary) search could be an option too. It would reduce 1000 comparisons to about 10 (1%) and would be a more standard approach. What do you think about it?
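For reference, the halving approach keeps the current string-keyed lookup and only requires the generator to emit the msg_ids in sorted order. A minimal sketch (table and function names here are illustrative, not the actual template):

```c
#include <stddef.h>
#include <string.h>

/* Illustrative: msg_ids emitted by the generator in strcmp() order,
 * with the translations at matching indices. */
static const char *msg_ids[]     = { "Good Bye", "Hello World" };
static const char *de_DE_texts[] = { "Tschuess", "Hallo Welt"  };

/* Binary search: ~log2(n) strcmp calls instead of up to n. */
static const char *find_text(const char *msg_id)
{
    size_t lo = 0, hi = sizeof(msg_ids) / sizeof(msg_ids[0]);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        int cmp = strcmp(msg_id, msg_ids[mid]);
        if (cmp == 0)
            return de_DE_texts[mid];
        if (cmp < 0)
            hi = mid;
        else
            lo = mid + 1;
    }
    return NULL; /* not found: caller falls back to the msg_id itself */
}
```

For 500 keys this costs about 9 strcmp calls per lookup, versus an average of 250 with the current linear scan.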

bubeck commented 1 year ago

Yes, good point.

Thanks for your ideas and help. I will finish the PR in the next days.

puzrin commented 1 year ago

Phrase search was done "quick and dirty" since the phrase count in embedded projects is usually small. We decided to improve it later, if required.

So, my opinion is:

  1. A better search should land (via halving and so on).
  2. I'd avoid introducing new options if possible. IMO the package should "just work".

Consider halving search as "good enough with minimal effort" (but I do not insist and am open to any suggestions).

bubeck commented 1 year ago

In our case (https://github.com/fhorinek/BB/tree/master/BB3/App/lv_i18n), we have around 500 keys (which will probably increase to around 700) and the implementation was definitely too slow. This is why I came up with this proposal. You are right that it could be improved with binary search, but we would still need around 9 strcmp calls to find the right key among 500. As this is done quite often, it is too slow for our project.

The above solution reduces this to a single integer comparison, regardless of the number of keys, which is substantially faster.

I introduced an additional option "--optimize" to make sure the code base stays unchanged while we stabilize and test the idea. If you want, you can make "--optimize" the default once we are all sure it works well.

As the PR is nearly finished (it needs some polishing), I will finalize it and send it to you. It is then definitely up to you to decide how to proceed. I am certainly willing to adapt to your proposals and wishes. :)

puzrin commented 1 year ago

I understand the general idea. It needs a data rework, because the macro also needs internal state (the locale), and a compile-time index has no access to that.

Probably this can be achieved by aligning all singulars and plurals between the different languages, at the cost of some minor flash size (inserting empty values or markers for non-existing translations).
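The alignment described above might look like this (a sketch under that assumption; the variable names and the fallback policy are illustrative): every locale's table uses the same compile-time index layout, a missing translation is a NULL slot, and the only runtime state left is which table the locale pointer points at.

```c
#include <stddef.h>

/* Illustrative: identical index layout in every locale; NULL marks a
 * missing translation, so each gap costs one pointer of flash. */
static const char *en_GB_singulars[] = { "Hello World", "Good Bye" };
static const char *de_DE_singulars[] = { "Hallo Welt",  NULL       }; /* "Good Bye" untranslated */

/* Runtime state: which locale is active, and what to fall back to. */
static const char *const *current_locale  = de_DE_singulars;
static const char *const *fallback_locale = en_GB_singulars;

/* The index is resolved at compile time by the generated macro; only
 * the locale indirection and the NULL fallback remain at run time. */
static const char *get_text_by_index(int index)
{
    if (index == 0)
        return NULL;
    const char *txt = current_locale[index - 1];
    return txt ? txt : fallback_locale[index - 1];
}
```

Because every locale shares one index space, the compile-time IDX macro no longer needs to know the active locale, which is exactly the internal state it cannot see.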

https://github.com/lvgl/lv_i18n/blob/master/src/lv_i18n.template.c - if you could manually rewrite this template to the new format, I will start thinking about how to arrange that in JS. Check points are:

Ideally, it would be fine to keep just your optimization, without options. The alternative is C macros (#ifdef) to activate a fallback for ancient compilers. If both are impossible, then I will consider --option. I can not say anything more certain until I see the new data structs.

bryghtlabs-richard commented 1 month ago

Another, slightly slower O(1) approach would be a minimal perfect hash map, like gperf generates. Each translation would be a different sourceString->translatedString map. This would not require changing the lv_i18n_get_text() API.
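To show the shape of such a lookup, here is a hand-rolled stand-in for two keys (for this tiny key set, the string length alone already separates them; a real table and hash function would come from gperf, not be written by hand):

```c
#include <stddef.h>
#include <string.h>

/* Illustrative stand-in for a gperf-generated perfect hash: for these
 * two keys, strlen(key) % 4 is already collision-free (8 % 4 == 0,
 * 11 % 4 == 3). gperf would instead derive an associated-values table. */
static const struct {
    const char *key;
    const char *text;
} de_DE_map[4] = {
    [0] = { "Good Bye",    "Tschuess"   },
    [3] = { "Hello World", "Hallo Welt" },
};

static const char *hash_lookup(const char *msg_id)
{
    size_t slot = strlen(msg_id) % 4;
    const char *key = de_DE_map[slot].key;
    /* One strcmp to reject false positives, as gperf's output also does. */
    if (key && strcmp(key, msg_id) == 0)
        return de_DE_map[slot].text;
    return NULL;
}
```

This costs one hash computation plus one strcmp per lookup, independent of the key count, which is why it sits between the O(log n) binary search and the pure integer compare of the compile-time index.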

bryghtlabs-richard commented 1 month ago

I thought I was facing the same issue, but it turned out something else was causing us to retranslate the current screen too often.

Here's a rough implementation of the binary search method; it assumes Node.js and strcmp() sort strings the same way.
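That sort-order assumption is cheap to verify at startup on the C side (a sketch; the table name and the idea of a debug-build check are illustrative, not part of lv_i18n):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative: msg_ids as emitted (pre-sorted) by the generator. */
static const char *sorted_msg_ids[] = { "Good Bye", "Hello World", "Settings" };

/* Debug-build check that the generator's sort really matches strcmp()
 * order; fails fast if a locale-dependent collation (e.g. JS's default
 * localeCompare) slipped into the generator instead of a byte-wise sort. */
static void check_sorted(const char *const *ids, size_t n)
{
    for (size_t i = 1; i < n; i++)
        assert(strcmp(ids[i - 1], ids[i]) < 0);
}
```

Calling `check_sorted(sorted_msg_ids, 3)` once at init (or in a unit test) catches a mismatched sort before it silently breaks every binary-search lookup.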