use --optimize to generate faster code without strcmp at runtime

bubeck commented 1 year ago

Fixes https://github.com/lvgl/lv_i18n/issues/53

kisvegabor commented 1 year ago

I've tried it out and noticed 2 things:

The msg_indexs are off by one. The first index should be 0.

This part:

#ifdef LV_I18N_OPTIMIZE
const char * singulars;
#else

should be

#ifdef LV_I18N_OPTIMIZE
    const char ** singulars;
#else

bubeck commented 1 year ago

Thanks for the feedback. Awkward, as I tried it out in my project and then cleaned up for PR and obviously made these mistakes. I will come up with an updated PR.

kisvegabor commented 1 year ago

No problem :)

bubeck commented 1 year ago

Problems fixed, and rebased to current master. Removed lodash. Ready for inclusion or feedback.

puzrin commented 1 year ago

I don't understand how should it work. Existing C template is 100% working example, regenerated with npm run template_update. In your commit index seems to be missed (at least, could not find it for current demo data).

bubeck commented 1 year ago

lv_i18n.template.c is included in the PR (and it is changed to deal with LV_I18N_OPTIMIZE. Instead of lv_i18n.h I am using lv_i18n.template.h, as this also has to be generated (because it depends on --optimize or not).

I also added tests into test/c/Makefile, where you can see, how it is used. There is a test for --optimize and without --optimize.

What exactly does not work or is unclear?

puzrin commented 1 year ago

1.

https://github.com/lvgl/lv_i18n/blob/9070d06be9c637a3c169bb183eb491bafbb855e4/src/lv_i18n.template.h#L11-L14

This is header template. It seems to miss index for demo data. Idea of this script is to fill template with demo info from demo data. That helps to understand data structs, verify generator and so on.

Suffix "_optimized" (lv_i18n_get_text_optimized) is not descriptive (it does not explain what happens). I'd suggest something like lv_i18n_get_text_by_idx
While I think even single extra option is "too much", there are 2 new options introduced - --raw-header & --optimize. Could you explain how those are expected to be used in real world?
I'd suggest to avoid dynamically generated sources if possible. Because that complicates data structs understanding and debug. That means:
- Make sure all templates are filled with demo data
- Try to generate the same files for all modes and select desired mode with guards (LV_I18N_OPTIMIZE (0/1), LV_I18N_OPTIMIZE_LEVEL (0/1/2) and so on [names are for example only]).
- Avoid lv_i18n compile run in Makefile as part of test build (if possible).

In short - let's move from "coding" to "data design". As first step, i'd like to understand new data structs (including indexes). So:

Please make sure .h & .c contains full demo info, and can be used "as is" after simple copy & rename.
Please make sure, all data edge cases are covered (missed base phrase, missed translation, missed plural form translation, different number of plurals in languages, and so on...). If needed - update demo data source.

As soon as data structs clear for optimized mode, we will see how to use the same data for non-optimized mode, if possible. After data design complete, it would be easy to update code without extra reworks (i will help with js if needed).

Let me know if you see things different or have any questions.

bubeck commented 1 year ago

First answers

https://github.com/lvgl/lv_i18n/blob/9070d06be9c637a3c169bb183eb491bafbb855e4/src/lv_i18n.template.h#L11-L14

will do

2. Suffix "_optimized" (`lv_i18n_get_text_optimized`) is not descriptive (it does not explain what happens). I'd suggest something like `lv_i18n_get_text_by_idx`

will do

3. While I think even single extra option is "too much", there are 2 new options introduced - `--raw-header` & `--optimize`. Could you explain how those are expected to be used in real world?

The real world example is currently: use --optimize to generate optimized version with integer index. This must be generated into the header file (e.g. lv_i18n.h) as it must be present at compile time of the _().

If you want to also use --raw then I need a second filename of the header file to be generated. I wanted to use --raw but this is only for the c file. I could remove --raw-header and derive the name of the header file from the name given for --raw by replacing the extension from .c to .h. Would that be better?

If you like, we can remove option --optimize and always generate normal and optimized version. The developer can decide which version he want be defining #define LV_I18N_OPTIMIZE 1 outside the generated code before including lv_i18n.h. Would you prefer this?

4. I'd suggest to avoid dynamically generated sources if possible. Because that complicates data structs understanding and debug. That means:

   * Make sure all templates are filled with demo data
   * Try to generate the same files for all modes and select desired mode with guards (`LV_I18N_OPTIMIZE` (0/1), `LV_I18N_OPTIMIZE_LEVEL` (0/1/2) and so on [names are for example only]).
   * Avoid `lv_i18n compile` run in Makefile as part of test build (if possible).

will do, if I get the above proposals from you.

In short - let's move from "coding" to "data design". As first step, i'd like to understand new data structs (including indexes). So:

* Please make sure `.h` & `.c` contains full demo info, and can be used "as is" after simple copy & rename.

* Please make sure, all data edge cases are covered (missed base phrase, missed translation, missed plural form translation, different number of plurals in languages, and so on...). If needed - update demo data source.

puzrin commented 1 year ago

The real world example is currently: use --optimize to generate optimized version with integer index. This must be generated into the header file (e.g. lv_i18n.h) as it must be present at compile time of the _().

If you want to also use --raw then I need a second filename of the header file to be generated. I wanted to use --raw but this is only for the c file. I could remove --raw-header and derive the name of the header file from the name given for --raw by replacing the extension from .c to .h. Would that be better?

I mean, as user of this package, I don't care about internals. I just need to "rebuild translations" in one step. From this point of view, introducing a lot of options just confuse the user with unnecessary details.

Output of this package is "set of i18n files", not separate .h and separate .c file. Those should be generated simultaneously.

Also note, in far future, this package can be extended with writer for another language (Rust and so on). So, option should not be languange-specific when possible.

If you like, we can remove option --optimize and always generate normal and optimized version. The developer can decide which version he want be defining #define LV_I18N_OPTIMIZE 1 outside the generated code before including lv_i18n.h. Would you prefer this?

Yes, if that's technically possible. Because different modes can require different data layouts. That's why I suggested to start with update of templates, to verify everything about data and minimise rework.

bubeck commented 1 year ago

Next round. Please give me feedback and directions.

The following is done:

rename lv_i18n_get_text_optimized() to lv_i18n_get_text_by_idx()
remove option --raw-header. Instead of specifying a header file name for raw, it is derived from the filename given to --raw by changing extension to .h.
Extended template_update so that it updates templates for header file as well,

The template is now generated for mode "not optimized" as this was the previous mode. If you want, you can regenerate with --optimize by changing template_update.

Question: What is missing, what should be changed?

puzrin commented 1 year ago

Sorry, was busy today. If you have time, please add "optimized" templates near current ones, temporary. All I need to verify/rework data structs - template files. If running of script needed - that requires to remember how that works :)

bubeck commented 1 year ago

No Problem. Here we are. This is the code also generated for test/c.

puzrin commented 1 year ago

Sorry for delay, see summary below

Files Layout

IMO it would be more useable, if the i18n compiler always generates 2 files (.c/.h), containing both optimized and slow versions (Selected by LV_I18N_OPTIMIZE guard). The content of each file could look like this:

// common part

#if LV_I18N_OPTIMIZE = 1

// optimize-specific part

#else

// slow-specific part

#endif

Data design

Plurals still use the slow version. IMO, they should have a separate index, with the same principle as singular.

Data struct for each language plural can be:

{
   plural_forms_amount: N // depends on language
   list_of strings: {
      plural1_form1,
      plural1_form3,
      plural1_form4,
      plural2_form1,
      plural2_form3,
      plural2_form4,
      plural3_form1,
      plural3_form3,
      plural3_form4,
   }
}

Or, even more straightforward - always reserve room for all 6 forms. Overhead will not be significant. I do not insist on any concrete implementation, only explain how to reorganize files in a more structured form.

Please note that data design and usability are the most important things for writing code without rework. For this stage, it's enough to write .c/.h templates manually, without coding at all (but I do not insist if you are ok with core rewrites).

Minor things

IMO index macros could be multiline for better readability (each phrase on a separate line, with \ at the end).
If LV_I18N_OPTIMIZE is not set, well-known compilers can be auto-detected (not sure if that's a good idea).

bubeck commented 1 year ago

next round, please feedback

Plurals not yet done, to first check, that singulars are ok.
index macro is now multiline
code is now optimize and standard at the same time. Developer can select by #define LV_I18N_OPTIMIZE before including lv_i18n.h.

puzrin commented 1 year ago

LGTM in general.

The only note about internals - please avoid data/code mixture if possible https://github.com/lvgl/lv_i18n/blob/a64d2baea7a0d8e7a5d9351b457f4682fa028fb8/test/c/combined-sample/lv_i18n.c#L61-L65. It's more clear to define structs separate for each mode.

Ideally would be to have 1 section for optimized and 1 section for slow variant per file. That will simplify understanding and next modifications.

Please add indexed plurals to example.

bubeck commented 1 year ago

. It's more clear to define structs separate for each mode.

You mean something like

static const lv_i18n_lang_t en_gb_lang = {
    .locale_name = "en-GB",
    .singulars = en_gb_singulars,
    .plurals[LV_I18N_PLURAL_TYPE_ONE] = en_gb_plurals_one,
    .plurals[LV_I18N_PLURAL_TYPE_OTHER] = en_gb_plurals_other,
    .locale_plural_fn = en_gb_plural_fn
};

static const lv_i18n_lang_indexed_t en_gb_lang_indexed = {
    .locale_name = "en-GB",
    .singulars_indexed = en_gb_singulars_indexed,
    .plurals[LV_I18N_PLURAL_TYPE_ONE] = en_gb_plurals_one,
    .plurals[LV_I18N_PLURAL_TYPE_OTHER] = en_gb_plurals_other,
    .locale_plural_fn = en_gb_plural_fn
};

This would mean, that we have different lv_i18n_lang_t and lv_i18n_lang_indexed_t and this would mean, that we have to duplicate all functions taking lv_i18n_lang_t as a parameter so that there is another similar function dealing with lv_i18n_lang_indexed_t.

I mean, this is your code, and I will do as you select, but IMHO this adds much complexity to the software without much benefit in clarity.

Please let me know how you want to continue for data design. After this is clear, I will continue with plurals.

puzrin commented 1 year ago

This would mean, that we have different lv_i18n_lang_t and lv_i18n_lang_indexed_t and this would mean, that we have to duplicate all functions taking lv_i18n_lang_t as a parameter so that there is another similar function dealing with lv_i18n_lang_indexed_t.

IMO, current names difference is caused by old plurals code (because you use both approaches instead of single one). When indexed and non indexed parts will be in separate two sections, probable suffixes will not be required at all.

I mean, this is your code, and I will do as you select, but IMHO this adds much complexity to the software without much benefit in clarity.

Please do data design in any format you prefer. Until data design is complete, all next steps are very fragile. Right now it's not critical how the code looks like. The only principal point is the complete data layout.

bubeck commented 1 year ago

I will make a proposal within the next day(s).

bubeck commented 1 year ago

Is this the way you want it?

puzrin commented 1 year ago

May be. Could you add indexed plurals part. It's very difficult to dicsuss details with incomplete data.

bubeck commented 1 year ago

OK.

bubeck commented 1 year ago

I hope that I understood you right. Please find a complete example of the code at https://github.com/lvgl/lv_i18n/pull/55/commits/6634aa03b1b3d6711f687e1cad0eaa6f1b3ad0cd. It is done for singular, as well as plural, but it has not been tested. It is meant to help discussion the solution which we try to find. It is hand written and not yet generated.

Please let me know your feedback.

puzrin commented 1 year ago

https://github.com/lvgl/lv_i18n/commit/bdbdfe47dd96c4a367db82d546c5977d8b709768 see my proposal.

All code regrouped to leave only 2 sections. IMHO that significantly improves readability/debug/maintenance at a very small cost (duplicated some structs definitions).

I did not polish names, I only moved code lines. Probably, some terms can be accented with _by_idx, _simple, _singular and other suffixes of such kind (not critical now).

Tell me, if you agree with this proposal. If you have no objections, then I will refactor fn/struct names for the final variant, and the templates will be ready for coding.

bubeck commented 1 year ago

Sorry for the delay, ... Family time...

I have no objections against your proposal. It is fine from my side. I am now waiting for your refactorization for the final variant. Please let me know, if I should do anything.

puzrin commented 1 year ago

https://github.com/lvgl/lv_i18n/commit/af62c7e2c95cbb332feb2121e41780cae05fb718 - polished some names.

After first refactoring pass, i found we can unify things even better - switch to concept of compiler/runtime lookup index.

Current "slow search" data:

static lv_i18n_phrase_t en_gb_singulars[] = {
    {"s_en_only", "english only"},
    {"s_translated", "s translated"},
    {NULL, NULL} // End mark
};

#define _(text) lv_i18n_get_singular_by_map(text)

will become:

// For slow search only
static const char * en_gb_singulars_idx[] = {
    "s_en_only",
    "s_translated"
    NULL // End mark
}

// The same as optimized
static const char * en_gb_singulars[] = {
  "english only", // 1=s_en_only
  "s translated", // 2=s_translated
  NULL, // 3="s_untranslated"
};

// The same thing, but with runtime index search
#define _(text) lv_i18n_get_singular_by_idx(text, get_singular_idx(text))

That will unify almost everything:

Fast version will use macro index
Slow version will contain extra index table + index search function.

And no more difference! That will also simplify code generator.

What do you think?

PS happy new year :)

bubeck commented 1 year ago

Very good idea!! Great!

Thanks! Happy new year to you as well. We still have 5 hours to go

puzrin commented 1 year ago

https://github.com/lvgl/lv_i18n/commits/tmp_template

IMO that's ready to go. I'm not an expert in C, and some minor polish is probably needed. But the data layout should be ok.

bubeck commented 1 year ago

I think, you are "off by 1": https://github.com/lvgl/lv_i18n/blob/39a97d43a38170eb341b7909de4b2ff2f906369a/src/lv_i18n.template.new.c#L116

Use txt = lang->singulars[msg_index]; (many times the same error).

Looks nice and simple! How should we proceed? Who is doing what?

puzrin commented 1 year ago

I think, you are "off by 1"

You are right. https://github.com/lvgl/lv_i18n/commit/39a97d43a38170eb341b7909de4b2ff2f906369a - I tried to fix indexing enumeration, but missed -1 correction removal.

How should we proceed? Who is doing what?

I can't spend time for C code, but can do JS if that's difficult for you.

IMHO next step - update C tests and make sure everything is right in new template.

bubeck commented 1 year ago

IMHO next step - update C tests and make sure everything is right in new template.

Will do.

bubeck commented 1 year ago

Done. Please find new version in https://github.com/lvgl/lv_i18n/pull/55/commits/a8958eeca6bf48c18f276e449055bd58730e9ddf

 $ make c
gcc -std=c99 -pedantic -Wall -Wextra -Wconversion -Werror -Wno-switch-enum -Wno-double-promotion -Wbad-function-cast -Wcast-qual -Wold-style-definition -Wshadow -Wstrict-overflow -Wstrict-prototypes -Wswitch-default -Wundef   -I combined-sample-2 -I unity/src -I build unity/src/unity.c combined-sample-2/lv_i18n.c test.c -o build/test.exe
./build/test.exe
test.c:249:test_init_should_ignore_NULL_input:PASS
test.c:250:test_init_should_ignore_empty_language_pack:PASS
test.c:251:test_init_should_work:PASS
test.c:254:test_set_locale_should_fail_before_init:PASS
test.c:255:test_set_locale_should_work:PASS
test.c:256:test_set_locale_should_fail_not_existing:PASS
test.c:259:test_get_text_should_work:PASS
test.c:260:test_get_text_should_fallback_to_base:PASS
test.c:261:test_get_text_should_fallback_to_orig:PASS
test.c:264:test_get_text_plural_should_work:PASS
test.c:265:test_get_text_plural_should_fallback_to_base:PASS
test.c:266:test_get_text_plural_should_fallback_to_orig:PASS
test.c:269:test_should_fallback_without_langpack:PASS
test.c:270:test_empty_base_tables_fallback:PASS
test.c:271:test_empty_plurals_fallback:PASS
test.c:272:test_empty_content_check:PASS

-----------------------
16 Tests 0 Failures 0 Ignored
OK
gcc -std=c99 -pedantic -Wall -Wextra -Wconversion -Werror -Wno-switch-enum -Wno-double-promotion -Wbad-function-cast -Wcast-qual -Wold-style-definition -Wshadow -Wstrict-overflow -Wstrict-prototypes -Wswitch-default -Wundef  -DLV_I18N_OPTIMIZE  -I combined-sample-2 -I unity/src -I build unity/src/unity.c combined-sample-2/lv_i18n.c test.c -o build/test.exe
./build/test.exe
test.c:249:test_init_should_ignore_NULL_input:PASS
test.c:250:test_init_should_ignore_empty_language_pack:PASS
test.c:251:test_init_should_work:PASS
test.c:254:test_set_locale_should_fail_before_init:PASS
test.c:255:test_set_locale_should_work:PASS
test.c:256:test_set_locale_should_fail_not_existing:PASS
test.c:259:test_get_text_should_work:PASS
test.c:260:test_get_text_should_fallback_to_base:PASS
test.c:261:test_get_text_should_fallback_to_orig:PASS
test.c:264:test_get_text_plural_should_work:PASS
test.c:265:test_get_text_plural_should_fallback_to_base:PASS
test.c:266:test_get_text_plural_should_fallback_to_orig:PASS
test.c:269:test_should_fallback_without_langpack:PASS
test.c:270:test_empty_base_tables_fallback:PASS
test.c:271:test_empty_plurals_fallback:PASS
test.c:272:test_empty_content_check:PASS

-----------------------
16 Tests 0 Failures 0 Ignored
OK

If everything is OK, I would be glad, if you can take over JS.

lvgl / lv_i18n