tlwg / libthai

GNU Lesser General Public License v2.1
71 stars 18 forks source link

Lots of th_brk_new() calls #8

Open Isopod opened 5 years ago

Isopod commented 5 years ago

Hi, I am currently writing a program using Pango, which uses libthai internally. While debugging my own program, I noticed that a lot of time is spent in the function trie_new_from_file(), which is called from th_brk_new(). This function is called every time a Pango layout is rendered, which leads to a significant slowdown.

I think the reason is here: https://github.com/tlwg/libthai/blob/master/src/thbrk/thbrk.c#L342

The variable is_tried is checked, but it is never getting set. Is it possible that someone forgot to add an is_tried = true;?

thep commented 5 years ago

And the check ... && !is_tried is in fact redundant, as is_tried is always zero. So, the variable is irrelevant and can be dropped. I think it's an artifact of some old code being copied around.

And the non-null check on brk_shared_brk should be sufficient here.

So we should check how Pango calls libthai.

thep commented 5 years ago

IIRC, the is_tried flag was once used for preventing attempts to load the shared engine when it is known to have failed for certain times. Then the check has been dropped but the variable is still left there. The logic is not relevant to the problem being discussed.

thep commented 5 years ago

Regarding Pango, we can see the history here:

https://gitlab.gnome.org/GNOME/pango/commits/master/pango/break-thai.c

This commit introduced the use of LibThai's thread-safe API, where the default dictionary is loaded everytime break_thai() is called:

https://gitlab.gnome.org/GNOME/pango/commit/63e1fc9b23b4f0e8eaacd151ad39403eaa2b48ef

It is good for thread safety, at the cost of performance problem. This is the behavior of Pango since 1.42.2.

Then, we have another commit where the shared version of th_brk_find_breaks() is called:

https://gitlab.gnome.org/GNOME/pango/commit/cd943d8b880e702a90575be363b79f09144e75cb

While this can improve the performance, it is equivalent to calling th_brk() without locking! So, the thread clashing issue should come back again after this commit!

This is the behavior of Pango 1.43.x (supposedly devlopment branch).

I guess you are using 1.42.2+ from the stable branch, which should be thread-safe, but with performance problem as you reported. However, the mitigation in 1.43.x is still dangerous. We should try to find a better solution.

Isopod commented 5 years ago

Thanks for your response. I think I was indeed using Pango 1.42.4 when I opened this issue.

I think what happened was that the call to th_brk_new() failed and returned null, so when the result was passed to th_brk_find_breaks(), it defaulted to using the shared brk. So in the profiler I saw a bunch of th_brk_new () and brk_get_shared_brk() calls. Since the latter also calls the former, I assumed it was the cause of the problem. But I see now that the "is_tried" variable wouldn't help here.

However, I think there should be a mechanism so that when initialization failed once it doesn't try again and again every time some text is drawn on the screen. Perhaps instead of returning null when th_brk_new() fails, you could set a flag in the returned struct to indicate that loading failed. Or don't load the data in the constructor, but have a second function just for that.

epico commented 3 years ago

@thep Do you think the following code will make the code thread-safe?

  G_LOCK (th_brk);
  len = th_brk_find_breaks(NULL, tis_text, brk_pnts, cnt);
  G_UNLOCK (th_brk);