koreader / crengine

This is the KOReader CREngine fork. It cross-pollinates with the official CoolReader repository at https://github.com/buggins/coolreader, in case you were looking for that one.
70 stars 45 forks source link

LVXMLParser::ReadText() Optimizations #567

Closed bbshelper closed 2 months ago

bbshelper commented 3 months ago

This part of code are not very testable, so I submit these small commits which I find easier to verify by eyes. Basically I removed unnecessary checks in the inner loop.

In my test on mid-size Project Gutenberg epubs, these commits can save ~2% total load & render time.

pct chg% epub
best -3.13 pg1204
25th -2.66 pg6884
50th -2.40 pg50816
75th -2.11 pg61089
worst -1.17 pg31015

This change is Reviewable

poire-z commented 3 months ago

Thanks for the little commits, easier to get into indeed. But still hard to get into some of them :) Given the amount of "Not sure what" I put into this code when I had to touch it, I'm not really confident and familiar with all of this, so I'm going to trust you.

Again, the only thing that matters in this XML process is that any new code does not change the result we get with the old code, so to not break KOReader highlights made before (which are indices into the text node, if we miss a char or add one, highlights will be shifted). It looks like this is ok.

I trust @benoit-pierre for the low level stuff like memcpy (I don't undetstand the Line 885 already assumes there is no NUL in str. :/)

I also let @benoit-pierre tell if this conflicts with some of his optimisations in his own branch - or if you're both doing the same kind of work, be sure it's not redundant and some brainwork is wasted :)

Question for my cultural benefit: Btw, I'm suprised this is allowed, jumping to a label in a "sub-scope", inside a if branch: image What will happen when we jump in there: the code assume the first if () was true, and all the subsequent else will be skipped ? What would happen if we defined local variables above the if we jumped inside - are these variables defined, usable ? Or would the compiler complain?

bbshelper commented 3 months ago

But still hard to get into some of them :)

Do let me know. I'll try to prove the equivalence.

the only thing that matters in this XML process is that any new code does not change the result we get with the old code

Understand. That's my intention too.

I don't undetstand the Line 885 already assumes there is no NUL in str. :/

If there is a NUL halfway in the str, _lStr_ncpy would stop there, but line 885 always adds count to string length, so there might be a mismatch. So switching to memcpy isn't likely to create new bugs.

I also let @benoit-pierre tell if this conflicts with some of his optimisations in his own branch - or if you're both doing the same kind of work, be sure it's not redundant and some brainwork is wasted :)

@benoit-pierre, would you share the optimisations you have or plan to do?

What will happen when we jump in there: the code assume the first if () was true, and all the subsequent else will be skipped ?

In this code, flgBreak is guaranteed to be true before the goto, so the if condition holds. And there is a break inside the if block, so subsequent else are not checked anyway.

Btw, I'm suprised this is allowed, jumping to a label in a "sub-scope", inside a if branch: What would happen if we defined local variables above the if we jumped inside - are these variables defined, usable ? Or would the compiler complain?

That's legal, though I've only used goto to jump out of complex control flows. Local variables are defined and usable, but they might be uninitialized without compiler warning:

int foo(int a) {
    if (a > 0)
        goto end;
    if (a > 0) {
        int b = 80;
end:    return b;
    }
    return a;
}

int main() {
    return foo(5);
}
poire-z commented 3 months ago

But still hard to get into some of them :)

Do let me know. I'll try to prove the equivalence.

No, that's fine, I trust you. (I meant hard to get into/understand the old code - and I don't really want to :) so it's good enough for me you do :))

Thanks for the other explanations.

benoit-pierre commented 3 months ago

Semi-related to gotos, check out Duff's device for something that changed my understanding of switch statements:

void send(short *to, short *from, int count)
{
    int n = (count + 7) / 8;
    switch (count % 8) {
    case 0: do { *to = *from++;
    case 7:      *to = *from++;
    case 6:      *to = *from++;
    case 5:      *to = *from++;
    case 4:      *to = *from++;
    case 3:      *to = *from++;
    case 2:      *to = *from++;
    case 1:      *to = *from++;
            } while (--n > 0);
    }
}
benoit-pierre commented 3 months ago

@benoit-pierre, would you share the optimisations you have or plan to do?

Here's the relevant range of commits.

FWIR, the one with the most impact is https://github.com/koreader/koreader/commit/98282b9d514540c155f98623e25a8939a57d4d02.

bbshelper commented 3 months ago

Here's the relevant range of commits.

That's impressive! I see something I've also done locally for lvtextfm, are you going to merge them any time soon?

benoit-pierre commented 3 months ago

Here's the relevant range of commits.

That's impressive! I see something I've also done locally for lvtextfm, are you going to merge them any time ~soon~?

Eventually. ;)

poire-z commented 2 months ago

@benoit-pierre : fine with this PR ?

I can merge it, but I'll propbably bump it into KOReader after next stable release, just for safety.

benoit-pierre commented 2 months ago

lString32::append is still potentially missing a null-terminator.

bbshelper commented 2 months ago

I can merge it, but I'll propbably bump it into KOReader after next stable release, just for safety.

I'll leave the first commit out. For safety I'll also review the remaining commits once again.

bbshelper commented 2 months ago

I'll leave the first commit out. For safety I'll also review the remaining commits once again.

Done. The remaining commits look good to me in self review.