koreader / crengine

This is the KOReader CREngine fork. It cross-pollinates with the official CoolReader repository at https://github.com/buggins/coolreader, in case you were looking for that one.
70 stars 45 forks source link

lvtinydom: fix memory leaks #533

Closed benoit-pierre closed 10 months ago

benoit-pierre commented 11 months ago

Between 128 and ~2M when doing a full load on 23 (out of ~1K) of the files I tested.

Examples:


This change is Reviewable

poire-z commented 11 months ago

Have you managed to understand what persist() does ? and all the whole business about NT_ELEMENT vs NT_PELEMENT, _elemStorage (which must not yet be storage to disk) and the whole lifecycle of all that?

This bits of code are stuff I added, so it's quite possible I didn't think about that - there's not many ->persist() calls in the codebase, and many are commented out (meaning they probably were causing crashes :). Only place that does it for newly created nodes is moveItemsTo() - which was code from before I jumped in, used with autoBoxing nodes - so it does not look too odd. Can't really say if it's the right thing to do or not. If you're sure, fine with me.

benoit-pierre commented 11 months ago

Have you managed to understand what persist() does ? and all the whole business about NT_ELEMENT vs NT_PELEMENT, _elemStorage (which must not yet be storage to disk) and the whole lifecycle of all that?

FWIU, modify() switch to a temporary (less optimized) modifiable state, and persist() is the reverse, read-only, state.

benoit-pierre commented 10 months ago

Have you just fixed/added that for the ones that were reported by your analyzing tools ?

Fixing them as I find them. ¯\(ツ)

But again, I tested 1316 EPUBs, including the ones in test/epub2-conform/.

poire-z commented 10 months ago

But again, I tested 1316 EPUBs

Sure they contain enough ruby and "bad" documents (with incomplete tables) so all these codepaths are taken and checked? In case you need some samples, here are my *table* and *ruby* test files: test-table-ruby.zip (don't worry if they look bad or fail, it's just random fuzzing input for your tools if you need some more :)

benoit-pierre commented 10 months ago

But again, I tested 1316 EPUBs

Sure they contain enough ruby and "bad" documents (with incomplete tables) so all these codepaths are taken and checked?

No idea (TODO: add coverage to test script attached below).

In case you need some samples, here are my *table* and *ruby* test files: test-table-ruby.zip (don't worry if they look bad or fail, it's just random fuzzing input for your tools if you need some more :)

No leaks.

Here is what I use for testing: leaktest.zip. To be decompressed/run from the source tree.

There are a number of (mostly exclusive) switch. Use -a to use libasan (e.g. ./leaktest.sh -a test/*.{djvu,epub,pdf,txt}).

poire-z commented 10 months ago

No leaks.

Ok. So do you feel like adding persists() to all others ? Or not, to be on the safest side ? If not, may be add a comment above your first persist() in a function that // This could/should probably be done to others around here or something like that (otherwise, when reredeaing it in 5 years, I'll wonder if it is normal that it is done there but not done here :/ and I won't dare adding them, cause if they're not there, there must be a reason :)

Here is what I use for testing

I don't have all the tools to make use of if - so trusting you with all this :) One thing you could add after the initial open & render(), is to trigger some setting change and have the document fully re-rendered: they are different codepaths taken whether intiial load (ie. styles computer while the dom is built) and later rerender (styles recomputed on an existing DOM).

benoit-pierre commented 10 months ago

Here is what I use for testing

I don't have all the tools to make use of if - so trusting you with all this :)

You don't need all the tools, just libasan.so, which should be provided by GCC. I've amended the main shell script to not hard-code the library path , and add support for coverage with kcov: leaktest.zip.

Coverage results for lvtinydom.cpp on the aforementioned ~1K EPUBs: 46.8%…

One thing you could add after the initial open & render(), is to trigger some setting change and have the document fully re-rendered: they are different codepaths taken whether intiial load (ie. styles computer while the dom is built) and later rerender (styles recomputed on an existing DOM).

Any suggestions?

poire-z commented 10 months ago

Any suggestions?

Something similar to that (taking the Harfbuzz paths you might not have taken? Although I dunno if this could add some noise/variations to your tests, if some optimisation makes the FreeType path faster but the Harfbuzz path slower, you may not notice it).

--- a/frontend/apps/reader/readerui.lua
+++ b/frontend/apps/reader/readerui.lua
@@ -312,8 +312,13 @@ function ReaderUI:init()
             self:handleEvent(Event:new("PreRenderDocument", self.doc_settings))

             start_time = time.now()
+            self.document:setFontKerning(1) -- Freetype only
             self.document:render()
             logger.dbg(string.format("  rendering took %.3f seconds", time.to_s(time.since(start_time))))
+            start_time = time.now()
+            self.document:setFontKerning(3) -- Harfbuzz
+            self.document:render()
+            logger.dbg(string.format("  rerendering took %.3f seconds", time.to_s(time.since(start_time))))
Frenzie commented 10 months ago

On a related thought, do any of those test documents have footnotes?

poire-z commented 10 months ago

On a related thought, do any of those test documents have footnotes?

As the test script doesn't provice any CSS snippet to enable them, I guess the documents are rendered without. (It feels it doesn't really matter, the footnote code is quite localized and comes late in the party, but who knows if I have related memory leaks in my PageSplitState2 :)

I'm also not sure if the test script gets crengine to use epub.css or no default stylesheet at all - but I guess it's ok as most EPUBs have their own stylesheets. Also dunno about hyphenation if it is active by default.

benoit-pierre commented 10 months ago

OK, here's a version that goes through readerui:

G_defaults = require("luadefaults"):open()
local DataStorage = require("datastorage")
G_reader_settings = require("luasettings"):open(DataStorage:getDataDir().."/settings.reader.lua")

local einkfb = require("ffi/framebuffer") --luacheck: ignore
einkfb.dummy = true --luacheck: ignore

local Device = require("device")

local Screen = Device.screen
Screen:init()

local CanvasContext = require("document/canvascontext")
CanvasContext:init(Device)

local Input = Device.input
Input.dummy = true

local ffiutil = require("ffi/util")
local cachedir = "/tmp/cr3cache"
local cre = require('document/credocument'):engineInit()

local DocumentRegistry = require("document/documentregistry")
local ReaderUI = require("apps/reader/readerui")
local UIManager = require("ui/uimanager")

ffiutil.purgeDir(cachedir)
cre.initCache(cachedir, 0, true, 40)

function fastforward_ui_events()
    -- Fast forward all scheduled tasks.
    UIManager:shiftScheduledTasksBy(-1e9)
    UIManager:run()
end

for i, b in ipairs(arg) do
    print(string.format("%d/%d %s", i, #arg, b))
    io.stdout:flush()
    print(b)
    doc = DocumentRegistry:openDocument(b)
    if doc then
        local readerui = ReaderUI:new{
            dimen = Screen:getSize(),
            document = doc,
        }
        UIManager:quit()
        UIManager:show(readerui)
        UIManager:scheduleIn(1, function()
            UIManager:close(readerui)
            ReaderUI.instance = readerui
        end)
        fastforward_ui_events()
        readerui:closeDocument()
        readerui:onClose()
        readerui = nil
        UIManager:quit()
        UIManager._exit_code = nil
    end
end

Device:exit()

No I need to figure a way to trigger a re-render.

benoit-pierre commented 10 months ago

As for the presence of footnotes, yes, this one for example has plenty of footnotes.

Frenzie commented 10 months ago

In that case they merely need to be triggered. :)

benoit-pierre commented 10 months ago

Meaning? The footnotes? (The readerui version does call CreDocument:setStyleSheet).

Frenzie commented 10 months ago

I think we use a styletweak enabled by default for fully standard footnotes (with another one I personally also enable by default that catches many pre-standard footnotes). Without looking at the code or the result I wouldn't really know what's going on. :-)

poire-z commented 10 months ago

Ok. So do you feel like adding persists() to all others ? Or not, to be on the safest side ? If not, may be add a comment above your first persist() in a function that // This could/should probably be done to others around here or something like that (otherwise, when reredeaing it in 5 years, I'll wonder if it is normal that it is done there but not done here :/ and I won't dare adding them, cause if they're not there, there must be a reason :)

Answer & thoughts?

benoit-pierre commented 10 months ago

After more experimenting with my test code, it looks like the leaks are also stylesheet specific: without the patch, I get leaks with the default crengine stylesheet, but not Koreader custom one, except for your test-table-ruby.zip sample.

Anyway, I would keep the PR as is, and not add more persist calls without a good reason.