Closed parvit closed 1 year ago
As explained in https://github.com/openzim/sotoki/issues/243#issuecomment-1194314604 I believe this is not needed as it is possible to disable it. Please confirm and we'll close this PR.
Checked; works as expected.
Very well. Just note that if you run the program under memray, you'll still see the associated memory cost even when disabling it (because the library's defaults still take effect).
Compression and indexing are both handled by libzim and only take action after the call to start(). Defaults only call those config_* methods, which can be called several times before start.
As stated above, I've tested that it works as expected: by calling .config_compression and .config_indexing to disable both, I end up with an uncompressed ZIM that does not include the full-text index.
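To make the "defaults are just config_* calls" point concrete, here is a minimal sketch. It does not use the real libzim or zimscraperlib classes: `BaseCreator`, `ScraperCreator`, and `build` are hypothetical stand-ins, and the string values `"zstd"` and `"none"` as well as the error raised after `start()` are assumptions made for illustration only.

```python
class BaseCreator:
    """Hypothetical stand-in for libzim.writer.Creator (not the real class)."""

    def __init__(self):
        self.compression = None
        self.indexing = None
        self.started = False

    def _check_not_started(self):
        # Assumption: configuration is only accepted before start().
        if self.started:
            raise RuntimeError("cannot configure after start()")

    def config_compression(self, compression):
        self._check_not_started()
        self.compression = compression  # last call before start() wins
        return self

    def config_indexing(self, indexing, language=""):
        self._check_not_started()
        self.indexing = (indexing, language)  # last call before start() wins
        return self

    def start(self):
        # Only from here on would compression/indexing actually do any work.
        self.started = True
        return self


class ScraperCreator(BaseCreator):
    """Hypothetical stand-in for a wrapper that applies defaults up front."""

    def __init__(self):
        super().__init__()
        # The defaults are ordinary config_* calls, so they can be overridden.
        self.config_compression("zstd")
        self.config_indexing(True, "eng")


def build(disable):
    """Create, optionally disable both features, start, and report settings."""
    creator = ScraperCreator()
    if disable:
        creator.config_compression("none").config_indexing(False)
    creator.start()
    return creator.compression, creator.indexing
```

In this model, skipping the explicit calls leaves the defaults in force at `start()`, while calling both `config_*` methods beforehand replaces them.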
Yes, I did not realize that you would test by calling the methods directly.
The point I wanted to convey is that unless you explicitly disable both options you will pay the cost; it is not enough to simply not call the methods at all.
On Wed 27 Jul 2022, 12:02 rgaudin @.***> wrote:
Very well, just know however that if you pass the program with memray you'll still see the memory cost associated even when disabling it (because the defaults of the library still have effect).
Compression and indexing are both handled by libzim http:///openzim/python-libzim and only take action after the call to start(). Defaults only call those config_* methods which can be called several times before start.
As stated above, I've tested that it works as expected: by calling .config_compression and .config_indexing to disable both I end up with an uncompressed ZIM that does not include the full-text index.
That's right. As @kelson42 said somewhere, this is the wanted behavior for 99.9% of our users. Compression and indexing should not have a significant impact on memory, and if they do, it may be a bug in libzim. I'll get to testing sotoki without both in the coming days so we can have a way forward.
I think the key here is that zimscraperlib.zim.creator.Creator inherits from libzim.writer.Creator, so the API might seem smaller than it actually is.
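A small sketch of why an inherited API can look smaller than it is: listing only what a subclass defines itself hides everything it gets from its parent. The two classes below are hypothetical stand-ins, not the real libzim or zimscraperlib classes, and `add_item_for` is used purely as an illustrative method name.

```python
class WriterCreator:
    """Hypothetical stand-in for libzim.writer.Creator."""

    def config_compression(self, compression): ...

    def config_indexing(self, indexing, language): ...

    def start(self): ...


class ScraperCreator(WriterCreator):
    """Hypothetical stand-in for zimscraperlib.zim.creator.Creator."""

    def add_item_for(self, path): ...


def public_methods(cls, inherited):
    """List public method names, with or without those inherited from parents."""
    # vars(cls) only sees what the class defines itself; dir(cls) walks the MRO.
    names = dir(cls) if inherited else vars(cls)
    return sorted(
        name
        for name in names
        if not name.startswith("_") and callable(getattr(cls, name))
    )
```

Calling `public_methods(ScraperCreator, inherited=False)` shows only the wrapper's own method, while `inherited=True` also reveals the `config_*` methods and `start` coming from the parent class.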
This PR responds to issue openzim/sotoki/issues/243.
Disables full-text indexing and compression by default so that their memory cost is only paid if requested (that cost can be an issue with big sites).