Closed georgejhunt closed 3 years ago
An important point: running zimwriterfs on a zimdumped out directory does not recreate the original ZIM file. We need to analyse if this is a bug.
As @kelson42 said, zimwriterfs is not the inverse operation of zimdump.
As your articles are in the A sub-directory and are placed in C namespace the full path is C/A/foo.html.
In most recent version of libzim/kiwix-lib/kiwix-tools (master), the namespace is hidden and so the url is /A/foo.html.
However, it seems that the compatibility layer fails to locate /A/foo.html. It is probably a bug. Is it possible to share the generated zim file somewhere?
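To illustrate the path composition described above, here is a minimal shell sketch. The function names `zim_full_path` and `zim_visible_url` are hypothetical helpers written for this illustration and are not part of zim-tools; they only mirror the C/A/foo.html and /A/foo.html example from this comment.

```shell
# Hypothetical helpers (not part of zim-tools): they reproduce the path
# arithmetic described in the comment above, nothing more.

# Full path of an entry: namespace + path relative to the input directory.
zim_full_path() {
  namespace="$1"   # namespace the entry is placed in, e.g. C
  relative="$2"    # path relative to the dumped directory, e.g. A/foo.html
  printf '%s/%s\n' "$namespace" "$relative"
}

# URL seen by the reader when the namespace is hidden (assumption: the
# namespace prefix is simply dropped, as in the example above).
zim_visible_url() {
  relative="$1"
  printf '/%s\n' "$relative"
}

zim_full_path C A/foo.html
zim_visible_url A/foo.html
```

The point of the sketch is only that the A directory from the dump becomes part of the entry path instead of being read back as a namespace.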
My original task was to reduce a 27GB zim file to about 10GB. So when I found that the 10 GB zim didn't work, I went to use zimwriterfs on the zimdump-ed 27GB unmodified tree. So it is taking a while to upload. It may take most of today, and may fail. I'll send you a link to my s3 space when it finishes uploading.
But thank you for looking into it.
@georgejhunt You had better create a new quality profile in the youtube scraper.
I was not scraping youtube. I was using a kiwix zim file as source. And then I was using youtube "view_count" to selectively copy from input to output (and trim from -/assets/data.js), and repackage. But certainly, if I were scraping youtube, I'd need to set a profile that minimized download size.
@mgautierfr I have zimdumped and recreated a ZIM with:
$ zimwriterfs --favicon="I/favicon.jpg" --language="eng" --title="my title" --description="my descriptioon" --creator="ted" --publisher="kiwix" --welcome="A/home.html" out/ out.zim
WARNING: LZMA compression method is deprecated. Support for it will be dropped from libzim soon.
Unable to resolve symlink out/-/favicon: No such file or directory
Resolve redirect
set index
All these tools are created with latest dev git master HEAD version. The result is uploaded at: http://tmp.kiwix.org/teded_broken_suggestions.zim
For me, there is no suggestion at all. It seems definitely broken, but it is not clear to me whether the problem is in the ZIM file or in libkiwix.
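The "Unable to resolve symlink" line in the output above suggests the dumped tree contains a link whose target is missing. A minimal sketch for listing such links before running zimwriterfs, assuming a POSIX `find`; the function name `list_broken_symlinks` is a hypothetical helper, and `out/` is the directory from the command above:

```shell
# Hypothetical helper: print every symlink under a directory whose target
# does not exist. `test -e` follows the link, so it fails for dangling links.
list_broken_symlinks() {
  dir="$1"
  find "$dir" -type l ! -exec test -e {} \; -print
}

# Example: check the dumped tree, if it exists in the current directory.
if [ -d out ]; then
  list_broken_symlinks out
fi
```

This only reports dangling links; whether they matter for the suggestion problem discussed here is not established by the thread.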
Are you sure you have the just merged https://github.com/openzim/zim-tools/pull/212 in zim-tools ?
@mgautierfr Will check, but I believe not.
The upload of 27GB failed twice. So I downloaded a shorter ZIM: ted_en_playlist-9-trippy-ted-talks_2021-01.zim. Then I used my zimdump, and zimwriterfs to create: http://d.iiab.io/content/trippy-en-tedtalks.zim -- which also exhibits the problem.
libzim hash: commit ac2cc1fbe8d91b2da9df8c79a7469e83b7b1f30c -- Feb 24, 2021
zim-tools hash: commit f406219cd974d2a944cccbc72a0da8616d886972 -- also Feb 24, 2021
Both compiled on Ubuntu 20.04 with no apparent problems after the dependencies were present
@mgautierfr I have now made sure that I had the Hints patch... but I get exactly the same symptom. I have updated http://tmp.kiwix.org/teded_broken_suggestions.zim
I've just tried with your trippy-en-tedtalks.zim. I have a few bugs (already fixed) but none corresponding to what you describe:
But clicking on the article link in the search page correctly moves to the article (the link is working).
@kelson42, @georgejhunt What reader are you using ? kiwix-serve, kiwix-desktop ? Which version ?
@mgautierfr Latest dev kiwix-serve
The kiwix-serve I was using is probably 6 months old. "kiwix-serve -V" yields 3.1.2
@georgejhunt Just to confirm that the bug has been identified. This is not a minor one and, even worse, it went through the CI. So, a really valuable bug report. Thx. A fix will be developed within a week.
Thanks for the update, and the priority
I used zimdump (commit dca3a83d48a7ac5612c4f3dbfaba89c02c66e6b4 Merge: f406219 ff61a93 Author: Matthieu Gautier mgautier@kymeria.fr