Closed by lidel 3 years ago
Couldn't extract fully: some files fail with "other os error", as described in https://github.com/dignifiedquire/zim/issues/3:
$ extract_zim --skip-link wikipedia_tr_all_maxi_2019-10.zim --out distributed-wikipedia-mirror/out2
Mon 28 Oct 12:26:11 CET 2019
Extracting file: wikipedia_tr_all_maxi_2019-10.zim to distributed-wikipedia-mirror/out2
Creating map
Extracting entries: 4808
Spawning 4808 threads
couldn't create distributed-wikipedia-mirror/out2/A/Eternity: other os error
couldn't create distributed-wikipedia-mirror/out2/A/Eternity: other os error
couldn't create distributed-wikipedia-mirror/out2/A/Eternity: other os error
thread '<unnamed>' panicked at 'failed retry: couldn't create distributed-wikipedia-mirror/out2/A/Eternity: other os error', extract_zim.rs:133:17
note: Run with `RUST_BACKTRACE=1` environment variable to display a backtrace.
couldn't create distributed-wikipedia-mirror/out2/A/Karacaoğlan: other os error
couldn't create distributed-wikipedia-mirror/out2/A/Karacaoğlan: other os error
couldn't create distributed-wikipedia-mirror/out2/A/Karacaoğlan: other os error
thread '<unnamed>' panicked at 'failed retry: couldn't create distributed-wikipedia-mirror/out2/A/Karacaoğlan: other os error', extract_zim.rs:133:17
couldn't create distributed-wikipedia-mirror/out2/A/Dört/Mazi_Kalbimde: other os error
couldn't create distributed-wikipedia-mirror/out2/A/Dört/Mazi_Kalbimde: other os error
couldn't create distributed-wikipedia-mirror/out2/A/Dört/Mazi_Kalbimde: other os error
thread '<unnamed>' panicked at 'failed retry: couldn't create distributed-wikipedia-mirror/out2/A/Dört/Mazi_Kalbimde: other os error', extract_zim.rs:133:17
couldn't create distributed-wikipedia-mirror/out2/A/HIV: other os error
couldn't create distributed-wikipedia-mirror/out2/A/HIV: other os error
couldn't create distributed-wikipedia-mirror/out2/A/HIV: other os error
thread '<unnamed>' panicked at 'failed retry: couldn't create distributed-wikipedia-mirror/out2/A/HIV: other os error', extract_zim.rs:133:17
couldn't create distributed-wikipedia-mirror/out2/A/Fredrikstad/Sarpsborg: other os error
couldn't create distributed-wikipedia-mirror/out2/A/Fredrikstad/Sarpsborg: other os error
couldn't create distributed-wikipedia-mirror/out2/A/Fredrikstad/Sarpsborg: other os error
thread '<unnamed>' panicked at 'failed retry: couldn't create distributed-wikipedia-mirror/out2/A/Fredrikstad/Sarpsborg: other os error', extract_zim.rs:133:17
...
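For anyone debugging this: the log above shows each path being attempted three times before the thread panics, and the message only carries the generic "other os error" text. Below is a minimal sketch of a retry that logs the raw OS error code and hands the last error back to the caller instead of panicking. The helper name `retry` and its signature are made up for illustration; this is not the code in extract_zim.rs.

```rust
use std::io;

/// Calls `op` up to `attempts` times and returns the first success,
/// or the last error if every attempt fails.
/// Hypothetical helper for illustration; not taken from extract_zim.
pub fn retry<T, F>(attempts: u32, mut op: F) -> io::Result<T>
where
    F: FnMut() -> io::Result<T>,
{
    // Placeholder error, only returned when `attempts` is 0.
    let mut last = io::Error::new(io::ErrorKind::Other, "no attempts made");
    for attempt in 1..=attempts {
        match op() {
            Ok(value) => return Ok(value),
            Err(err) => {
                // raw_os_error() exposes the errno-style code (if any),
                // which is more useful than the generic error kind text.
                eprintln!(
                    "attempt {}/{} failed: {} (raw os error: {:?})",
                    attempt,
                    attempts,
                    err,
                    err.raw_os_error()
                );
                last = err;
            }
        }
    }
    Err(last)
}
```

It could be called as `retry(3, || std::fs::File::create(&path))`, with the caller deciding whether a final failure should abort the whole extraction or just be reported.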
Fixing this would be the first step to unblock this.
I want to make a side remark here. It is a pity that openZIM seems not to provide an official tool you can just use to extract the ZIM content. I'm not sure about your exact requirements, but if I get them, I will seriously consider doing something to fix that problem for you. I really want to encourage you to create a feature request at https://github.com/openzim/zim-tools
@kelson42 thank you for bringing zim-tools to my attention!
I was not around when the tweaked extract_zim was created, but I have been told it was created either because zimdump was simply not around yet or because the original one was missing some features.
I'll run some tests with zimdump from zim-tools and report back.
Update: I think we could switch, but some things need to be fixed first. See https://github.com/ipfs/distributed-wikipedia-mirror/issues/66 :)
This error should now be fixed in the latest version of dignifiedquire/zim.
Just ran the extraction, all fixed now
$ time ./target/release/extract_zim --skip-link ~/Downloads/wikipedia_tr_all_maxi_2019-10.zim --out ./out
Extracting file: /Users/dignifiedquire/Downloads/wikipedia_tr_all_maxi_2019-10.zim to ./out
Creating map
Extracting entries: 4808
Spawning 4808 tasks across 16 threads
Extraction done in 47453ms
Main page is Kullanıcı:The_other_Kiwix_guy/Landing
./target/release/extract_zim --skip-link --out ./out 81.81s user 218.73s system 631% cpu 47.560 total
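The visible difference between the two logs is "Spawning 4808 threads" in the failing run versus "Spawning 4808 tasks across 16 threads" in the fixed one. The thread does not say how dignifiedquire/zim does this internally, so the following is only a sketch of the general idea (a fixed number of workers draining a shared job queue), written with the standard library alone. `run_bounded` is a hypothetical name.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Runs `work` over every job using at most `workers` OS threads and
/// returns the results in the original job order.
/// Hypothetical helper for illustration; not part of dignifiedquire/zim.
pub fn run_bounded<T, R, F>(jobs: Vec<T>, workers: usize, work: F) -> Vec<R>
where
    T: Send + 'static,
    R: Send + 'static,
    F: Fn(T) -> R + Send + Sync + 'static,
{
    // Tag each job with its index so results can be re-ordered at the end.
    let queue = Arc::new(Mutex::new(jobs.into_iter().enumerate().collect::<Vec<_>>()));
    let work = Arc::new(work);
    let mut handles = Vec::new();

    for _ in 0..workers.max(1) {
        let queue = Arc::clone(&queue);
        let work = Arc::clone(&work);
        handles.push(thread::spawn(move || {
            let mut done = Vec::new();
            loop {
                // The lock is held only for the pop, not while working.
                let next = queue.lock().unwrap().pop();
                match next {
                    Some((idx, job)) => done.push((idx, work(job))),
                    None => break,
                }
            }
            done
        }));
    }

    let mut results: Vec<(usize, R)> = handles
        .into_iter()
        .flat_map(|h| h.join().expect("worker panicked"))
        .collect();
    results.sort_by_key(|(idx, _)| *idx);
    results.into_iter().map(|(_, r)| r).collect()
}
```

With this shape, 4808 entries and 16 workers means at most 16 files are being created at any moment, regardless of how many entries the ZIM contains.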
Thank you @dignifiedquire, this is great!
I took it for a spin and initial results are pretty good (wip in https://github.com/ipfs/distributed-wikipedia-mirror/pull/67):
wikipedia_tr_all_maxi_2019-10.zim took less than two minutes.
--offline mode took under 10 minutes.
Next step is to figure out #64 and landing page for execute-changes.sh
@kelson42 do you know why the .zim file states that the Main page is Kullanıcı:The_other_Kiwix_guy/Landing?
Is providing a custom page a new convention in the Kiwix project?
The original landing page at ./out/A/Anasayfa.html seems to be truncated: the page includes only the "article of the week" section, making it a pretty bad landing page overall.
Tried to finish snapshot creation but the scripts no longer work.
JS and the directory structure changed so much that the entire execute-changes.sh needs to be redone.
I also noticed ./out/-/j/js_modules/jsConfigVars.js is invalid, its contents being: ( (the single character).
The file has the same contents when unpacked with zimdump, so it's not a bug in extract_zim.
Retested with wikipedia_tr_all_maxi_2019-12.zim, same results.
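Since a one-byte jsConfigVars.js slipped through two different extractors, a quick sanity pass over the unpacked tree might catch similar cases in other snapshots. A rough sketch, assuming "suspicious" just means a .js file of at most a few bytes; `find_tiny_files` and the size threshold are made up here and are not part of extract_zim or zimdump.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Recursively collects files under `root` that have extension `ext`
/// and are at most `max_len` bytes long. Results are sorted by path.
/// Hypothetical helper for illustration only.
pub fn find_tiny_files(root: &Path, ext: &str, max_len: u64) -> io::Result<Vec<PathBuf>> {
    let mut found = Vec::new();
    // Explicit stack instead of recursion, to cope with deep trees.
    let mut pending = vec![root.to_path_buf()];
    while let Some(dir) = pending.pop() {
        for entry in fs::read_dir(&dir)? {
            let entry = entry?;
            let path = entry.path();
            let meta = entry.metadata()?;
            if meta.is_dir() {
                pending.push(path);
            } else if meta.is_file()
                && path.extension().and_then(|e| e.to_str()) == Some(ext)
                && meta.len() <= max_len
            {
                found.push(path);
            }
        }
    }
    found.sort();
    Ok(found)
}
```

For example, `find_tiny_files(Path::new("./out"), "js", 1)` would list every .js file that is empty or a single byte, which is the shape of the jsConfigVars.js problem described above.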
Update: I created a bounty for remaining work: https://github.com/ipfs/distributed-wikipedia-mirror/issues/64
It has a pretty nice footer with useful information about the mirror and its sources.
Two cosmetic issues remain:
/wiki/Anasayfa.html is fetched for the date of unpacking ZIM, instead of the day ZIM was created
(This seems to be specific to Turkish wiki) (/wiki/Portal:Matematik.html etc)
Both can be fixed manually by patching HTML in /wiki/Anasayfa.html, but if anyone has time to fix them programmatically, that would be useful for other languages.
The view on mobile isn’t great - is that expected or a regression?
@momack2 Yes, if we want the "original" landing page, this is something we need to fix manually (I believe the one in the old snapshot was also crafted by hand).
Context:
The original Main page is truncated or not included in many ZIMs, so we have no "mobile friendly" version. We download the original HTML from Wikipedia itself and add it to the unpacked ZIM snapshot, which, as seen above, requires some work.
FYI ZIM files often use a custom page provided by a contributor that makes more sense for offline use (example). You can see it does not have a "topic of the day"; instead it's a dry list of wide topics ready to explore.
If we keep it, very little or no manual fixes may be needed because it is already simple enough to be mobile friendly – see this build where I left the original landing page from ZIM archive: https://bafybeieoya74422ovlmx23i5bxpuw2szsdrhsjwenfxkqoknw34jigcoua.ipfs.dweb.link/wiki/Anasayfa.html (added only the landing page, links may not resolve)
Oh yeah - much better. Not perfect, but "good enough" for smooth browsing.
Ok friends, I've picked up the ball in #77 and produced a brand new snapshot from wikipedia_tr_all_maxi_2021-02.zim. If this goes well we will do the same for English (#61)
Highlights:
.html, so paths look exactly the same as on original Wikipedia :exploding_head: :sparkles:
/wiki/ uses "main page" from ZIM (date-agnostic, usually better suited for offline/exploratory use)
/wiki/Anasayfa (matching https://tr.wikipedia.org/wiki/Anasayfa)
:point_right: take it for a spin and comment if you find any issues:
"Kiwix" in place of "kiwix" in the footer would be better. With a link to https://kiwix.org, even better :)
@kelson42 added this in https://github.com/ipfs/distributed-wikipedia-mirror/pull/82 and created an updated version:
@mburns mind switching https://tr.dev.wikipedia-on-ipfs.org to the above CID?
done. :)
Ok, this should be good enough for now.
DNSLink is updated and https://tr.wikipedia-on-ipfs.org now points at:
bafybeieuutdavvf55sh3jktq2dpi2hkle6dtmebe7uklod3ramihyf3xa4
(generated from wikipedia_tr_all_maxi_2021-02)
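For readers unfamiliar with DNSLink: it maps a domain to IPFS content through a DNS TXT record, conventionally placed on the `_dnslink.` subdomain with a value of the form `dnslink=/ipfs/<cid>`. The actual DNS configuration for this mirror is not shown in the thread, so the sketch below only illustrates the record shape; `dnslink_txt_record` is a hypothetical helper.

```rust
/// Builds the (record name, TXT value) pair that DNSLink conventionally
/// uses to point `domain` at the IPFS content identified by `cid`.
/// Hypothetical helper; the mirror's real DNS setup is not shown in this thread.
pub fn dnslink_txt_record(domain: &str, cid: &str) -> (String, String) {
    (
        format!("_dnslink.{}", domain),
        format!("dnslink=/ipfs/{}", cid),
    )
}
```

Called with `tr.wikipedia-on-ipfs.org` and the CID above, it yields the name `_dnslink.tr.wikipedia-on-ipfs.org` and the value `dnslink=/ipfs/bafybeieuutdavvf55sh3jktq2dpi2hkle6dtmebe7uklod3ramihyf3xa4`.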
I'm now shifting focus to the English one, tracking that in #61
other os error (https://github.com/dignifiedquire/zim/issues/3)
extract_zim v0.2.0
wikipedia_tr_all_maxi_2019-10.zim
execute-changes.sh
execute-changes.sh
This could be done manually or as a part of #58