ipfs / distributed-wikipedia-mirror

Putting Wikipedia Snapshots on IPFS
https://github.com/ipfs/distributed-wikipedia-mirror#readme

Deploying zh (Chinese) version of Wikipedia shows 'failed to parse input: OutOfBounds' #73

Closed: FledgeXu closed this issue 3 years ago

FledgeXu commented 3 years ago

I'm attempting to deploy the zh (Chinese) version of Wikipedia, and the script fails with 'failed to parse input: OutOfBounds'.

OS version:

Linux localhost 4.19.0-5-amd64 #1 SMP Debian 4.19.37-5+deb10u2 (2019-08-08) x86_64 GNU/Linux

Rustc version:

rustc 1.49.0 (e1884a8e3 2020-12-29)

Logs:

root@localhost:~/distributed-wikipedia-mirror# ./mirrorzim.sh --languagecode=zh --wikitype=wikipedia

Download the zim file...
base64: invalid input
--2021-01-23 22:01:06--  https://download.kiwix.org/zim/wikipedia/wikipedia_zh_all_maxi_2021-01.zim
Resolving download.kiwix.org (download.kiwix.org)... 195.154.156.115
Connecting to download.kiwix.org (download.kiwix.org)|195.154.156.115|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://ftpmirror.your.org/pub/kiwix/zim/wikipedia/wikipedia_zh_all_maxi_2021-01.zim [following]
--2021-01-23 22:01:06--  https://ftpmirror.your.org/pub/kiwix/zim/wikipedia/wikipedia_zh_all_maxi_2021-01.zim
Resolving ftpmirror.your.org (ftpmirror.your.org)... 204.9.55.82, 2001:4978:1:420::cc09:3752
Connecting to ftpmirror.your.org (ftpmirror.your.org)|204.9.55.82|:443... connected.
HTTP request sent, awaiting response... 416 Requested Range Not Satisfiable

    The file is already fully retrieved; nothing to do.

Remove tmp directory ./tmp/wikipedia_zh_all_maxi_2021-01 before run ...
Unpack the zim file into ./tmp/wikipedia_zh_all_maxi_2021-01...
thread 'main' panicked at 'failed to parse input: OutOfBounds', src/bin/extract_zim.rs:56:36
stack backtrace:
   0:     0x55d36d1e8360 - std::backtrace_rs::backtrace::libunwind::trace::h04d12fdcddff82aa
                               at /rustc/e1884a8e3c3e813aada8254edfa120e85bf5ffca/library/std/src/../../backtrace/src/backtrace/libunwind.rs:100:5
   1:     0x55d36d1e8360 - std::backtrace_rs::backtrace::trace_unsynchronized::h1459b974b6fbe5e1
                               at /rustc/e1884a8e3c3e813aada8254edfa120e85bf5ffca/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
   2:     0x55d36d1e8360 - std::sys_common::backtrace::_print_fmt::h9b8396a669123d95
                               at /rustc/e1884a8e3c3e813aada8254edfa120e85bf5ffca/library/std/src/sys_common/backtrace.rs:67:5
   3:     0x55d36d1e8360 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::he009dcaaa75eed60
                               at /rustc/e1884a8e3c3e813aada8254edfa120e85bf5ffca/library/std/src/sys_common/backtrace.rs:46:22
   4:     0x55d36d209aec - core::fmt::write::h77b4746b0dea1dd3
                               at /rustc/e1884a8e3c3e813aada8254edfa120e85bf5ffca/library/core/src/fmt/mod.rs:1078:17
   5:     0x55d36d1e49f2 - std::io::Write::write_fmt::heb7e50902e98831c
                               at /rustc/e1884a8e3c3e813aada8254edfa120e85bf5ffca/library/std/src/io/mod.rs:1518:15
   6:     0x55d36d1ea965 - std::sys_common::backtrace::_print::h2d880c9e69a21be9
                               at /rustc/e1884a8e3c3e813aada8254edfa120e85bf5ffca/library/std/src/sys_common/backtrace.rs:49:5
   7:     0x55d36d1ea965 - std::sys_common::backtrace::print::h5f02b1bb49f36879
                               at /rustc/e1884a8e3c3e813aada8254edfa120e85bf5ffca/library/std/src/sys_common/backtrace.rs:36:9
   8:     0x55d36d1ea965 - std::panicking::default_hook::{{closure}}::h658e288a7a809b29
                               at /rustc/e1884a8e3c3e813aada8254edfa120e85bf5ffca/library/std/src/panicking.rs:208:50
   9:     0x55d36d1ea608 - std::panicking::default_hook::hb52d73f0da9a4bb8
                               at /rustc/e1884a8e3c3e813aada8254edfa120e85bf5ffca/library/std/src/panicking.rs:227:9
  10:     0x55d36d1eb101 - std::panicking::rust_panic_with_hook::hfe7e1c684e3e6462
                               at /rustc/e1884a8e3c3e813aada8254edfa120e85bf5ffca/library/std/src/panicking.rs:593:17
  11:     0x55d36d1eac47 - std::panicking::begin_panic_handler::{{closure}}::h42939e004b32765c
                               at /rustc/e1884a8e3c3e813aada8254edfa120e85bf5ffca/library/std/src/panicking.rs:499:13
  12:     0x55d36d1e881c - std::sys_common::backtrace::__rust_end_short_backtrace::h9d2070f7bf9fd56c
                               at /rustc/e1884a8e3c3e813aada8254edfa120e85bf5ffca/library/std/src/sys_common/backtrace.rs:141:18
  13:     0x55d36d1eaba9 - rust_begin_unwind
                               at /rustc/e1884a8e3c3e813aada8254edfa120e85bf5ffca/library/std/src/panicking.rs:495:5
  14:     0x55d36d207a51 - core::panicking::panic_fmt::ha0bb065d9a260792
                               at /rustc/e1884a8e3c3e813aada8254edfa120e85bf5ffca/library/core/src/panicking.rs:92:14
  15:     0x55d36d207873 - core::option::expect_none_failed::h7e1dd0a94971eb61
                               at /rustc/e1884a8e3c3e813aada8254edfa120e85bf5ffca/library/core/src/option.rs:1268:5
  16:     0x55d36d0feef0 - extract_zim::main::h0d770a376a8e6eab
  17:     0x55d36d0f9bd3 - std::sys_common::backtrace::__rust_begin_short_backtrace::h5ecc56c6658a80dd
  18:     0x55d36d0fa599 - std::rt::lang_start::{{closure}}::hb0d654310eb3e6ce
  19:     0x55d36d1eb617 - core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &F>::call_once::h57e2a071d427b24c
                               at /rustc/e1884a8e3c3e813aada8254edfa120e85bf5ffca/library/core/src/ops/function.rs:259:13
  20:     0x55d36d1eb617 - std::panicking::try::do_call::h81cbbe0c3b30a28e
                               at /rustc/e1884a8e3c3e813aada8254edfa120e85bf5ffca/library/std/src/panicking.rs:381:40
  21:     0x55d36d1eb617 - std::panicking::try::hbeeb95b4e1f0a876
                               at /rustc/e1884a8e3c3e813aada8254edfa120e85bf5ffca/library/std/src/panicking.rs:345:19
  22:     0x55d36d1eb617 - std::panic::catch_unwind::h59c48ccb40a0bf20
                               at /rustc/e1884a8e3c3e813aada8254edfa120e85bf5ffca/library/std/src/panic.rs:396:14
  23:     0x55d36d1eb617 - std::rt::lang_start_internal::ha53ab63f88fee728
                               at /rustc/e1884a8e3c3e813aada8254edfa120e85bf5ffca/library/std/src/rt.rs:51:25
  24:     0x55d36d100ba2 - main
  25:     0x7f3fc854609b - __libc_start_main
  26:     0x55d36d0f80da - _start
  27:                0x0 - <unknown>
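The panic text has the shape that Rust's `Result::expect` produces, `<message>: <Debug form of the error>`. So `OutOfBounds` is most likely an error variant returned by the ZIM parsing code that `extract_zim.rs` calls at line 56. The log does not show that parser's source. The following is a minimal sketch, assuming a hypothetical `ParseError` enum and a hypothetical `read_u32_le` helper that are not the actual extract_zim code. It shows how a bounds-checked reader yields such an error when asked to read past the end of the data. The example checks the ZIM magic number (72173914, stored little-endian) at the start of a header.

```rust
/// Hypothetical error type; the real parser's error enum is not shown in the log.
#[derive(Debug, PartialEq)]
enum ParseError {
    OutOfBounds,
}

/// ZIM files start with the little-endian magic number 72173914 (bytes 5A 49 4D 04).
const ZIM_MAGIC: u32 = 72_173_914;

/// Hypothetical bounds-checked little-endian u32 read:
/// returns OutOfBounds instead of panicking when the slice is too short.
fn read_u32_le(buf: &[u8], offset: usize) -> Result<u32, ParseError> {
    // Guard against offset + 4 overflowing usize.
    let end = offset.checked_add(4).ok_or(ParseError::OutOfBounds)?;
    // `get` returns None (mapped to OutOfBounds) if the range exceeds the buffer.
    let b = buf.get(offset..end).ok_or(ParseError::OutOfBounds)?;
    Ok(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

/// Returns Ok(true) if the buffer starts with the ZIM magic number.
fn check_magic(buf: &[u8]) -> Result<bool, ParseError> {
    Ok(read_u32_le(buf, 0)? == ZIM_MAGIC)
}

fn main() {
    let header = [0x5Au8, 0x49, 0x4D, 0x04];
    println!("{:?}", check_magic(&header));

    let truncated = [0x5Au8, 0x49];
    let err = check_magic(&truncated).unwrap_err();
    // Result::expect("failed to parse input") would panic with this same text.
    println!("{}: {:?}", "failed to parse input", err);
}
```

Returning a `Result` from every read keeps malformed or unexpected input from turning into an index panic deep inside the parser. The caller still chooses whether to `expect` (crash with a message, as in the log) or report the error gracefully.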

FledgeXu commented 3 years ago

I checked https://github.com/ipfs/distributed-wikipedia-mirror/issues/66. It seems extract_zim cannot extract the newest snapshots.

kelson42 commented 3 years ago

The problem is probably that it does not handle the zstd compression introduced to the ZIM format in early 2020.
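Here is what "handling zstd" involves at the format level. As I read the openZIM format description, each cluster begins with an info byte. Its low 4 bits name the compression: 0 or 1 means none, 2 zlib, 3 bzip2, 4 xz (LZMA2), and 5 zstd. Its 0x10 bit marks an extended cluster with 8-byte blob offsets. The sketch below decodes that byte. `parse_cluster_info` and `legacy_supports` are hypothetical names. The assumption that an older extractor decodes only uncompressed and xz clusters is illustrative and not taken from the extract_zim source.

```rust
/// Compression types a ZIM cluster can declare in the low 4 bits of its info byte.
#[derive(Debug, PartialEq)]
enum Compression {
    None,
    Zlib,
    Bzip2,
    Xz,
    Zstd,
}

#[derive(Debug, PartialEq)]
struct ClusterInfo {
    compression: Compression,
    /// Bit 0x10: blob offsets are 8 bytes wide instead of 4.
    extended: bool,
}

/// Hypothetical decoder for a cluster's info byte.
fn parse_cluster_info(byte: u8) -> Result<ClusterInfo, String> {
    let compression = match byte & 0x0F {
        0 | 1 => Compression::None,
        2 => Compression::Zlib,
        3 => Compression::Bzip2,
        4 => Compression::Xz,
        5 => Compression::Zstd,
        other => return Err(format!("unknown compression type {}", other)),
    };
    Ok(ClusterInfo { compression, extended: byte & 0x10 != 0 })
}

/// ASSUMPTION (illustrative only): an older extractor that can decode
/// uncompressed and xz clusters but has no zstd decoder.
fn legacy_supports(c: &Compression) -> bool {
    matches!(c, Compression::None | Compression::Xz)
}

fn main() {
    for &byte in [0x04u8, 0x05, 0x15].iter() {
        let info = parse_cluster_info(byte).unwrap();
        println!(
            "{:#04x}: {:?}, legacy decoder supports it: {}",
            byte,
            info,
            legacy_supports(&info.compression)
        );
    }
}
```

Returning an error for unknown type values, rather than guessing, keeps a file written with a newer format revision from being silently misparsed.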

lidel commented 3 years ago

The fix requires switching to zimtools – https://github.com/ipfs/distributed-wikipedia-mirror/issues/66

lidel commented 3 years ago

@FledgeXu, if you have time, you can try again with the updated README from https://github.com/ipfs/distributed-wikipedia-mirror/pull/77

Readable version: https://github.com/ipfs/distributed-wikipedia-mirror/blob/8a3c7d1cc5b2f0b787a76776d0ae27d33b911472/README.md#how-to-add-new-wikipedia-snapshots-to-ipfs

FledgeXu commented 3 years ago

@lidel Thanks, using zimdump works for me now.

lidel commented 3 years ago

@FledgeXu, if you want to give it a try and generate the zh version, make sure you use the updated scripts from #77. The old ones won't work correctly with the output produced by zimdump.

FledgeXu commented 3 years ago

Thanks, @lidel. I have tested it manually on my machine, and I will try the updated scripts on the server. If I run into any bugs, I will report them under #77.