Closed tim-moody closed 3 years ago
@tim-moody
using version 2.1.0
Try git version.
using 2.2.0 I got remarkably similar results:
Assertion failed at ../../SOURCE/libzim_release/src/writer/cluster.cpp:266
data.value.empty()[1] == false[0]
[0x41ef36]
[0x43d7a5]
[0x43d8a7]
[0x43df68]
[0x44060b]
[0x81f555]
[0x98f219]
terminate called after throwing an instance of 'std::runtime_error'
what():
Assertion failed at ../../SOURCE/libzim_release/src/writer/cluster.cpp:266
data.value.empty()[1] == false[0]
Aborted (core dumped)
built with
#!/bin/bash -x
# download zim-tools, libzim, compile tools, and place in $PATH
PREFIX=/opt/iiab
cd $PREFIX

if [ ! -d "$PREFIX/zim-tools" ];then
    git clone https://github.com/openzim/zim-tools
fi
if [ ! -d "$PREFIX/libzim" ];then
    git clone https://github.com/openzim/libzim
fi

apt install -y libzstd-dev
apt install -y libdocopt-dev
apt install -y libgumbo-dev
apt install -y libmagic-dev
apt install -y liblzma-dev
apt install -y libxapian-dev
apt install -y libicu-dev
apt install -y docopt-dev
apt install -y ninja
apt install -y meson
apt install -y cmake
apt install -y pkgconf

cd $PREFIX/libzim
meson . build
ninja -C build
if [ $? -ne 0 ];then
    echo Build of libzim failed. Quitting . . .
    exit 1
fi
ninja -C build install
ldconfig

cd $PREFIX/zim-tools
meson . build
ninja -C build
if [ $? -ne 0 ];then
    echo Build of zim-tools failed. Quitting . . .
    exit 1
fi
rsync -a $PREFIX/zim-tools/build/src/zim* /usr/local/sbin/
@tim-moody
Assertion failed at ../../SOURCE/libzim_release/src/writer/cluster.cpp:266
This assert isn't from libzim-git. It's from version 6.3.0.
I ran git clone https://github.com/openzim/libzim. Is that what you meant by libzim-git?
It's true the Changelog has only 6.3.0. The libzim.so produced is /usr/local/lib/x86_64-linux-gnu/libzim.so.7.0.0.
Please tell me how to proceed.
Why don't you just build using kiwix-build?
kiwix-build --target-platform native_static zim-tools
It will download and build all dependencies on its own.
@tim-moody
what you meant by libzim-git?
master branch
The libzim.so produced is /usr/local/lib/x86_64-linux-gnu/libzim.so.7.0.0.
It's correct.
Are you sure you don't have libzim in /usr/lib?
Yes. I found /usr/lib/x86_64-linux-gnu/libzim.so.6 -> libzim.so.6.2.2 and removed it, though it is my impression that zimdiff links explicitly with libzim.so.7. I ran again and got
Assertion failed at ../../SOURCE/libzim_release/src/writer/cluster.cpp:266
data.value.empty()[1] == false[0]
[0x41ef36]
[0x43d7a5]
[0x43d8a7]
[0x43df68]
[0x44060b]
[0x81f555]
[0x98f219]
terminate called after throwing an instance of 'std::runtime_error'
what():
Assertion failed at ../../SOURCE/libzim_release/src/writer/cluster.cpp:266
data.value.empty()[1] == false[0]
Aborted (core dumped)
Oh, again.
cluster.cpp:266
It's old source.
Are you sure you don't have zimdiff in /usr/bin?
That was it. It wasn't the .so but zimdiff itself. 'which' showed the new one, but executing ran the old one.
Now ran the new one and went to completion. thanks.
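A quick way to spot this kind of shadowing is to list every copy of a command that appears on PATH, in lookup order. The sketch below assumes a POSIX-style shell; find_all_on_path is a hypothetical helper name, not part of zim-tools. A stale location cached by the shell is one possible reason why 'which' and the actual execution disagree, which is why the last line clears that cache.

```shell
# List every executable named "$1" found in PATH, in lookup order.
# find_all_on_path is a hypothetical helper, not part of zim-tools.
# The body runs in a subshell so the IFS and globbing changes do not leak out.
find_all_on_path() (
    IFS=:
    set -f    # no filename expansion while splitting PATH
    for dir in $PATH; do
        if [ -x "$dir/$1" ] && [ ! -d "$dir/$1" ]; then
            printf '%s\n' "$dir/$1"
        fi
    done
)

# More than one line of output means an older copy may shadow the new build.
find_all_on_path zimdiff

# Drop the shell's cached command locations so the next lookup rescans PATH.
hash -r
```

The same check applies to zimpatch and the other zim-tools binaries.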
So back to my original question, is the process to run
zimdiff zim1 zim2 zim3
and then
zimpatch zim1 zim3 zim4
making zim4 = zim1 plus all changes and additions of zim2?
Yes.
And check zimpatch in /usr/bin, please. :)
@tim-moody Why do you believe that zimdiff/zimpatch are tools to merge ZIM files? They are not. Otherwise your bug report seems legitimate. These tools are almost never used, but I would be happy to consider fixing it if you share your ZIM files.
@kelson42 The zim files I wish to merge are those produced by
https://farm.openzim.org/recipes/wikipedia_en_medicine
and
https://farm.openzim.org/recipes/mdwiki
But there are others that a user might wish to merge such as
http://download.kiwix.org/zim/wikipedia/wikipedia_en_chemistry_maxi_2021-04.zim
and
http://download.kiwix.org/zim/wikipedia/wikipedia_en_physics_maxi_2021-03.zim
(or other combinations such as baseball, basketball, football)
Their usage texts led me to believe zimdiff/zimpatch might do this, which is why I asked the question. But all I really wanted to know was what is the proper way to merge two zims.
This tool does not exist and even if it would (relatively easy to build), the result would probably not be what you expect because none of the HTML articles would be updated to benefit fully from the merge.
What I would expect is that the result would be a 3rd zim that would contain all articles in one or both of the source zims and that for articles that are in both, the one from one of the zims, in my case mdwiki, would take precedence and the same article in the other would be ignored. This is exactly what mdwiki is supposed to do through mirroring, but using a merge would be a more efficient means of producing the same result.
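The precedence rule described here can be illustrated on plain lists of article titles. This is only a sketch of the rule, not a ZIM merge: it assumes each source has already been reduced to a tab-separated listing of title and origin, and merge_with_precedence is a hypothetical helper. It says nothing about rewriting links inside the HTML articles, which is the difficulty raised above.

```shell
# Print the union of two tab-separated "title<TAB>origin" listings.
# When a title appears in both files, the line from the first file wins,
# because awk only prints a line the first time it sees that title.
# merge_with_precedence is a hypothetical helper for illustration only.
merge_with_precedence() {
    awk -F '\t' '!seen[$1]++' "$1" "$2"
}
```

With the mdwiki listing passed first, a title present in both listings is reported once with the mdwiki origin, and titles unique to either listing are kept.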
So back to my original question, is the process to run
zimdiff zim1 zim2 zim3
and then
zimpatch zim1 zim3 zim4
making zim4 = zim1 plus all changes and additions of zim2?
No, the process is:
zimdiff base_zim target_zim diff_zim
(Generate diff_zim)
then
zimpatch base_zim diff_zim final_zim
(Generate final_zim)
making final_zim == target_zim.
(These are the same semantics as diff/patch.)
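The two steps can be chained in a small shell function. This is a minimal sketch: the argument order is the one given in this comment, zim_roundtrip is a hypothetical wrapper name, and it assumes the zimdiff and zimpatch found on PATH are the freshly built ones.

```shell
# Build a diff from base to target, then rebuild the target from base + diff.
# zim_roundtrip is a hypothetical wrapper; it only chains the two tools.
zim_roundtrip() {
    if [ "$#" -ne 4 ]; then
        echo "usage: zim_roundtrip BASE TARGET DIFF FINAL" >&2
        return 2
    fi
    # Step 1: DIFF records what changes between BASE and TARGET.
    zimdiff "$1" "$2" "$3" || return 1
    # Step 2: applying DIFF to BASE should reproduce the content of TARGET.
    zimpatch "$1" "$3" "$4" || return 1
}
```

Called as zim_roundtrip base.zim target.zim diff.zim final.zim (placeholder file names), it stops at the first tool that fails, so a crash in zimdiff never feeds a missing diff file to zimpatch.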
@mgautierfr thanks for clarifying. I think I now understand that zimdiff/patch works like the equivalent for files or repos such that if I have base I only need to obtain diff in order to generate target.
Yes.
But please be careful with those tools. They were created a long time ago (for a GSoC, I think) and they have never been properly tested.
And one thing that we know for sure: final_zim is not totally equal to target_zim. They should contain the same content but they are NOT binary equal.
This tool does not exist and even if it would (relatively easy to build)
New tool zimjoin?
I would discuss the use case before creating another tool. That will keep us from ending up with more unused tools like zimdiff/zimpatch.
ZIM files are not so easy to merge, and it does not make much sense, as ZIM files are by definition independent (no article links from one ZIM file to another).
The zim files I wish to merge are those produced by
https://farm.openzim.org/recipes/wikipedia_en_medicine
and
I guess we'll wait to see if the problem can be solved at source (mdwiki.org) and if not then who is willing to do something about it.
Given zim1 and zim2, where zim2 contains articles not in zim1 and articles changed from zim1, I ran (using version 2.1.0)
zimdiff zim1 zim2 zim3 which yielded
I was expecting zim3 to be a file suitable for use with zimpatch.