openzim / zim-tools

Various ZIM command line tools
https://download.openzim.org/release/zim-tools/
GNU General Public License v3.0

zimdump - invalid characters and long names cause error and incomplete extraction #373

Closed 2600box closed 11 months ago

2600box commented 11 months ago

I have a 3 GB ZIM file: https://www.transfernow.net/dl/20231009UhHnE3Sy

❯ ./zimdump --version
zim-tools 3.2.0

libzim 8.2.1
+ libzstd 1.5.2
+ liblzma 5.2.6
+ libxapian 1.4.22
+ libicu 58.2.0

With zimdump list, I can see there are 7011 items.

When I run zimdump dump, it errors out and extracts the contents only partially.

The errors are caused by unneeded tracking requests (such as the play.google.com logging call below). Is there a way to prune or ignore them?
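One way to prune such entries before extraction is to filter entry paths first. The Python sketch below is hypothetical and is not a zimdump feature. The marker string __wb_method=POST is taken from the failing path in the error below. Treating that marker, and any path above an arbitrary byte limit, as unneeded tracking data is an assumption.

```python
# Hypothetical pruning filter for ZIM entry paths (not part of zim-tools).
# Assumptions: tracking requests can be recognised by a marker in the entry
# path, and paths longer than MAX_PATH_BYTES are not worth extracting.

TRACKING_MARKERS = ("__wb_method=POST",)  # hypothetical marker list
MAX_PATH_BYTES = 1024  # hypothetical limit, tune as needed


def should_skip(path: str) -> bool:
    """Return True if an entry path looks like unneeded tracking data."""
    if any(marker in path for marker in TRACKING_MARKERS):
        return True
    return len(path.encode("utf-8")) > MAX_PATH_BYTES


def prune(paths):
    """Split paths into (kept, skipped) lists, preserving their order."""
    kept, skipped = [], []
    for path in paths:
        (skipped if should_skip(path) else kept).append(path)
    return kept, skipped


if __name__ == "__main__":
    sample = [
        "A/index.html",
        "H/play.google.com/log?format=json&__wb_method=POST&[[1,null]]",
    ]
    kept, skipped = prune(sample)
    print(kept)  # ['A/index.html']
    print(skipped)
```

The marker list is deliberately narrow. Matching on broader patterns, such as whole domains, would also drop content that may be wanted.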

The error looks like this:

Error writing file to errors dir. ./dump2/_exceptions/H%2fplay.google.com%2flog?format=json&hasfast=true&authuser=0&__wb_method=POST&[[1,null,null,null,null,null,null,null,null,null,[null,null,null,null,"en",null,"17",null,null,[1,0,0,0,0]]],1654,[["1696854400954",null,[],null,null,null,null,"[[[\"%2fclient_streamz%2fpo%2fw%2fel\",null,[\"en\",\"rk\"],[[[[\"c\"],[\"O43z0dpjhgX20SCx4KAo\"]],[null,1]],[[[\"c\"],[\"O43z0dpjhgX20SCx4KAo\"]],[null,0]],[[[\"q\"],[\"O43z0dpjhgX20SCx4KAo\"]],[null,0.8000030517578125]],[[[\"S\"],[\"O43z0dpjhgX20SCx4KAo\"]],[null,3.5999984741210938]],[[[\"b\"],[\"O43z0dpjhgX20SCx4KAo\"]],[null,199.0999984741211]],[[[\"i\"],[\"O43z0dpjhgX20SCx4KAo\"]],[null,1.5]],[[[\"r\"],[\"O43z0dpjhgX20SCx4KAo\"]],[null,3440.6000061035156]],[[[\"C\"],[\"O43z0dpjhgX20SCx4KAo\"]],[null,2.4000015258789062]],[[[\"x\"],[\"O43z0dpjhgX20SCx4KAo\"]],[null,0]],[[[\"m\"],[\"O43z0dpjhgX20SCx4KAo\"]],[null,3.100006103515625]]],null,[]],[\"%2fclient_streamz%2fpo%2fw%2frl\",null,[\"mn\",\"ac\",\"sc\",\"rk\"],[[[[\"c\"],[null,1],[null,0],[\"O43z0dpjhgX20SCx4KAo\"]],[null,1887.900001525879]],[[[\"g\"],[null,1],[null,0],[\"O43z0dpjhgX20SCx4KAo\"]],[null,1331.400001525879]]],null,[]],[\"%2fclient_streamz%2fpo%2fw%2fcsc\",null,[\"cs\",\"rk\"],[[[[null,3],[\"O43z0dpjhgX20SCx4KAo\"]],[1]]],null,[]]]]",null,null,null,null,null,null,0,[null,[],null,"[[],[],[],[]]"],null,null,null,[],1,null,null,null,null,null,[]]],"1696854400955",[]]
Exception: Error writing file to errors dir. ./dump2/_exceptions/H%2fplay.google.com%2flog?format=json&hasfast=true&authuser=0&__wb_method=POST&[[1,null,null,null,null,null,null,null,null,null,[null,null,null,null,"en",null,"17",null,null,[1,0,0,0,0]]],1654,[["1696854400954",null,[],null,null,null,null,"[[[\"%2fclient_streamz%2fpo%2fw%2fel\",null,[\"en\",\"rk\"],[[[[\"c\"],[\"O43z0dpjhgX20SCx4KAo\"]],[null,1]],[[[\"c\"],[\"O43z0dpjhgX20SCx4KAo\"]],[null,0]],[[[\"q\"],[\"O43z0dpjhgX20SCx4KAo\"]],[null,0.8000030517578125]],[[[\"S\"],[\"O43z0dpjhgX20SCx4KAo\"]],[null,3.5999984741210938]],[[[\"b\"],[\"O43z0dpjhgX20SCx4KAo\"]],[null,199.0999984741211]],[[[\"i\"],[\"O43z0dpjhgX20SCx4KAo\"]],[null,1.5]],[[[\"r\"],[\"O43z0dpjhgX20SCx4KAo\"]],[null,3440.6000061035156]],[[[\"C\"],[\"O43z0dpjhgX20SCx4KAo\"]],[null,2.4000015258789062]],[[[\"x\"],[\"O43z0dpjhgX20SCx4KAo\"]],[null,0]],[[[\"m\"],[\"O43z0dpjhgX20SCx4KAo\"]],[null,3.100006103515625]]],null,[]],[\"%2fclient_streamz%2fpo%2fw%2frl\",null,[\"mn\",\"ac\",\"sc\",\"rk\"],[[[[\"c\"],[null,1],[null,0],[\"O43z0dpjhgX20SCx4KAo\"]],[null,1887.900001525879]],[[[\"g\"],[null,1],[null,0],[\"O43z0dpjhgX20SCx4KAo\"]],[null,1331.400001525879]]],null,[]],[\"%2fclient_streamz%2fpo%2fw%2fcsc\",null,[\"cs\",\"rk\"],[[[[null,3],[\"O43z0dpjhgX20SCx4KAo\"]],[1]]],null,[]]]]",null,null,null,null,null,null,0,[null,[],null,"[[],[],[],[]]"],null,null,null,[],1,null,null,null,null,null,[]]],"1696854400955",[]]
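The failing name above is a single path component (the "/" separators appear as %2f) that is well over a thousand bytes long. Most Linux filesystems, ext4 for example, cap one component at 255 bytes, so writing it fails. The Python sketch below shows one way an extractor could map entry paths to safe file names. It is an assumption for illustration, not zimdump's actual behaviour. It percent-encodes each component, and if the result is still too long, it truncates it and appends a hash of the full name. The helpers safe_component and safe_relative_path are hypothetical.

```python
import hashlib
from urllib.parse import quote

# Assumption: one file-name component may hold at most 255 bytes,
# the usual NAME_MAX on Linux filesystems such as ext4.
NAME_MAX = 255


def safe_component(name: str, limit: int = NAME_MAX) -> str:
    """Percent-encode one path component and shorten it if it is too long."""
    # safe="" makes quote() escape '/', '?', '"', '%' and every non-ASCII byte,
    # so the result is plain ASCII and its length equals its size in bytes.
    encoded = quote(name, safe="")
    if encoded in (".", ".."):
        encoded = encoded.replace(".", "%2E")  # never emit '.' or '..' as a name
    if len(encoded) <= limit:
        return encoded
    # Truncate, then append a SHA-1 of the full encoded name so that distinct
    # long names still map to distinct files (collisions are very unlikely).
    digest = hashlib.sha1(encoded.encode("ascii")).hexdigest()  # 40 hex chars
    keep = limit - len(digest) - 1  # leave room for the "~" separator
    return encoded[:keep] + "~" + digest


def safe_relative_path(entry_path: str) -> str:
    """Map an entry path such as 'A/foo/bar.html' to a safe relative path."""
    return "/".join(safe_component(part) for part in entry_path.split("/"))


if __name__ == "__main__":
    long_path = "H/play.google.com/log?format=json&__wb_method=POST&" + "x" * 2000
    for component in safe_relative_path(long_path).split("/"):
        print(len(component), component[:40])
```

The hash suffix keeps the mapping deterministic, so re-running the extraction produces the same file names. The trade-off is that shortened names can no longer be turned back into the original entry path without a separate index.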

mgautierfr commented 11 months ago

This is essentially a duplicate of https://github.com/openzim/zim-tools/issues/213 (error description) and https://github.com/openzim/zim-tools/issues/318 (proposal to add an option to ignore such files).

Closing this issue (keeping the other two open, as they address two different ways of fixing this).
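Issue #318 proposes an option to ignore files like this one. Below is a minimal Python sketch of a skip-and-report extraction loop. It assumes the entries have already been read from the archive as (path, bytes) pairs; reading the ZIM file itself, for example with python-libzim, is left out. dump_entries is a hypothetical helper, not part of zim-tools.

```python
from pathlib import Path


def dump_entries(entries, out_dir):
    """Write (path, data) pairs under out_dir, skipping entries that fail.

    Returns a list of (path, reason) tuples for the skipped entries instead of
    aborting the whole extraction on the first filesystem error.
    """
    out = Path(out_dir)
    skipped = []
    for path, data in entries:
        rel = Path(path)
        # Refuse paths that could escape out_dir.
        if rel.is_absolute() or ".." in rel.parts:
            skipped.append((path, "unsafe path"))
            continue
        target = out / rel
        try:
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(data)
        except OSError as exc:  # e.g. ENAMETOOLONG for an over-long name
            skipped.append((path, exc.strerror or str(exc)))
    return skipped


if __name__ == "__main__":
    import tempfile

    entries = [
        ("A/index.html", b"<html></html>"),
        ("H/" + "x" * 5000, b"tracking payload"),  # name far over 255 bytes
    ]
    with tempfile.TemporaryDirectory() as tmp:
        for path, reason in dump_entries(entries, tmp):
            print(f"skipped {path[:20]}...: {reason}")
```

The loop collects failures instead of raising. That way, one bad tracking entry does not stop the remaining items from being written. The returned list serves as the report of what was left out. Any other OSError, such as a file and a directory competing for the same name, is also recorded and skipped.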