Closed IMayBeABitShy closed 6 months ago
Thanks a lot for everything: great attention to detail, a very precise bug report, and nice suggestions.
I will look into it tomorrow, but I will probably just disable all sotoki recipes for now; there is no need to waste resources. Your bug report and investigation seem quite clear to me, and unfortunately I do not expect any stackoverflow domains to still work.
I've started two small domains as well: sustainability.stackexchange.com (your suggestion) and tezos.stackexchange.com (my pick, even smaller).
I ran them with the docker image we use in production:
docker run -v $(pwd):/output --name sotoki_sustainability.stackexchange.com_en --detach --rm ghcr.io/openzim/sotoki:2.0.2 sotoki --debug --domain="sustainability.stackexchange.com" --mirror="https://org-kiwix-stackexchange.s3.us-west-1.wasabisys.com" --output="/output" --threads="8" --redis-url="unix:///var/run/redis.sock" --stats-filename="/output/task_progress.json" --keep-redis --publisher="openZIM"
docker run -v $(pwd):/output --name sotoki_tezos.stackexchange.com_en --detach --rm ghcr.io/openzim/sotoki:2.0.2 sotoki --debug --domain="tezos.stackexchange.com" --mirror="https://org-kiwix-stackexchange.s3.us-west-1.wasabisys.com" --output="/output" --threads="8" --redis-url="unix:///var/run/redis.sock" --stats-filename="/output/task_progress.json" --keep-redis --publisher="openZIM"
Both succeeded and produced a working ZIM (I didn't run many tests, but at least they open and you can browse them), so Zimfarm seems safe to continue. No hurry, that's good news. I now doubt that something changed in the stackexchange dumps, since these tests worked.
I wanted to run tezos.stackexchange.com with the dev docker image to confirm whether the issue is linked to recent changes on the main branch, but the image fails to start (see https://github.com/openzim/sotoki/issues/294 if needed).
Could you please try to use the code from the 2.0.2 tag to confirm what is happening?
I suspect two possibilities:
main branch
Ok, I found the problem, and it was a stupid mistake on my side caused by my inexperience with redis. To put it simply, I forgot to empty the redis server between runs. After manually issuing a flushall, the problem no longer occurs.
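For anyone hitting the same thing, here is a minimal sketch of clearing the server before each local run. It assumes the redis-py package and the socket URL used in the commands above; `flush_before_run` and `connect` are hypothetical helper names for illustration, not part of sotoki.

```python
from typing import Protocol


class RedisLike(Protocol):
    """Only the two client methods this sketch relies on."""

    def dbsize(self) -> int: ...
    def flushall(self) -> bool: ...


def flush_before_run(client: RedisLike) -> int:
    """Empty the server and return how many keys the current database held.

    Hypothetical helper: call it before starting a new build so that no
    data from a previous run is left behind.
    """
    stale_keys = client.dbsize()  # keys in the currently selected database
    client.flushall()             # removes the keys of every database
    return stale_keys


def connect(url: str = "unix:///var/run/redis.sock"):
    """Open a client with redis-py (assumed installed); the URL is an assumption."""
    import redis  # imported lazily so flush_before_run has no hard dependency

    return redis.Redis.from_url(url)
```

Calling `flush_before_run(connect())` before each build should have the same effect as running `redis-cli -s /var/run/redis.sock flushall` by hand.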
I apologize for wasting your time.
No worries, it is not as if you did not investigate at all before raising this issue, and it was quite easy (and useful) to confirm that everything is fine.
Since roughly two weeks ago I've been getting a KeyError related to tag ids when trying to build ZIM files. I've been waiting to see if the build also fails on the zimfarm, but so far no scheduled sotoki run has occurred since then.
Traceback:
The error occurs relatively late during the build:
Command used:
During debugging, I've:
tags_ids.json file in the database directory
The ZIM creation worked properly until quite recently. I believe this bug may be caused by a change in the stack exchange dumps, perhaps a deleted but still referenced tag?
We may want to confirm and fix this before the next wave of sotoki builds starts, or a lot of resources may be wasted.