Closed by benoit74 3 months ago
@rgaudin do you have any idea why I had to make this change / how it worked before?
Tested locally and it works way better (inside Docker):
docker run -it --rm -v $(pwd):/data -p 8888:8888 --entrypoint /usr/local/bin/kiwix-serve ghcr.io/rgaudin/kiwix-tools:nightly --port=8888 "/data/output/beer_meta.zim"
> @rgaudin do you have any idea why I had to make this change / how it worked before?
I have no idea. The debug logs for all the recipes are gone, so I started another task to get one.
[ThreadPoolExecutor-0_0::2024-03-27 11:02:32,911] INFO:Extracting 3dprinting.stackexchange.com.7z
[MainThread::2024-03-27 11:02:36,470] INFO:removed badges headers
[MainThread::2024-03-27 11:02:36,543] INFO:sorted Badges by UserId
[MainThread::2024-03-27 11:02:36,605] INFO:removed users headers
[MainThread::2024-03-27 11:02:36,797] INFO:merged both sets
[MainThread::2024-03-27 11:02:36,847] INFO:removed comments headers
[MainThread::2024-03-27 11:02:36,935] INFO:sorted Comments by UserId
[MainThread::2024-03-27 11:02:37,024] INFO:removed posts headers
[MainThread::2024-03-27 11:02:37,267] INFO:merged Posts and Comments
[MainThread::2024-03-27 11:02:37,429] INFO:split Posts-Comments by PostType
[MainThread::2024-03-27 11:02:37,495] INFO:Extracted Post IDs and titles into CSV
[MainThread::2024-03-27 11:02:37,695] INFO:sorted Posts-Comments (questions) by Id
[MainThread::2024-03-27 11:02:37,789] INFO:sorted Posts-Comments (answers) by ParentId
[MainThread::2024-03-27 11:02:37,792] INFO:removed postlinks headers
[MainThread::2024-03-27 11:02:37,805] INFO:sorted PostLinks by PostId
[MainThread::2024-03-27 11:02:37,828] INFO:sorted named post links by RelatedPostId
[MainThread::2024-03-27 11:02:38,024] INFO:Prepared dumps completed.
The line [MainThread::2024-03-27 11:02:37,495] INFO:Extracted Post IDs and titles into CSV
indicates the process did not crash.
When running this, a ton of stuff has already been imported. Is it possible that this module was imported by another one?
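To illustrate the hypothesis above: in Python, a submodule only becomes an attribute of its parent package once something, anywhere in the process, has imported it. The sketch below uses a throwaway package (`demo_pkg` and `helpers` are hypothetical names, not part of sotoki) to show attribute access failing before the submodule is imported and succeeding afterwards. Whether this is what actually happened with `xml.sax` here is an assumption, not something the logs confirm.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def build_demo_package(root: Path) -> None:
    """Create a throwaway package `demo_pkg` with a single submodule `helpers`."""
    pkg = root / "demo_pkg"
    pkg.mkdir()
    # The package __init__ deliberately does NOT import the submodule.
    (pkg / "__init__.py").write_text("")
    (pkg / "helpers.py").write_text("def shout(s):\n    return s.upper()\n")


def demo():
    """Return (visible_before, visible_after) for `demo_pkg.helpers`."""
    with tempfile.TemporaryDirectory() as tmp:
        build_demo_package(Path(tmp))
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            pkg = importlib.import_module("demo_pkg")
            # Only the package is imported so far: the submodule is not an attribute.
            before = hasattr(pkg, "helpers")
            # Stand-in for "another module" importing the submodule as a side effect.
            importlib.import_module("demo_pkg.helpers")
            # The import system has now bound the submodule on the parent package.
            after = hasattr(pkg, "helpers")
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("demo_pkg.helpers", None)
            sys.modules.pop("demo_pkg", None)
    return before, after


if __name__ == "__main__":
    print(demo())
```

If that is the cause, code relying on `import xml.sax` alone would work only as long as some other import happened to load the submodule first, and would break when the import order or a dependency changed.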
> Tested locally and it works way better (inside Docker):
Why are you sharing this kiwix-serve command? How is it related?
> Why are you sharing this kiwix-serve command? How is it related?
Because, you know, sometimes, copy-paste is not that easy ^^
Proper test command:
sotoki --domain "beer.meta.stackexchange.com" --threads 20 --output /output/ --zim-file beer_meta.zim --mirror "https://org-kiwix-stackexchange.s3.us-west-1.wasabisys.com" --redis-url "unix:///var/run/redis.sock" --debug
Fix #298
Changes:
import xml.sax.saxutils instead of xml.sax
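A minimal sketch of what the explicit import looks like in use. The thread does not say which `saxutils` function the scraper calls, so `escape` and `unescape` are only illustrative, and `escape_text` is a hypothetical helper name.

```python
# Import the submodule explicitly so it is available regardless of
# what other modules happened to import earlier in the process.
import xml.sax.saxutils


def escape_text(raw: str) -> str:
    """Escape &, < and > for safe inclusion in XML/HTML text (hypothetical helper)."""
    return xml.sax.saxutils.escape(raw)


if __name__ == "__main__":
    print(escape_text("a < b & c"))
    print(xml.sax.saxutils.unescape("&lt;p&gt;"))
```

With the submodule imported by name, the call no longer depends on import order elsewhere in the process.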