Closed (danielboven closed 4 years ago)
It isn't necessarily lag; it might be downloading the dependencies and the content. Try running with the verbose flag to get some extra insight into what's going on.
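As a rough sketch of that suggestion: the small helper below prefixes each output line with a timestamp, so a 15-minute gap in the log is easy to spot afterwards. The `stamp` function and the `mwoffliner.log` file name are made up for illustration, and the commented usage line assumes the verbose flag is spelled `--verbose`; check `mwoffliner --help` for your version.

```shell
# Prefix every line read from stdin with the current date and time,
# so pauses between log lines become visible.
stamp() {
  while IFS= read -r line; do
    printf '%s %s\n' "$(date '+%F %T')" "$line"
  done
}

# Assumed usage (same arguments as the original run, plus the verbose flag):
# mwoffliner --mwUrl=https://en.wiktionary.org/ --adminEmail=x@outlook.com \
#   --outputDirectory=/mnt/sda1/dump/zim/02-03-2020/en.wiktionary.org \
#   --verbose 2>&1 | stamp | tee mwoffliner.log
```

Comparing the timestamps around the quiet periods should show which step the process was on just before the output stopped.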
@LakmaNeha I think you're right, it seems to be doing something else in the background. I haven't started it with verbose (yet), but it has finished successfully now. Thanks for your explanation ;)
I've been running mwoffliner for around 5 days now in order to scrape the English Wiktionary. I noticed that, probably for the past day or so, the output has decreased massively. The process scrapes for a minute or so (and outputs to the terminal), but then stops and is stuck for around 15 minutes. After the message
Heatbeat - OK
it continues again (so it stops being stuck and starts producing output). The resource manager also confirms this: during normal scraping the CPU usage peaks at 30%, but when the output stops the process goes down to 0% CPU usage. This is an example of the log I'm getting:
To start I used the command
mwoffliner --mwUrl=https://en.wiktionary.org/ --adminEmail=x@outlook.com --outputDirectory=/mnt/sda1/dump/zim/02-03-2020/en.wiktionary.org
on an Ubuntu Server 18.04 LTS machine. I have 16GB of memory, btw. This is probably the third time this has happened to me when trying to scrape Wiktionary, so restarting doesn't help. What could be the cause of this?