kelson42 opened 5 years ago
I strongly support implementing this. It is going to help zimfarm a lot if we can get the current progress. Do we have any idea how this is going to be implemented?
Probably using something like this... https://github.com/vadimdemedes/ink
Can this info be retrieved programmatically in a container environment?
Not using ink specifically, but we could implement some kind of API.
@automactic Is the existing percentage log good enough for this?
No I don't think so. I would prefer a way to retrieve progress proactively, rather than passively wait for a progress message to show up.
@automactic OK, that sounds a bit more complicated than I thought. What kind of technology do you have in mind to achieve this?
@automactic @ISNIT0 What do you think about using something like https://www.zerorpc.io/ over a socket in /var/run/?
@automactic I think a progress API is a bit out of scope of MWOffliner. Would it be possible to grep/match logs from MWO? I'm happy to re-format logs to be more machine processable
@rgaudin Have you any past experience with the solution proposed by @ISNIT0 ?
I think parsing logs is not the best solution and could be flaky. For example, the container might generate a lot of logs, so the batch of logs being fetched might contain no progress info at all.
How about setting the progress in Redis? The zimfarm worker would periodically GET the progress key stored in Redis. If that makes sense, we could provide more detailed, stage-based progress, etc.
The zimfarm worker could also listen for changes to some keys in Redis, so the user could be notified of events in mwoffliner.
I think there are two different things to consider:
Both should be somewhat independent and tackled separately.
The first one is a mechanism to calculate the effort and report on progress towards this effort. This is internal to mwoffliner and should be available in the logs somehow (could be a periodic print on the log).
The second one, which of course depends on the first one (we need the calculation), should be implemented in a way that can be duplicated in other scrapers. That excludes redis. I think we could introduce a super simple API to which scrapers would report progress.
That interface/API could be HTTP or socket-based (zerorpc?). What matters most here is how simple it is to implement calls to that API in all of the scrapers (i.e. using their various technologies).
--report-progress-to /var/run/zimfarm_mwoffliner_xxx.sock
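To make the socket idea concrete, here is a minimal sketch of what a scraper-side call could look like, assuming a plain Unix domain socket rather than zerorpc. The newline-delimited JSON message and the helper names `connect_unix` and `send_progress` are hypothetical and only illustrate how little code the reporting side would need.

```python
import json
import socket


def connect_unix(path: str) -> socket.socket:
    """Connect to a Unix domain socket, e.g. the path given to --report-progress-to."""
    conn = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    conn.connect(path)
    return conn


def send_progress(conn: socket.socket, done: int, total: int) -> None:
    """Send one newline-terminated JSON progress message (hypothetical wire format)."""
    message = json.dumps({"done": done, "total": total}) + "\n"
    conn.sendall(message.encode("utf-8"))
```

The receiving side would only have to read lines from the socket and parse each one as JSON, which is doable from any of the scrapers' languages.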
That excludes redis
that is a good point, forget what I said
This issue has been automatically marked as stale because it has not had recent activity. It will now be reviewed manually. Thank you for your contributions.
We have agreed that mwoffliner should be given a JSON file path and update that file with its progress. The file is then read by the Zimfarm worker (or any other process), which reports on it.
See https://github.com/openzim/zimfarm/issues/331, to be implemented in the next few days.
The expected format of this JSON file is:
{"done": 1, "total": 32}
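As an illustration of how such a file could be produced and consumed, here is a minimal sketch. The function names `write_progress` and `read_progress` are hypothetical, and writing through a temporary file followed by a rename is an assumption on my side (not something agreed above) so that a reader never sees a half-written file.

```python
import json
import os
import tempfile


def write_progress(path: str, done: int, total: int) -> None:
    """Atomically write {"done": ..., "total": ...} to `path`."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory, then rename it over
    # the target: os.replace is atomic when both are on the same filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump({"done": done, "total": total}, fh)
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temporary file if anything went wrong before the rename.
        os.unlink(tmp_path)
        raise


def read_progress(path: str) -> dict:
    """Read the progress file back, as a worker (or any other process) would."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)
```

The scraper would call `write_progress` every so often, and the worker would poll `read_progress` at whatever interval it likes.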
Its name should be passed to an option enabling that feature and, for the zimfarm, we'll place it in the output directory (what we mount as a volume). In zimit, we allow passing either an absolute path or a relative one; in the latter case we create it in the output dir.
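The absolute-or-relative rule could be sketched roughly as follows. This is not zimit's actual code; `resolve_stats_path` and its parameter names are hypothetical.

```python
from pathlib import Path


def resolve_stats_path(stats_filename: str, output_dir: str) -> Path:
    """Return the path of the progress file.

    An absolute path is used as-is; a relative one is created inside
    the output directory.
    """
    candidate = Path(stats_filename)
    if candidate.is_absolute():
        return candidate
    return Path(output_dir) / candidate
```

With the output directory mounted as a volume, a relative name is enough for the worker to find the file on the host side.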
@rgaudin Concretely, what would you propose in terms of a command line option?
In zimit, we have --statsFilename, but it doesn't matter much as long as the other mentioned requirements are met.
This feature would be a true benefit, especially considering there is no resume functionality!
We need a way to know roughly what percentage of the whole scraping has already been done.
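For what it's worth, turning a done/total pair into a rough percentage is a one-liner. A small sketch, where `progress_percent` is a hypothetical helper and treating an unknown total (zero or less) as 0% is an assumption:

```python
def progress_percent(done: int, total: int) -> float:
    """Return progress as a percentage, clamped to the 0-100 range."""
    if total <= 0:
        # Total not known yet: report no progress rather than dividing by zero.
        return 0.0
    return min(100.0, max(0.0, 100.0 * done / total))
```

For example, `progress_percent(1, 32)` gives 3.125.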