jojo2357 / kiwix-zim-updater

A script to check `download.kiwix.org` for updates to your local ZIM library.
GNU General Public License v2.0

Write log output to script directory #2

Closed: epheterson closed this issue 2 years ago

epheterson commented 2 years ago

It'd be nice if the script wrote a log of its current / last-run progress to the script directory for cases where the script is run via a triggered task and the live output cannot easily be viewed.

For the slow parts, it could fill up a progress bar with a visible end. Something like this would work, continuously appending characters to the end during a download:

Completed –
[0% - - - - - - - - - - - - - - - - - 100%]
[#########################################]

In Progress –
[0% - - - - - - - - - - - - - - - - - 100%]
[######################
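A minimal sketch of the append-only bar proposed above, in POSIX `sh` to match the script's shell context. `advance_bar` is a hypothetical helper, not part of the script: it prints only the `#` characters needed to move from one percentage to the next, so a captured log grows by at most one bar width per download instead of one line per update.

```sh
# Hypothetical helper (not part of kiwix-zim-updater): append-only progress bar.
# $1 = previous percent, $2 = new percent, $3 = bar width in characters.
# Prints just the '#' marks needed to advance the bar, with no newline,
# so nothing already written to a log is ever rewritten.
advance_bar() {
    from=$(( $1 * $3 / 100 ))
    to=$(( $2 * $3 / 100 ))
    while [ "$from" -lt "$to" ]; do
        printf '#'
        from=$(( from + 1 ))
    done
}

# Demo with made-up percentages: print the header once, then fill the bar.
width=40
# 34 = 40 minus the widths of "0%" and "100%"
printf '[0%%%34s100%%]\n[' ''
last=0
for pct in 10 35 35 80 100; do
    advance_bar "$last" "$pct" "$width"
    last=$pct
done
printf ']\n'
```

Because marks are only ever appended, the same output reads correctly on a terminal and in a log file; the trade-off is that speed or ETA can't be shown without writing new lines.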
epheterson commented 2 years ago

Actually, I found where Synology stores them, and WOW! A single run produced a 360 MB log file that hangs the Synology UI when I try to view it. Any way to slim that down?

epheterson commented 2 years ago

I updated the ZIMs below and ended up with that massive file, and that's with gigabit internet, so they downloaded relatively fast.

6. Purging Replaced ZIM(s)...
      ✓ Purge: /volume1/docker/kiwix/gutenberg_en_all_2021-12.zim
      ✓ Purge: /volume1/docker/kiwix/ted_en_playlist-the-most-popular-talks-of-2020_2021-01.zim
      ✓ Purge: /volume1/docker/kiwix/ted_en_technology_2021-12.zim
      ✓ Purge: /volume1/docker/kiwix/wikipedia_en_all_maxi_2021-12.zim
      ✓ Purge: /volume1/docker/kiwix/wikivoyage_en_all_maxi_2021-12.zim
      ✓ Purge: /volume1/docker/kiwix/wiktionary_en_all_maxi_2021-10.zim
DocDrydenn commented 2 years ago

I'm now confused... wget shouldn't create any logs when `-q` is passed. What version of wget are you running? (`wget --version`)

My script only uses: `wget -P $ZIMPath ${CleanDownloadArray[$z]} -q --show-progress`

- `-P` to set where to save the file
- `-q` for quiet output and to suppress `wget-log` creation
- `--show-progress` for outputting the progress bar on screen

The only other way to get any type of logging is to specifically add the `-o` flag with a file path and name.
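The two wget modes described above could be wrapped like this. `fetch_zim` and its optional log-file parameter are hypothetical; the flags are the ones quoted in the comment plus wget's `-o logfile` option. `-q` is left out of the logging branch because it turns off wget's output, which would leave the log empty.

```sh
# Hypothetical wrapper around the two wget modes described above.
# $1 = destination directory, $2 = URL, $3 = optional log file path.
fetch_zim() {
    if [ -n "${3:-}" ]; then
        # Write wget's messages to the given log file instead of the screen.
        wget -P "$1" "$2" -o "$3"
    else
        # The script's current mode: quiet, progress bar only, no wget-log file.
        wget -P "$1" "$2" -q --show-progress
    fi
}

# Usage (URL taken from later in this thread):
# fetch_zim /tmp/ "https://download.kiwix.org/zim/ted/ted_en_playlist-get-paid-what-you-re-worth_2020-09.zim" /tmp/wget.log
```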

DocDrydenn commented 2 years ago

Also, I've spent the last 2 hours trying to figure out how to create some type of real-time log for wget... it's just not possible without having it vomit all over the normal screen output. (Mainly because wget won't write anything to a log file until the download has completed... I've tried every trick I could find. wget just won't play.)

Do those Synologys have curl? I might be able to do it by switching over to curl instead of wget...

DocDrydenn commented 2 years ago

Wait... is that your Synology creating that monster log?

My script doesn't touch anything (e.g. log files, temp files, etc.) on the system it runs on (except for the download and purge of ZIMs, of course). Heck, I even go to the trouble of clearing out my variable arrays when I'm done with them LOL (this really only saves a fraction of the system RAM, but... good housekeeping and such).

epheterson commented 2 years ago

Hey, yeah it's Synology that saves the script output for scheduled tasks so that you can review the results afterwards. I imagine others who use your script may similarly save the output. Also, yes Synology does have curl and that would work great!

The part taking a ton of space is the progress output (I imagine it comes from the `--show-progress` arg), which prints out like this:

5. Downloading Updates...

      ✓ Download: https://download.kiwix.org/zim/gutenberg/gutenberg_en_all_2022-08.zim

     0K .......... .......... .......... .......... ..........  0%  157K 5d4h
    50K .......... .......... .......... .......... ..........  0%  324K 3d20h
   100K .......... .......... .......... .......... ..........  0%  393K 3d6h
   150K .......... .......... .......... .......... ..........  0%  752K 2d17h
   200K .......... .......... .......... .......... ..........  0%  784K 2d9h
   250K .......... .......... .......... .......... ..........  0% 1.02M 2d2h
   300K .......... .......... .......... .......... ..........  0% 1.19M 45h58m
   350K .......... .......... .......... .......... ..........  0% 1.17M 42h17m
...
70815300K .......... .......... .......... .......... .......... 99% 23.1M 0s
70815350K .......... .......... .......... .......... .......... 99% 21.0M 0s
70815400K .......... .......... .......... .......... .......... 99% 17.8M 0s
70815450K .......... .......... .......... .......... .......... 99% 20.1M 0s
70815500K .......... .......... .......... .......... .......... 99% 22.2M 0s
70815550K .......... .......... .......... .......... .......... 99% 23.1M 0s
70815600K .......... .......... .......... .......... .......... 99% 16.8M 0s
70815650K .......... .......... .......... .......... .......... 99% 20.9M 0s
70815700K .......... .......... .......... .......... .......... 99% 18.2M 0s
70815750K .......... .......... .......... .......... .......... 99% 16.8M 0s
70815800K .......... .......... .......... .......... .......... 99% 7.08M 0s
70815850K .......... .......... .........                       100% 20.8M=77m35s
      ✓ Download: https://download.kiwix.org/zim/ted/ted_en_playlist-the-most-popular-talks-of-2020_2021-12.zim

     0K .......... .......... .......... .......... ..........  0%  168K 1h45m
    50K .......... .......... .......... .......... ..........  0%  375K 76m31s
   100K .......... .......... .......... .......... ..........  0%  505K 62m44s
   150K .......... .......... .......... .......... ..........  0%  825K 52m25s
   200K .......... .......... .......... .......... ..........  0%  949K 45m40s
   250K .......... .......... .......... .......... ..........  0% 1.03M 40m51s
   300K .......... .......... .......... .......... ..........  0% 1.22M 37m2s
...
epheterson commented 2 years ago

Actually, I just tried with a different script: completed logs are saved on Synology, but they are not visible while the script is in progress. So it'd still be nice if your script offered some way to monitor progress, and it'd also be nice if saving the script output didn't result in hundreds of MB :)

DocDrydenn commented 2 years ago

Interesting... That's unsuppressed wget output, so this is an interaction between your Synology and wget. I won't have any control over that. That data stream is normally just dumped into the ether... I have no idea why your Synology decides to log it.

I've tested with curl and it does exactly what you want: it outputs to the screen and allows that output to be captured into a log file in real time. A simple `tail -f log.file` would let you see the download status in real time.

I'll switch over to curl, but I can't promise that your Synology won't do the same thing and capture the stream. This is an interaction between your Synology and wget (possibly curl too), not the script. It is outside of my and the script's control.
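A minimal sketch of the curl approach, assuming the log path is passed in; `fetch_with_log` is hypothetical and not the script's actual v1.8 code. curl writes its progress meter to stderr, so appending stderr to a file keeps the meter out of the captured task output while `tail -f` on that file shows it live. `-L` follows redirects (download.kiwix.org normally hands downloads off to a mirror), and `-f` makes curl fail on HTTP errors instead of saving an error page under the ZIM's name.

```sh
# Hypothetical helper: download with curl, appending its progress meter
# (written to stderr) to a log file that can be followed with `tail -f`.
# $1 = destination directory, $2 = URL, $3 = log file path.
fetch_with_log() {
    # -L: follow redirects (e.g. to a download mirror)
    # -f: fail on HTTP errors instead of saving the error page
    # -o: save under the URL's file name in the destination directory
    curl -L -f -o "$1/$(basename "$2")" "$2" 2>>"$3"
}

# Usage: start the download, then watch it from another shell with
#   tail -f /volume1/docker/kiwix/download.log
```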

epheterson commented 2 years ago

Alright, I did some research: the reason I'm seeing this dot output is that the output isn't going to an interactive terminal, and wget falls back to dots when it can't show the live progress bar:

When the output is not a TTY, the progress bar always falls back to “dot”, even if ‘--progress=bar’ was passed to Wget during invocation.

The dot style has a `giga` option that seems to make my log spew much more manageable, e.g.:

`wget -P /tmp/ https://download.kiwix.org/zim/ted/ted_en_playlist-get-paid-what-you-re-worth_2020-09.zim --progress=dot:giga`

curl also seems to be unreasonable, and I'm not sure it supports a smaller output like `--progress=dot:giga`, so I guess for my setup I need either no progress shown or `--progress=dot:giga`. Thoughts?
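For scale: wget's default dot style prints 50K per line, while `dot:giga` prints 32M per line (1M per dot, 32 dots per line). For the 70815850K Gutenberg download above, that is roughly 1.4 million lines versus about 2,160. A sketch that keeps the normal bar on a terminal and switches to `dot:giga` otherwise; `progress_style` is a hypothetical helper, and it tests file descriptor 2 because wget draws its progress on stderr.

```sh
# Hypothetical helper: pick wget's --progress style for the current output.
# wget draws progress on stderr, so test whether fd 2 is a terminal.
progress_style() {
    if [ -t 2 ]; then
        echo bar        # interactive: normal progress bar
    else
        echo dot:giga   # captured/logged: 1M per dot, 32M per line
    fi
}

# Usage, with the example URL from the comment above:
# wget -P /tmp/ "https://download.kiwix.org/zim/ted/ted_en_playlist-get-paid-what-you-re-worth_2020-09.zim" \
#     -q --show-progress --progress="$(progress_style)"
```

Inside `$(progress_style)` only stdout is captured; stderr is still the caller's, so the check reflects where wget's progress will actually go.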

DocDrydenn commented 2 years ago

Well... even if I add `--progress=dot:giga`, that won't solve the problem of a real-time log. wget just won't do it (without messing up the screen output).

I am working on the curl option now (which will give a real-time log output), so let's give curl a try in the morning (It's almost midnight here LOL).

DocDrydenn commented 2 years ago

Okay, so the current version (v1.8) has switched over to curl, and logging has been added. I also updated the README with the logging info.

Please give that a go on your Synology and see if it fixes your monitoring request and your log file issues.

epheterson commented 2 years ago

The output does seem to be quite a bit more concise (only tried on a relatively small file, though). Unfortunately it failed to save using curl and also deleted the original! Filed: https://github.com/DocDrydenn/kiwix-zim/issues/3

epheterson commented 2 years ago

That said, I do see the download.log file, thanks for adding that!

epheterson commented 2 years ago

The text output in the Task Scheduler log is still pretty verbose, but I'm not sure what we can do about it.

5. Downloading Updates...

      ✓ Download: https://download.kiwix.org/zim/wikivoyage/wikivoyage_en_all_maxi_2022-08.zim

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed

  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100   270  100   270    0     0    514      0 --:--:-- --:--:-- --:--:--   515

  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  2  682M    2 19.9M    0     0  12.3M      0  0:00:55  0:00:01  0:00:54 20.5M
  9  682M    9 66.8M    0     0  25.5M      0  0:00:26  0:00:02  0:00:24 33.9M
 16  682M   16  113M    0     0  31.3M      0  0:00:21  0:00:03  0:00:18 38.1M
 23  682M   23  159M    0     0  34.6M      0  0:00:19  0:00:04  0:00:15 40.2M
 30  682M   30  207M    0     0  36.8M      0  0:00:18  0:00:05  0:00:13 41.6M
 37  682M   37  254M    0     0  38.4M      0  0:00:17  0:00:06  0:00:11 46.8M
 44  682M   44  300M    0     0  39.4M      0  0:00:17  0:00:07  0:00:10 46.7M
 50  682M   50  347M    0     0  40.3M      0  0:00:16  0:00:08  0:00:08 46.8M
 57  682M   57  394M    0     0  41.0M      0  0:00:16  0:00:09  0:00:07 46.9M
 64  682M   64  441M    0     0  41.5M      0  0:00:16  0:00:10  0:00:06 46.8M
 71  682M   71  488M    0     0  42.0M      0  0:00:16  0:00:11  0:00:05 46.7M
 78  682M   78  535M    0     0  42.4M      0  0:00:16  0:00:12  0:00:04 46.9M
 85  682M   85  582M    0     0  42.7M      0  0:00:15  0:00:13  0:00:02 46.9M
 92  682M   92  629M    0     0  43.0M      0  0:00:15  0:00:14  0:00:01 46.9M
 99  682M   99  676M    0     0  43.3M      0  0:00:15  0:00:15 --:--:-- 47.0M
100  682M  100  682M    0     0  43.3M      0  0:00:15  0:00:15 --:--:-- 46.9M
DocDrydenn commented 2 years ago

v1.9: Replaced `rev` commands. New ZIM(s) are now verified before old ZIM(s) are purged.
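Verifying before purging avoids the failure reported in issue #3, where a failed curl download still deleted the original. A minimal sketch of that ordering, assuming "verified" only means the new file exists and is non-empty; `replace_zim` is hypothetical, and the script's real checks may be stricter (e.g. comparing a checksum).

```sh
# Hypothetical helper: only purge the old ZIM once the new one looks usable.
# $1 = newly downloaded ZIM, $2 = old ZIM it replaces.
replace_zim() {
    if [ -s "$1" ]; then
        # New file exists and has content: safe to remove the old one.
        rm -f -- "$2"
    else
        # Missing or empty download: keep the old ZIM and report failure.
        echo "New ZIM $1 is missing or empty; keeping $2" >&2
        return 1
    fi
}
```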

epheterson commented 2 years ago

Hey, I was thinking about this fix: it seems to keep appending to the file permanently. Might it be better to re-create the file each time the script runs, so that it doesn't grow indefinitely?

DocDrydenn commented 2 years ago

That's not typical practice for log files... Let me mull it over for a bit.
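One middle ground between appending forever and re-creating the file on every run is single-generation rotation: move the previous run's log aside, then start fresh. A minimal sketch; `rotate_log` is hypothetical, and on systems that provide `logrotate` that tool is the more conventional choice.

```sh
# Hypothetical helper: keep one previous log and start each run fresh.
# $1 = log file path (e.g. the script's download.log)
rotate_log() {
    if [ -e "$1" ]; then
        mv -f -- "$1" "$1.1"   # previous run's log is kept as <log>.1
    fi
    : > "$1"                   # create an empty log for this run
}
```

Disk usage is then bounded by two runs' worth of output, and the last completed run is still available for review.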