Open acka47 opened 9 months ago
To Dos:
The monthly access logs (e.g. access_log-20230901) were only partially (01/Aug/2023 to 18/Aug/2023) included in the generated reports due to our preprocessing.
Example: for some reason grep ' www.lobid.org ' /tmp/access_log-20230901 didn't parse the whole log file and output "grep: /tmp/access_log-20230901: Übereinstimmungen in Binärdatei" (English: "binary file matches"). Solution: use grep --text or fgrep --text, as we are searching for a fixed string.
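A minimal sketch of the effect, using a made-up sample log (the file content and host names below are hypothetical). NUL bytes in the middle of the file make grep treat it as binary; with --text (and -F for a fixed string) every matching line is found:

```shell
# Build a tiny sample log with NUL bytes in the middle (hypothetical content).
log=$(mktemp)
{
  printf 'h1 www.lobid.org GET /a\n'
  printf 'crash\000\000\000garbage\n'
  printf 'h2 www.lobid.org GET /b\n'
  printf 'h3 blog.lobid.org GET /c\n'
} > "$log"

# -F: search for a fixed string, --text: treat binary data as text,
# -c: print the number of matching lines instead of the lines themselves.
matches=$(grep -F --text -c ' www.lobid.org ' "$log")
echo "lines matching with --text: $matches"

rm -f "$log"
```

Without --text, GNU grep stops printing matching lines once it hits the binary data and only reports that the binary file matches, which would explain why the reports ended in the middle of the month.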
The binary data in the logs results from the server crash. --text is good!
We could run bzfgrep --text on the compressed logs directly. It takes 7m3,410s compared to 2m51,289s for access_log-20230901(.bz2). What do you think?
bzfgrep --text :+1:
I'm stuck at filtering out static files like *.png or *.css, e.g.
grep --text -E -v "(robots.txt|.ico|.woff2|.ttf|.webp|.gif|.svg|.jpg|.png|.js|.css)" access_log_lobid-blog-20230901
works fine, but these files are still listed in the report generated by goaccess (see screenshot), and I don't know why. @dr0i Can you help?
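A side note on the pattern itself, independent of the goaccess question: the dots in it are unescaped, so in an extended regex they match any character, and ".js" also matches e.g. "/js" in a path starting with "/json". The sketch below compares the pattern from above with a stricter variant. The sample lines are hypothetical and the stricter pattern assumes requests are logged as "GET <path> HTTP/x.y"; adjust it to the real log format.

```shell
# Hypothetical sample requests in a combined-log-like format.
log=$(mktemp)
cat > "$log" <<'EOF'
1.2.3.4 - - [01/Aug/2023:00:00:01 +0200] "GET /team/logo.png HTTP/1.1" 200 123
1.2.3.4 - - [01/Aug/2023:00:00:02 +0200] "GET /post/json-ld-intro HTTP/1.1" 200 456
1.2.3.4 - - [01/Aug/2023:00:00:03 +0200] "GET /style.css?v=2 HTTP/1.1" 200 789
1.2.3.4 - - [01/Aug/2023:00:00:04 +0200] "GET /robots.txt HTTP/1.1" 200 10
EOF

# Pattern as used above: unescaped dots match any character.
loose='(robots.txt|.ico|.woff2|.ttf|.webp|.gif|.svg|.jpg|.png|.js|.css)'
# Stricter variant: literal dot, extension at the end of the path,
# optionally followed by a query string, then the protocol.
strict='(/robots\.txt|\.(ico|woff2|ttf|webp|gif|svg|jpg|png|js|css))(\?[^ ]*)? HTTP/'

# grep -c exits with status 1 when the count is 0, hence "|| true".
kept_loose=$(grep --text -E -v -c "$loose" "$log" || true)
kept_strict=$(grep --text -E -v -c "$strict" "$log" || true)
echo "kept with loose pattern:  $kept_loose"
echo "kept with strict pattern: $kept_strict"

rm -f "$log"
```

With the loose pattern the /post/json-ld-intro request is dropped together with the static files; the stricter pattern keeps it.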
I couldn't find a flaw in your code. Checking http://gaia.hbz-nrw.de/stats/lobid/access_log_lobid-blog-2023-09-01.html, I cannot see any png, for example. Did you check the proper output?
Due to some obscure replacement of quotes in bash, grep commands like the one above don't work as expected. As a workaround, I'm calling grep directly in bash (neither via a variable nor via an array). Thx, @dr0i.
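For reference, a sketch of one common way this goes wrong in bash (an assumption; the exact cause in our script was not pinned down). Quotes stored inside a string variable are not parsed again when the variable is expanded, so they end up as literal characters in the pattern. The sample file and the function name filter_static are made up for illustration:

```shell
# Two hypothetical requests, one of them for a static file.
log=$(mktemp)
printf '%s\n' 'GET /a.png' 'GET /index.html' > "$log"

# Pitfall: after word splitting, grep receives the pattern "\.png"
# INCLUDING the double quotes, so no line matches and -v keeps everything.
cmd='grep -c -v "\.png"'
broken=$($cmd "$log" || true)

# Workaround 1: call grep directly, so the shell removes the quotes itself.
direct=$(grep -c -v "\.png" "$log")

# Workaround 2: wrap the direct call in a function to keep it reusable.
filter_static() {
  grep -c -v "\.png" "$1"
}
via_function=$(filter_static "$log")

echo "via string variable: $broken lines kept"
echo "direct call:         $direct lines kept"
echo "via function:        $via_function lines kept"

rm -f "$log"
```

A function keeps the call "direct" in the sense above while still avoiding copy and paste of the long grep command.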
All monthly and yearly reports for 2023 are being generated anew. It will take approx. 10 hours.
Log files from 20230101 should be excluded from the yearly overview for 2023. Include files from 20240101 instead, as they contain entries from December 2023.
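The selection rule can be sketched as follows, assuming each monthly file is rotated on the 1st and therefore holds the previous month's entries. The function name yearly_suffixes is hypothetical and not part of the existing scripts:

```shell
# Print the date suffixes of the monthly log files that belong to the
# yearly overview of the given year: YYYY0201 ... YYYY1201 plus (YYYY+1)0101.
yearly_suffixes() {
  year=$1
  for month in 02 03 04 05 06 07 08 09 10 11 12; do
    printf '%s%s01\n' "$year" "$month"
  done
  # The file rotated on 1 January of the following year holds December.
  printf '%s0101\n' "$((year + 1))"
}

suffixes=$(yearly_suffixes 2023)
printf '%s\n' "$suffixes"
```

Each suffix can then be appended to the log file prefix (e.g. access_log-) to build the list of files for the yearly report.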
Yearly overviews are being generated anew.
@acka47 Please review again.
This looks good to me now. Thanks! Can we make the stats available openly on the web so that NWBib editors can view them? I guess there shouldn't be any problems re. privacy.
Furthermore, we will also need this for RPB (https://rpb.lobid.org/), RPPD (https://rppd.lobid.org/) and BiblioVino (https://wein.lobid.org/). LBZ partners just asked, see RPB-42. Should I open a separate issue for this or do we add this in the context of this issue?
I just talked about this issue with @Phu2 . And here are the next steps:
Things to consider:
Current status is at http://gaia.hbz-nrw.de/stats/