hbz / lobid

Linking Open Bibliographic Data
https://lobid.org/
Eclipse Public License 2.0

Set up web analytics with GoAccess #512

Open acka47 opened 9 months ago

acka47 commented 9 months ago

Current status is at http://gaia.hbz-nrw.de/stats/

acka47 commented 9 months ago

To Dos:

Phu2 commented 9 months ago

The monthly access logs (e.g. access_log-20230901) were only partially included in the generated reports (01/Aug/2023 to 18/Aug/2023) because of our preprocessing. Example: for some reason, grep ' www.lobid.org ' /tmp/access_log-20230901 didn't parse the whole log file and printed "grep: /tmp/access_log-20230901: Übereinstimmungen in Binärdatei" (in English: "binary file matches"). Solution: use grep --text or fgrep --text, as we are searching for a fixed string.
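A minimal sketch of the fix, assuming GNU grep. The sample file, its contents, and the function name host_lines are made up for illustration; only the search string and the --text option come from the comment above.

```shell
#!/usr/bin/env bash
# Hypothetical sample log: two matching lines around a line containing a NUL byte,
# similar to what a crash can leave behind in an access log.
log="$(mktemp)"
printf '192.0.2.1 www.lobid.org "GET /a"\n' > "$log"
printf 'garbage\000garbage\n' >> "$log"
printf '192.0.2.1 www.lobid.org "GET /b"\n' >> "$log"

# host_lines FILE: print every line containing the fixed string ' www.lobid.org '.
# --text makes grep treat the file as text even if it contains binary data;
# -F searches for a fixed string (the same thing fgrep does).
host_lines() {
  grep --text -F ' www.lobid.org ' "$1"
}

host_lines "$log"
```

Without --text, GNU grep stops printing matching lines once it detects binary data and only reports that the binary file matches, which fits the truncated reports described above.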

dr0i commented 9 months ago

The binary data in the logs results from the server crash. --text is good!

Phu2 commented 9 months ago

We could do bzfgrep --text on the compressed logs directly. It takes 7m3,410s compared to 2m51,289s for access_log-20230901(.bz2). What do you think?

dr0i commented 9 months ago

bzfgrep --text :+1:
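For comparison, the same search can be written as an explicit decompress-and-filter pipeline, which is roughly what the bz*grep wrapper scripts do. This is a sketch that assumes bzip2 and GNU grep are installed; the function name search_bz2_log and the path in the usage comment are hypothetical.

```shell
#!/usr/bin/env bash
# search_bz2_log PATTERN FILE
# Decompress FILE to stdout and print the lines containing the fixed string PATTERN.
# --text keeps grep from giving up on binary data; -F disables regex interpretation.
search_bz2_log() {
  local pattern="$1" file="$2"
  bzip2 -dc "$file" | grep --text -F -e "$pattern"
}

# Usage (hypothetical path):
# search_bz2_log ' www.lobid.org ' /tmp/access_log-20230901.bz2
```

Working on the compressed file avoids keeping an uncompressed copy around, at the cost of decompressing on every run.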

Phu2 commented 8 months ago

I'm stuck at filtering out static files like *.png or *.css, e.g.

grep --text -E -v "(robots.txt|.ico|.woff2|.ttf|.webp|.gif|.svg|.jpg|.png|.js|.css)" access_log_lobid-blog-20230901

works fine, but these files are still listed in the report generated by goaccess, see screenshot:

[Screenshot: GoAccess report still listing static files]

and I don't know why. @dr0i Can you help?
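A side note on the pattern above: in an extended regular expression, an unescaped "." matches any character, so ".js" also matches a request such as "?q=json". That is probably not the cause of the problem reported here, but it can drop more lines than intended. A stricter sketch follows. It assumes the requested path is followed by a space, a "?", a double quote, or the end of the line, which depends on the log format; the names static_re and drop_static are hypothetical.

```shell
#!/usr/bin/env bash
# Static assets to exclude (list taken from the grep command above).
# Dots are escaped, and a match must end at '?', whitespace, '"', or end of line.
static_re='(/robots\.txt|\.(ico|woff2|ttf|webp|gif|svg|jpg|png|js|css))([?[:space:]"]|$)'

# drop_static FILE...: print only lines that do not request a static asset.
drop_static() {
  grep --text -E -v -e "$static_re" "$@"
}

# Usage (file name from the comment above):
# drop_static access_log_lobid-blog-20230901
```

With this pattern a request for /data.json or /search?q=json is kept, while /app.js?v=2 and /logo.png are removed.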

dr0i commented 8 months ago

I couldn't find a flaw in your code. Checking http://gaia.hbz-nrw.de/stats/lobid/access_log_lobid-blog-2023-09-01.html I cannot see any png, for example. Did you check the right output?

Phu2 commented 8 months ago

Due to some obscure replacement of quotes in bash, grep commands like the one above don't work as expected. As a workaround, I'm calling grep directly in bash (neither via a variable nor via an array). Thx, @dr0i.
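One common way to run into a quoting problem like the one described above is to store the whole command in a string: quotes inside the string are not parsed again when the variable is expanded, so grep receives them as literal characters. Whether this is exactly what happened in the original script is an assumption. The sketch below uses a made-up sample file and a simplified pattern, and shows a shell function as an alternative to calling grep inline.

```shell
#!/usr/bin/env bash
# Hypothetical sample with one page request and two static files.
sample="$(mktemp)"
printf '%s\n' 'GET /index.html' 'GET /logo.png' 'GET /style.css' > "$sample"

# Fragile: after expansion, grep gets the pattern "(png|css)$" including the
# double quotes, which matches no line, so -v lets every line through.
cmd='grep --text -E -v "(png|css)$"'
broken_count="$($cmd "$sample" | grep -c '')"

# Robust: a function keeps the quoting exactly as written.
filter_static() {
  grep --text -E -v '(png|css)$' "$@"
}
working_count="$(filter_static "$sample" | grep -c '')"

echo "via string variable: $broken_count lines, via function: $working_count lines"
```

The function can be reused for every log file, so the filter stays in one place without passing through a variable.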

Phu2 commented 8 months ago

All monthly and yearly reports for 2023 are being generated anew. It will take approx. 10 hours.

Phu2 commented 8 months ago

Log files from 20230101 should be excluded from the yearly overview for 2023. Files from 20240101 should be included instead, as they contain entries from December 2023.
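The selection rule can be sketched as follows, assuming every rotated log stamped YYYYMM01 holds the entries of the preceding month (as the 20230901 and 20240101 examples in this thread suggest). The function name yearly_log_stamps is hypothetical.

```shell
#!/usr/bin/env bash
# yearly_log_stamps YEAR: print the date stamps of the rotated logs covering YEAR.
# The log stamped YEAR0101 is skipped (it holds December of the previous year);
# the log stamped (YEAR+1)0101 is added (it holds December of YEAR).
yearly_log_stamps() {
  local year="$1" month
  for month in 02 03 04 05 06 07 08 09 10 11 12; do
    printf '%s%s01\n' "$year" "$month"
  done
  printf '%s0101\n' "$((year + 1))"
}

yearly_log_stamps 2023
```

Each printed stamp can then be appended to a log file prefix such as access_log- to build the input list for the yearly report.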

Phu2 commented 8 months ago

Yearly overviews are being generated anew.

Phu2 commented 8 months ago

@acka47 Please review again.

acka47 commented 6 months ago

This looks good to me now. Thanks! Can we make the stats openly available on the web so that NWBib editors can view them? I guess there shouldn't be any problems regarding privacy.

Furthermore, we will also need this for RPB (https://rpb.lobid.org/), RPPD (https://rppd.lobid.org/) and BiblioVino (https://wein.lobid.org/). LBZ partners just asked, see RPB-42. Should I open a separate issue for this or do we add this in the context of this issue?

acka47 commented 5 months ago

I just talked about this issue with @Phu2. Here are the next steps:

Phu2 commented 5 months ago

Things to consider: