Closed Krinkle closed 3 years ago
I've confirmed that the logs are indeed still served from znc.ops.jquery.net. For example, when navigating to irc.jquery.org / jquery-dev / 2021-04-19, the following entry appears in its access log:
znc:/var/log/nginx# tail irclogs.jquery.com.access.log
... "GET /%23jquery-dev/default_%23jquery-dev_20210419.log.html HTTP/1.0" 200 4185 "https://irc.jquery.org/%23jquery-dev/" ..
Size breakdown:
znc:/var/www# du -sh irclogs.jquery.com/
3.1G irclogs.jquery.com/
znc:/var/www/irclogs.jquery.com# du -sh *
17M #css-chassis
21M #esprima
6.5M #esprima-meeting
14M #globalize
8.1M #grunt-dev
2.4G #jquery
57M #jquery-content
181M #jquery-dev
9.8M #jquery-developer-summit
39M #jquery-infrastructure
126M #jquery-meeting
87M #jquerymobile-dev
146M #jqueryui-dev
17M #pep
znc:/var/www/irclogs.jquery.com# l
Aug 29 15:10 #css-chassis/
Aug 29 15:10 #esprima/
Aug 29 15:10 #esprima-meeting/
Aug 29 15:10 #globalize/
Aug 29 15:10 #grunt-dev/
Aug 29 15:10 #jquery/
Aug 29 15:10 #jquery-content/
Aug 29 15:10 #jquery-dev/
Aug 29 15:10 #jquery-developer-summit/
Aug 29 15:10 #jquery-infrastructure/
Aug 29 15:10 #jquery-meeting/
Aug 29 15:10 #jquerymobile-dev/
Aug 29 15:10 #jqueryui-dev/
Aug 29 15:10 #pep/
May 17 2013 index.html
Each day's logs are stored in two formats:
-rw-r--r-- 1 znc znc 34K Feb 1 2019 default_#jquery_20190131.log
-rw-r--r-- 1 znc znc 81K Feb 1 2019 default_#jquery_20190131.log.html
And the majority of the file's contents are joins and quits rather than actual messages (IP masking mine, not in the actual file):
[23:05:32] *** Quits: braincrash (~braincras@62.*.*.*) (Quit: bye bye)
[23:09:05] *** Joins: braincrash (~braincras@62.*.*.*)
[23:49:53] *** Joins: nikitha1 (~rritec1@117.*.*.*)
I'll do the following:
Rename .log to .txt (so that a simple static server will serve it as plain text instead of offering it as a download).
Assuming the result is of reasonable size (e.g. < 100M?), I propose we commit them directly to https://github.com/jquery/irc.jquery.org, turn that into a simple static site served by GitHub Pages, and flip DNS accordingly.
Remove HTML format:
irclogs.jquery.com
$ du -sh .
3.1G
$ find . -name '*.log.html' -delete
..
$ du -sh .
883M
Rename for plain text:
$ rename -s '.log' '.txt' \#*/*.log
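Since rename implementations differ between systems, a rough portable equivalent of this step might look like the sketch below. It assumes the per-channel directory layout shown in the listing above; the function name rename_logs_to_txt is made up for illustration.

```shell
# Hypothetical helper (not one of the commands that were actually run).
# $1: directory that holds the per-channel subdirectories.
rename_logs_to_txt() {
  # '*.log' does not match '*.log.html', so only the plain-text logs are renamed.
  find "$1" -type f -name '*.log' | while IFS= read -r f; do
    mv "$f" "${f%.log}.txt"
  done
}

# Usage sketch:
# rename_logs_to_txt irclogs.jquery.com
```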
Strip noise and PII:
irclogs.jquery.com/#jquery$ sed -i '' -E '/^\[..:..:..\] \*\*\* (Joins|Quits|Parts): .*/d' *.txt
irclogs.jquery.com$ find . -name "*.txt" | xargs sed -i '' -E '/^\[..:..:..\] \*\*\* /d'
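Note that sed -i '' is the BSD/macOS spelling; GNU sed takes -i without a separate empty argument. A sketch of the same filter as a stdin-to-stdout function, which sidesteps that difference, could look like this. The name strip_status_lines and the file names in the usage comment are hypothetical.

```shell
# Hypothetical helper (not one of the commands that were actually run).
# Reads a log on stdin and writes it to stdout without "***" status lines
# such as joins, quits and parts.
strip_status_lines() {
  sed -E '/^\[..:..:..\] \*\*\* /d'
}

# Usage sketch: rewrite one file through a temporary copy.
# strip_status_lines < day.txt > day.txt.tmp && mv day.txt.tmp day.txt
```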
Delete now-empty files:
$ find . -size 0 -type f -delete
$ find . -name 'irclog.css' -delete
$ du -sh .
319M .
Re-create index files:
#pep$ ls | sed 's/#/%23/' | sed 's/^\(.*\)\([0-9][0-9][0-9][0-9]\)\([0-9][0-9]\)\([0-9][0-9]\).txt$/<li><a href="\1\2\3\4.txt">\2-\3-\4<\/a><\/li>/'
<li><a href="default_%23pep_20150113.txt">2015-01-13</a></li>
<li><a href="default_%23pep_20150114.txt">2015-01-14</a></li>
...
#pep$ ls | sed 's/#/%23/' | sed 's/^\(.*\)\([0-9][0-9][0-9][0-9]\)\([0-9][0-9]\)\([0-9][0-9]\).txt$/<li><a href="\1\2\3\4.txt">\2-\3-\4<\/a><\/li>/' | pbcopy
#pep$ (paste into index.html)
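The copy-and-paste step could probably also be skipped by writing the list items straight to a file. A minimal sketch, reusing the sed expressions above inside a hypothetical index_items function; sed -n with the p flag is used so that names that don't match the log pattern (such as index.html itself) are dropped rather than passed through unchanged.

```shell
# Hypothetical helper (not one of the commands that were actually run).
# Reads file names on stdin, one per line, and prints one <li> link per
# log file named like default_#channel_YYYYMMDD.txt.
index_items() {
  sed -n \
    -e 's/#/%23/' \
    -e 's/^\(.*\)\([0-9]\{4\}\)\([0-9]\{2\}\)\([0-9]\{2\}\)\.txt$/<li><a href="\1\2\3\4.txt">\2-\3-\4<\/a><\/li>/p'
}

# Usage sketch (items.html is an assumed scratch file name):
# ls | index_items > items.html
```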
Okay, a proof of concept is up at https://github.com/jquery/irc.jquery.org/tree/irclogs-static.
Preview at https://jquery.github.io/irc.jquery.org/.
I disabled the WordPress builder webhooks for the jquery/irc.jquery.org repo on GitHub to avoid future commits being deployed there, so that whatever is there today will keep working until we're ready to switch the DNS.
@Krinkle the preview (https://irc.jquery.org/index.html) has no content. Is that intended?
That's the live link, not the preview link. There was a bad redirect between the two cached by GitHub earlier today, which has been fixed.
I just had a closer look. This is awesome, @Krinkle! 👏🏻 I was also thinking about cutting noise & removing files with just status updates, so it's great to see it.
Just one suggestion: please put newest files at the top, not at the bottom, similarly to how it's done now at https://irc.jquery.org/
This looks really great. Anything needed from me here?
@brianwarner Yep, for irc.jquery.org DNS to point to GitHub Pages (e.g. as for qunitjs.com)
@mgol Sounds good to me, I'll submit an improvement to that end (can imho wait until after DNS switch, though).
Sure, that improvement shouldn’t block the rollout.
Good deal, you should be in good shape once it replicates.
@brianwarner LGTM. Thanks.
The znc.ops.jquery.net host mapping for Puppet provisioning was removed in 2016 with commit 74ea83d1499. However, the manifest files remain, and there also appear to be some references to the ZNC server. It appears to possibly still be running and hosting the (only?) copy of the logs of previous IRC meeting discussions.