jquery / infrastructure-puppet

Puppet configuration for jQuery Infrastructure servers.
MIT License

Decom znc.ops.jquery.net (irc.jquery.org and jq05) #48

Closed Krinkle closed 3 years ago

Krinkle commented 3 years ago

The znc.ops.jquery.net host mapping for Puppet provisioning was removed in 2016 with commit 74ea83d1499. However, the manifest files remain, and there also appear to be some references to the ZNC server. The server itself appears to still be running and to host the (only?) copy of the logs of previous IRC meeting discussions.

Krinkle commented 3 years ago

I've confirmed that the logs are indeed still served from znc.ops.jquery.net. For example, when navigating to irc.jquery.org / jquery-dev / 2021-04-19, the following entry appears in its access log:

znc:/var/log/nginx# tail irclogs.jquery.com.access.log
... "GET /%23jquery-dev/default_%23jquery-dev_20210419.log.html HTTP/1.0" 200 4185 "https://irc.jquery.org/%23jquery-dev/" ..

Krinkle commented 3 years ago

Size breakdown:

znc:/var/www# du -sh irclogs.jquery.com/
3.1G    irclogs.jquery.com/

znc:/var/www/irclogs.jquery.com# du -sh *
17M #css-chassis
21M #esprima
6.5M    #esprima-meeting
14M #globalize
8.1M    #grunt-dev
2.4G    #jquery
57M #jquery-content
181M    #jquery-dev
9.8M    #jquery-developer-summit
39M #jquery-infrastructure
126M    #jquery-meeting
87M #jquerymobile-dev
146M    #jqueryui-dev
17M #pep

znc:/var/www/irclogs.jquery.com# l
Aug 29 15:10 #css-chassis/
Aug 29 15:10 #esprima/
Aug 29 15:10 #esprima-meeting/
Aug 29 15:10 #globalize/
Aug 29 15:10 #grunt-dev/
Aug 29 15:10 #jquery/
Aug 29 15:10 #jquery-content/
Aug 29 15:10 #jquery-dev/
Aug 29 15:10 #jquery-developer-summit/
Aug 29 15:10 #jquery-infrastructure/
Aug 29 15:10 #jquery-meeting/
Aug 29 15:10 #jquerymobile-dev/
Aug 29 15:10 #jqueryui-dev/
Aug 29 15:10 #pep/
May 17  2013 index.html

Each day's logs are stored in two formats:

-rw-r--r--  1 znc znc    34K Feb  1  2019 default_#jquery_20190131.log
-rw-r--r--  1 znc znc    81K Feb  1  2019 default_#jquery_20190131.log.html

And the majority of each file's contents are joins and quits rather than actual messages (IP masking mine, not in the actual file):

[23:05:32] *** Quits: braincrash (~braincras@62.*.*.*) (Quit: bye bye)
[23:09:05] *** Joins: braincrash (~braincras@62.*.*.*)
[23:49:53] *** Joins: nikitha1 (~rritec1@117.*.*.*)
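To put a number on that claim, something like the following could count how many lines of a log are join/quit/part notices. This is only a rough sketch: the helper name `count_status_lines` and the sample log contents are made up for illustration, and the pattern assumes the `[HH:MM:SS] *** Joins: ...` layout shown above.

```shell
#!/bin/sh
# Hypothetical helper: print "<join/quit/part lines> <total lines>" for one log file.
count_status_lines() {
  total=$(wc -l < "$1" | tr -d ' ')
  # grep -c still prints 0 when nothing matches, but exits non-zero, hence "|| true".
  status=$(grep -c -E '^\[[0-9]{2}:[0-9]{2}:[0-9]{2}\] \*\*\* (Joins|Quits|Parts): ' "$1" || true)
  echo "$status $total"
}

# Demo on a made-up sample (not real log data) in a temporary directory.
count_demo=$(mktemp -d)
cat > "$count_demo/sample.log" <<'EOF'
[23:05:32] *** Quits: alice (~alice@192.0.2.1) (Quit: bye)
[23:06:10] <bob> has anyone tried the new build?
[23:09:05] *** Joins: alice (~alice@192.0.2.1)
[23:49:53] *** Joins: carol (~carol@198.51.100.7)
EOF

count_status_lines "$count_demo/sample.log"   # prints: 3 4
```

Running it over every file in a channel directory and summing the two columns would give the overall ratio.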

I'll do the following:

Assuming the result is of reasonable size (e.g. < 100M?), I propose we commit the logs directly to https://github.com/jquery/irc.jquery.org, turn that into a simple static site served by GitHub Pages, and flip DNS accordingly.

Krinkle commented 3 years ago

Remove HTML format:

irclogs.jquery.com
$ du -sh .
3.1G    .
$ find . -name '*.log.html' -delete
..
$ du -sh .
883M    .

Rename for plain text:

$ rename -s '.log' '.txt' \#*/*.log
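The `rename -s` flag above is not available in every `rename` implementation. Where it is missing, a plain loop with `mv` should give the same result. A minimal sketch, assuming the same `#channel/` directory layout; the function name `rename_logs_to_txt` and the demo file are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: rename every *.log file inside the #channel directories
# under the given root to *.txt, using only mv and shell parameter expansion.
rename_logs_to_txt() {
  for log_path in "$1"/\#*/*.log; do
    [ -e "$log_path" ] || continue          # skip the literal pattern when nothing matches
    mv -- "$log_path" "${log_path%.log}.txt"
  done
}

# Demo on an empty, made-up file in a temporary directory.
rename_demo=$(mktemp -d)
mkdir "$rename_demo/#pep"
: > "$rename_demo/#pep/default_#pep_20150113.log"

rename_logs_to_txt "$rename_demo"
ls "$rename_demo/#pep"   # prints: default_#pep_20150113.txt
```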

Strip noise and PII:

irclogs.jquery.com/#jquery$ sed -i '' -E '/^\[..:..:..\] \*\*\* (Joins|Quits|Parts): .*/d' *.txt

irclogs.jquery.com$ find . -name "*.txt" | xargs sed -i '' -E '/^\[..:..:..\] \*\*\* /d'
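Note that `sed -i ''` is the BSD/macOS spelling of in-place editing; GNU sed expects `-i` without the separate empty argument. For anyone repeating this on Linux, here is a sketch that avoids `-i` altogether by writing to a temporary file. The helper name `strip_status_lines` and the sample lines are made up; the pattern is the same one used above.

```shell
#!/bin/sh
# Hypothetical helper: drop every "[HH:MM:SS] *** ..." status line from one file
# without relying on sed -i, whose syntax differs between BSD and GNU sed.
strip_status_lines() {
  tmp_file=$(mktemp)
  # Write the filtered copy first, then overwrite the original in place
  # (cat keeps the original file's permissions, unlike mv of the temp file).
  sed -E '/^\[..:..:..\] \*\*\* /d' "$1" > "$tmp_file" && cat "$tmp_file" > "$1"
  rm -f "$tmp_file"
}

# Demo on a made-up sample (not real log data) in a temporary directory.
strip_demo=$(mktemp -d)
mkdir "$strip_demo/#pep"
cat > "$strip_demo/#pep/default_#pep_20150113.txt" <<'EOF'
[23:05:32] *** Quits: alice (~alice@192.0.2.1) (Quit: bye)
[23:06:10] <bob> hello
[23:09:05] *** Joins: alice (~alice@192.0.2.1)
[23:10:00] *** bob sets mode: +v alice
EOF

# Apply the filter to every .txt file below the root, as the find | xargs line does.
find "$strip_demo" -name '*.txt' | while IFS= read -r log_file; do
  strip_status_lines "$log_file"
done

cat "$strip_demo/#pep/default_#pep_20150113.txt"   # prints: [23:06:10] <bob> hello
```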

Delete now-empty files and the leftover irclog.css:

$ find . -size 0 -type f -delete
$ find . -name 'irclog.css' -delete
$ du -sh .
319M    .

Re-create index files:

#pep$ ls | sed 's/#/%23/' | sed 's/^\(.*\)\([0-9][0-9][0-9][0-9]\)\([0-9][0-9]\)\([0-9][0-9]\).txt$/<li><a href="\1\2\3\4.txt">\2-\3-\4<\/a><\/li>/'
<li><a href="default_%23pep_20150113.txt">2015-01-13</a></li>
<li><a href="default_%23pep_20150114.txt">2015-01-14</a></li>
...

#pep$ ls | sed 's/#/%23/' | sed 's/^\(.*\)\([0-9][0-9][0-9][0-9]\)\([0-9][0-9]\)\([0-9][0-9]\).txt$/<li><a href="\1\2\3\4.txt">\2-\3-\4<\/a><\/li>/' | pbcopy
#pep$ (paste into index.html)
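The copy-and-paste step depends on `pbcopy`, which is macOS-only. A variant that writes the list straight into `index.html` might look like the sketch below. The function names `list_items` and `write_index` are hypothetical, and the bare `<ul>` wrapper is an assumption about the page markup, not what the real index files contain.

```shell
#!/bin/sh
# Hypothetical helper: print one <li> per default_#channel_YYYYMMDD.txt file in a directory.
list_items() {
  ls "$1" \
    | grep -E '_[0-9]{8}\.txt$' \
    | sed 's/#/%23/g' \
    | sed -E 's,^(.*_)([0-9]{4})([0-9]{2})([0-9]{2})\.txt$,<li><a href="\1\2\3\4.txt">\2-\3-\4</a></li>,'
}

# Hypothetical helper: wrap the items in a bare <ul> and write them to index.html.
write_index() {
  {
    echo '<ul>'
    list_items "$1"
    echo '</ul>'
  } > "$1/index.html"
}

# Demo on empty, made-up files in a temporary directory.
index_demo=$(mktemp -d)
: > "$index_demo/default_#pep_20150113.txt"
: > "$index_demo/default_#pep_20150114.txt"

write_index "$index_demo"
cat "$index_demo/index.html"
```

The `grep` filter keeps `index.html` itself (and anything else that is not a dated log) out of the list.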

Krinkle commented 3 years ago

Okay, a proof of concept is up at https://github.com/jquery/irc.jquery.org/tree/irclogs-static.

Preview at https://jquery.github.io/irc.jquery.org/.

I disabled the WordPress builder webhooks for the jquery/irc.jquery.org repo on GitHub so that future commits are not deployed there. That way, whatever is live today will keep working until we're ready to switch the DNS.

mgol commented 3 years ago

@Krinkle the preview (https://irc.jquery.org/index.html) has no content. Is that intended?

Krinkle commented 3 years ago

> @Krinkle the preview (https://irc.jquery.org/index.html) has no content. Is that intended?

That's the live link, not the preview link. There was a bad redirect between the two cached by GitHub earlier today, which has been fixed.

mgol commented 3 years ago

I just had a closer look. This is awesome, @Krinkle! 👏🏻 I was also thinking about cutting noise & removing files with just status updates, so it's great to see it.

Just one suggestion: please put newest files at the top, not at the bottom, similarly to how it's done now at https://irc.jquery.org/

brianwarner commented 3 years ago

This looks really great. Anything needed from me here?

Krinkle commented 3 years ago

@brianwarner Yep, we need the irc.jquery.org DNS to point to GitHub Pages (e.g. the same way as for qunitjs.com).

@mgol Sounds good to me, I'll submit an improvement to that end (can imho wait until after DNS switch, though).
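For the newest-first ordering, a reverse sort of the file names should be enough, because all files in one channel directory share a prefix and end in a `YYYYMMDD` date, so lexicographic order matches chronological order. A rough sketch (the helper name `newest_first` and the demo files are made up, and this is not necessarily how the improvement will be implemented):

```shell
#!/bin/sh
# Hypothetical helper: list a channel's dated log files with the newest day first.
newest_first() {
  # LC_ALL=C gives a plain byte-wise sort, independent of the user's locale.
  ls "$1" | grep -E '_[0-9]{8}\.txt$' | LC_ALL=C sort -r
}

# Demo on empty, made-up files in a temporary directory.
order_demo=$(mktemp -d)
: > "$order_demo/default_#pep_20141231.txt"
: > "$order_demo/default_#pep_20150113.txt"
: > "$order_demo/default_#pep_20150114.txt"

newest_first "$order_demo"
# prints:
# default_#pep_20150114.txt
# default_#pep_20150113.txt
# default_#pep_20141231.txt
```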

mgol commented 3 years ago

Sure, that improvement shouldn’t block the rollout.

brianwarner commented 3 years ago

Good deal, you should be in good shape once it replicates.

Krinkle commented 3 years ago

@brianwarner LGTM. Thanks.