varnishcache / pkg-varnish-cache

Package building scripts for official Debian and Redhat packages of Varnish Cache.
22 stars 32 forks source link

Fix broken pipe in case VCL is larger than pipe buffer #134

Closed ooprala closed 4 years ago

ooprala commented 4 years ago

The issue

In case the output of vcl.show -w is larger than the pipe buffer (see Pipe Capacity), the following chain of commands in find_vcl_file() can cause a broken pipe error, resulting in the write side receiving a SIGPIPE: https://github.com/varnishcache/pkg-varnish-cache/blob/ad4ebad81381cb75d99de2b05105b232b6efa14d/systemd/varnishreload#L116-L123

Alternatively, in case SIGPIPE is masked (which is the default in systemd.exec), the write() syscall simply fails (see EPIPE in man 2 write).

Impact

There seems to be no impact on the script's functionality, as awk only exits once a satisfactory input is found. However, that is definitely not clear from the error that can be seen in systemd logs:

Nov 01 00:00:00 hostname systemd[1]: Reloading Varnish Cache, a high-performance HTTP accelerator.
Nov 01 00:00:00 hostname varnishreload[14456]: sh: echo: I/O error
Nov 01 00:00:00 hostname varnishreload[14456]: VCL 'reload_20191101_000000_14456' compiled
Nov 01 00:00:00 hostname varnishreload[14456]: VCL 'reload_20191101_000000_14456' now active
Nov 01 00:00:00 hostname systemd[1]: Reloaded Varnish Cache, a high-performance HTTP accelerator.

Due to the nature of shells' setup on Debian, varnishreload would be executed under /bin/dash as that's what /bin/sh points to. When the script is modified to be run by /bin/bash, we can see the error more clearly:

Nov 01 00:00:00 hostname systemd[1]: Reloading Varnish Cache, a high-performance HTTP accelerator.
Nov 01 00:00:00 hostname varnishreload[14456]: line 124: echo: write error: Broken pipe
Nov 01 00:00:00 hostname varnishreload[14456]: VCL 'reload_20191101_000000_14456' compiled

The gory details

The reason this error happens is that once awk encounters the desired line, it is instructed to exit, effectively closing the read side of the pipe between awk and the writing echo. In case the contents of ${VCL_SHOW} are large enough ( more on that below ), echo might not have finished writing the entirety of VCL_SHOW yet, resulting in its write() syscall failing and kernel sending it a SIGPIPE (unless it is masked).

The below C code (using GNU extensions) demonstrates this issue:

#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

// formatted with clang-format

void handle_sigpipe(int sig) {
  fprintf(stderr, "Caught a SIGPIPE - exiting\n");
  signal(SIGPIPE, SIG_DFL);
  kill(getpid(), sig);
}

void free_buf(int ret, void *buf) { free(buf); }

int main(int argc, char *argv[]) {
  // man 3 program_invocation_short_name
  const char *appname = program_invocation_short_name;

  if (!strcmp(appname, "out_masked")) {
    signal(SIGPIPE, SIG_IGN);
  } else {
    signal(SIGPIPE, handle_sigpipe);
  }

  // F_GETPIPE_SZ is a linux-specific flag available since kernel 2.6.35
  // See `man 7 pipe' for more
  int pipe_sz = fcntl(STDOUT_FILENO, F_GETPIPE_SZ);
  if (pipe_sz == -1) {
    fprintf(stderr,
            "You need to pipe the STDOUT of this command"
            " for the example to work: %s\n",
            strerror(errno));
    return 1;
  }

  /*
   * Read twice the Pipe Capacity number of bytes from /dev/urandom and
   * write it to STDOUT.
   */
  size_t buf_size = 2 * pipe_sz;
  char *buf = NULL;
  on_exit(free_buf, buf); // Won't work for the SIGPIPE case
  buf = (char *)malloc(buf_size);
  if (buf == NULL) {
    fprintf(stderr, "Failed to allocate %z bytes of data: %s", buf_size,
            strerror(errno));
    return 1;
  }
  int urandom_fd = open("/dev/urandom", 0 /*flags*/, S_IRUSR);

  ssize_t rstatus = read(urandom_fd, buf, buf_size);
  if (rstatus == -1) {
    fprintf(stderr, "read() failed: %s\n", strerror(errno));
    return 1;
  }

  ssize_t wstatus = write(STDOUT_FILENO, buf, buf_size);
  if (wstatus == -1) {
    fprintf(stderr, "write() failed: %s\n", strerror(errno));
    return 1;
  }
  return 0;
}

If you save the code to out.c, the below script will help you compile it and showcase its behaviour when SIGPIPE is/isn't masked:

#!/bin/bash --posix
​
set -e
set -u
​
BOLD='\e[1m'
WARN='\e[33m'
RESET='\e[0m'
​
PIPE_MAX_SIZE=$(</proc/sys/fs/pipe-max-size)
PAGESIZE=$(getconf PAGESIZE)
PIPE_DEFAULT_SIZE=$((16 * PAGESIZE))
​
# REQUIREMENTS:
#  At least Linux 2.6.35 expected
#  _GNU_SOURCE available
#  GCC
​
SIGPIPE=$(/bin/kill -l PIPE)
​
if (( 2 * PAGESIZE > PIPE_MAX_SIZE)); then
    echo -e "${WARN}Your PIPE_MAX_SIZE is less than twice the default pipe capacity"
    echo -e "This example might not work for you as intended${RESET}"
fi
​
WORKDIR=$(mktemp -d -p /tmp)
pushd "${PWD}" &>/dev/null
trap 'popd; rm -rf "${WORKDIR}"' EXIT
​
# See the section "Pipe capacity" in "man 7 pipe"
​
​
CODE="${PWD}/out.c"
​
cd "${WORKDIR}"
gcc "${CODE}" -o out_default
ln -s out_default out_masked
​
echo -e "${BOLD}Broken pipe without SIGPIPE masked${RESET}:"
​
# From the bash manpage:
# : [arguments]
#      No  effect; the command does nothing beyond expanding arguments and performing any specified redirections.
#      The return status is zero.
./out_default | :
echo "${PIPESTATUS[*]}"
​
​
echo -e "${BOLD}Broken pipe with SIGPIPE masked${RESET}:"
./out_masked | :
echo "${PIPESTATUS[*]}"

The fix

Please see the attached patch. Since the new code always reads in the entire input - even if it will no longer result in any additional output - the broken pipe error effectively cannot happen. At least not in the case described above, that is. Echo will always have all of its output properly consumed.

There are of course other solutions to this issue, for example the following would probably work equally as well:

VCL_FILE=$(echo “$VCL_SHOW” |  grep -E '^// VCL.SHOW [0-9]+ [0-9]+' | cut -d' ' -f5- | head -n1)
dridi commented 4 years ago

Thank you for bringing this to our attention. I have reserves regarding your patch and will need to study it closely to ensure it works across the platforms we support. I would also prefer to keep the script simple and the proposed fix is a bit complicated.

Let's start with a reproducer: can you please produce a dummy VCL large enough to trigger the SIGPIPE on your system?

dridi commented 4 years ago

I failed to reproduce this with a 1.1GB VCL and the maximum possible cli_limit on my machine.

ooprala commented 4 years ago

What exactly do you mean by "failed to reproduce"? Did echo return 0 in your case?

Let me try with a shorter, more direct example:

#!/bin/bash
set -e
set -u

PAGESIZE=$(getconf PAGESIZE)
# At least Linux-2.6.11 assumed
PIPE_DEFAULT_SIZE=$((16 * PAGESIZE))

TMPDIR="$(mktemp -d -p /tmp)"
trap 'rm -rf "${TMPDIR}"' EXIT

VCL="${TMPDIR}/generated.vcl"
cat > "${VCL}" <<EOF
// VCL.SHOW 0 0 /tmp/file with 3 spaces.vcl
EOF

dd if=/dev/urandom count=1 bs=$(( 2 * PIPE_DEFAULT_SIZE )) 2>/dev/null | base64 >> "${VCL}"
VCL_CONTENTS="$(cat ${VCL})"
echo "${VCL_CONTENTS}" | awk '$1 == "//" && $2 == "VCL.SHOW" {print; exit}' > /dev/null
echo "echo's return status: ${PIPESTATUS[0]}"

The above example creates a VCL that starts with: // VCL.SHOW 0 0 /tmp/file with 3 spaces.vcl ...and then appends a bit more than twice of what should be the default pipe buffer in random base64 data. Finally it pipes the contents of that VCL to the exact same awk command used in varnishreload and outputs echo's return status.

Can you please run the example and tell me what it outputs for you? Thank you!

dridi commented 4 years ago

Can you please give me steps to reproduce this with the actual varnishreload script?

I looked at the blanks question, and found an actual bug unrelated to SIGPIPE. Can you give a try to this fix? https://github.com/varnishcache/pkg-varnish-cache/commit/1f85ec0f07129e587e6af654bd9f68762ec3d117

ooprala commented 4 years ago

@Dridi can you please tell me what my script in https://github.com/varnishcache/pkg-varnish-cache/pull/134#issuecomment-555879785 output for you?

There's no need to run it in systemd. Just run it in shell, I probably should have mentioned that.

dridi commented 4 years ago

I'm not interested in trying your script, my priority is to fix a bug if there is one. I was unable to reproduce the SIGPIPE problem so until I have a reproducer I'm not going to chase a theoretical chimera.

At some later point you were still going on about SIGPIPE but used the following as an example:

// VCL.SHOW 0 0 /tmp/file with 3 spaces.vcl

So I tried this example and fair enough, it did not work. However, it was not failing because of a SIGPIPE. If you write any shell script with a pipeline you might run into a broken pipe, this is not a problem. Otherwise we should ban the head command from any script ever.

And so I fixed in https://github.com/varnishcache/pkg-varnish-cache/commit/1f85ec0f07129e587e6af654bd9f68762ec3d117 the bug I found, but until I have steps to reproduce, I'm not going after a complicated patch solving a non-issue.

gquintard commented 4 years ago

Hi @ooprala,

Thank you for your report. Can I ask you to explain what the issue is on your side?

I do understand that as soon as we have a awk match, we are going to break the pipeline (triggering a SIGPIPE). But what's the negative impact of this? In shell there's little consequence for doing so, and in this particular case, to my knowledge, there's no failure, the script runs as expected and there's no extra output to complain. So my uneducated conclusion would be "no harm no foul".

The way I understand it, you can only know about the SIGPIPE if you actively look for it.

Please help me understand how this is impacting you, notably how this causes the script to not work as intended.

ghostinushanka commented 4 years ago

Hello @gquintard, you are correct that the Nov 01 00:00:00 hostname varnishreload[14456]: sh: echo: I/O error displayed in the systemd output upon varnish reload does not seem (we were unable to detect any direct impact) to have an effect on the reload procedure other than the message itself.

Seeing it in the logs however raises questions of what might be actually wrong (even if it is not in the end), so we've investigated and understood the underlying logic fully and devised a patch which will make sure that the error is not displayed anymore so no one else (be it human or automated monitoring system) is ever confused by the message.

dridi commented 4 years ago

@ghostinushanka I completely agree with you that a spurious error message can be harmful, but in my very first comment I asked for a reproducer and explained that I would need some time to study the proposed fix to ensure it works on all the platforms we support. For example I need make sure it doesn't rely on a GNU sed-specific extensions.

I spent a lot of time trying to reproduce the error on my system to no avail: the error message wouldn't show up in neither the systemd journal or the syslog. So until I have steps to reproduce the error, and can actually try the fix and confirm that it makes the error go away, I can't merge this pull request.

dridi commented 4 years ago

I can't test whether this patch solves your problem but I had something like this in mind to avoid the echo shell builtin:

diff --git i/systemd/varnishreload w/systemd/varnishreload
index 0ac4062..2064895 100755
--- i/systemd/varnishreload
+++ w/systemd/varnishreload
@@ -114,7 +114,9 @@ find_label_ref() {
 find_vcl_file() {
        VCL_SHOW=$(varnishadm vcl.show -v "$VCL_NAME" 2>&1) || :
        VCL_FILE=$(
-               echo "$VCL_SHOW" |
+               cat <<-EOF |
+               $VCL_SHOW
+               EOF
                awk '$1 == "//" && $2 == "VCL.SHOW" {print; exit}' | {
                        # all this ceremony to handle blanks in FILE
                        read -r DELIM VCL_SHOW INDEX SIZE FILE

It works on my machine, and the overhead of an extra cat process in this context is probably fine (the shell would probably have to fork itself regardless since this is a pipeline). I'm not worried about portability, but I don't know whether this fixes the problem at hand.

ooprala commented 4 years ago

@Dridi, me and @ghostinushanka have prepared a Vagrant-based reproducer, which you can find here https://github.com/Showmax/varnishreload_reproducer. The manual steps needed to reproduce the issue are outlined in README.md.

You can also see the above repo being used to reproduce the issue in this short screencast: https://drive.google.com/file/d/1Giw5gu3G9jLKGpcEhVePgG1tlCtmsdmN/view

dridi commented 4 years ago

@ooprala, thanks, I will give it a try.

ghostinushanka commented 4 years ago

Hello @Dridi, we've evaluated multiple possible solutions - this is a shell script after all. Including the possible grep/cut/head @ooprala mentioned in the very first message. Sed solution was devised for one simple reason - sed processes streams, which is exactly what is happening here, so instead of using multiple pipes and throwing the VCL contents left and right - it is processed once and for good and very fast regardless of the filesize. Sed in reality is a oneliner, we've simply decided to thoroughly comment it for the future readers.

dridi commented 4 years ago

Thank you for the reproducer, I was finally able to see it in action and I have the following patch:

diff --git i/systemd/varnishreload w/systemd/varnishreload
index 0ac4062..9682621 100755
--- i/systemd/varnishreload
+++ w/systemd/varnishreload
@@ -85,7 +85,8 @@ varnishadm() {
                echo
                return 1
        fi >&2
-       echo "$OUTPUT"
+       echo "$OUTPUT" |
+       tee /dev/null # drain the pipe to avoid spurious error messages
 }

 fail() {
@@ -112,23 +113,13 @@ find_label_ref() {
 }

 find_vcl_file() {
-       VCL_SHOW=$(varnishadm vcl.show -v "$VCL_NAME" 2>&1) || :
-       VCL_FILE=$(
-               echo "$VCL_SHOW" |
-               awk '$1 == "//" && $2 == "VCL.SHOW" {print; exit}' | {
-                       # all this ceremony to handle blanks in FILE
-                       read -r DELIM VCL_SHOW INDEX SIZE FILE
-                       echo "$FILE"
-               }
-       ) || :
-
-       if [ -z "$VCL_FILE" ]
-       then
-               echo "$VCL_SHOW" >&2
-               fail "failed to get the VCL file name"
-       fi
-
-       echo "$VCL_FILE"
+       varnishadm vcl.show -v "$VCL_NAME" |
+       awk '$1 == "//" && $2 == "VCL.SHOW" {print; exit}' | {
+               # all this ceremony to handle blanks in FILE
+               read -r DELIM VCL_SHOW INDEX SIZE FILE &&
+               echo "$FILE"
+       } ||
+       fail "failed to get the VCL file name"
 }

 find_vcl_reloads() {

Playing further with large varnishadm output I noticed that I could run into the same problem in the varnishadm() function so I opted for a different means to drain the pipe.

I also stuck to awk because the varnishadm output is columnar, which makes awk a better fit than sed in my opinion, and awk is already used in other parts of the script while sed isn't. This allowed me to simplify the find_vcl_file() function to something even simpler than before my fix for #104. And it maintains the overall error message when things fail.

Can you please give it a try?

ooprala commented 4 years ago

Your patch doesn't really fix the issue though, it only conceals it, since tee doesn't complain about pipe-related write errors by default (see tee's man page).

If you change tee to tee --output-error=exit you will see something along these lines in journalctl -ru varnish (in the vagrant box setup we've provided):

Nov 21 14:52:45 debian9.localdomain varnishreload[3411]: sh: echo: I/O error
Nov 21 14:52:45 debian9.localdomain varnishreload[3411]: tee: 'standard output': Broken pipe

Which is basically the exact same error.

dridi commented 4 years ago

Ok, let's step a step back.

Your patch doesn't really fix the issue though, it only conceals it, since tee doesn't complain about pipe-related write errors by default (see tee's man page).

If the problem is that you want to get rid of "sh: echo: I/O error" occurrences in your journal then yes, my patch certainly fixes that.

If you change tee to tee --output-error=exit you will see something along these lines in journalctl -ru varnish (in the vagrant box setup we've provided):

The --output-error option is a GNU extension, exactly the kind of feature I can't use for portability reasons. Also, if the default behavior does what we want, why bother worrying about something that would have happened only if we had used a non-standard option that we can't use anyway?

I actually spent some time reading what the POSIX standard has to say about utilities reporting broken pipes and it has very little to say. The closest I found is that "The standard error shall be used only for diagnostic messages" and since this broken pipe is completely inconsequential for the correct execution of the script.

So the diagnostic here is only noise, and I believe we at least agree about that, so to silence the noise I simply need to silence the standard error:

-echo "$OUTPUT"
+echo "$OUTPUT" 2>/dev/null

It will be very hard to convince me to fix this otherwise.

ghostinushanka commented 4 years ago

I can understand using awk due to the reasons you've mentioned and to not using gnu extensions for portability reasons (although what portability issues are we talking about in the package for RH/Debian based systems in this case?). What I do not understand is that you percieve "hiding the fact that the error had happened" to be a better solution to "making sure the broken pipe does not occur".

Anyhow, I suggest to let awk process the input fully then, instead of hiding the error, along with potential other error outputs that might get there which will be silenced, like the last change you did in 669c6fe.

find_vcl_file() {
     varnishadm vcl.show -v "$VCL_NAME" |
-    awk '$1 == "//" && $2 == "VCL.SHOW" {print; exit}' | {
+    awk '$1 == "//" && $2 == "VCL.SHOW" {print; nextfile}' | {
         # all this ceremony to handle blanks in FILE
         read -r DELIM VCL_SHOW INDEX SIZE FILE &&
         echo "$FILE"

From the awk man page nextfile # skip rest of this file, open next, start at top

gquintard commented 4 years ago

@ghostinushanka, I think this is just optics at this point: a broken point is not a big deal and not an error per se, so I can understand why @Dridi doesn't want to bend over backwards to prevent it.

Personally, I prefer to break a pipe than having to unnecessarily read the whole input, specially if it's big.

The message from the shell is annoying though, and we should get rid on it. But instead of fixing/silencing each and every possible occurrence, can't we just have

trap -- '' PIPE

and be done with it?

ghostinushanka commented 4 years ago

You are correct about needlessly processing the big files, and since we did hit the error in the first place, meaning that our joint VCL is actually bigger than the default buffer, I can state that our VCL is HUGE. Yet we have opted to process the VCL completely as a better solution to other alternatives, and this is why 'sed' was proposed in the first place.

As for awk change from my previous comment:

This time from the GNU manual

The nextfile statement is useful when there are many data files to process but it isn’t necessary to process every record in every file. Without nextfile, in order to move on to the next data file, a program would have to continue scanning the unwanted records. The nextfile statement accomplishes this much more efficiently.

By "efficiently" I would assume that upon reading the nextfile statement, the awk would simply seek to the end without reading and processing the lines, but I would have to check the source to be sure about it. If this is the case, then the overall VCL config should be tenths of megabytes in size to start making any speed difference. And this is one of the things that is also easily measurable. I think I have explained the decision logic behind originally proposed patch fully at this point.

dridi commented 4 years ago

although what portability issues are we talking about in the package for RH/Debian based systems in this case?

This script is meant to also work on FreeBSD and SunOS in addition to GNU/Linux systems and possibly also non-GNU Linux systems like Alpine. (@gquintard did you try varnishreload on Alpine?)

Before editing your message you mentioned that your nextfile suggestion works on both awk, gawk and other variants. However, this is not a guarantee that your awk script is portable, at least not on my system:

$ ls -l /usr/bin/*awk
lrwxrwxrwx. 1 root root      4 Jul 25 03:59 /usr/bin/awk -> gawk
-rwxr-xr-x. 1 root root 719888 Jul 25 03:59 /usr/bin/gawk

I checked the POSIX manual again and found no mention of the nextfile statement. The gawk manual claims that it was accepted in POSIX in 2012 but it didn't make it in the actual POSIX 2018 edition.

Out of curiosity I also checked the POSIX manual for sed and there is no such sed -E option. This is a show stopper for portability.

Personally, I prefer to break a pipe than having to unnecessarily read the whole input, specially if it's big.

Especially when we are dealing with large VCL files, when the file name is expected to be found very early.

can't we just have

trap -- '' PIPE

and be done with it?

I'm not sure I want to hide all SIGPIPE signals under the rug, only for the output of varnishadm. Also, since this happens in a pipeline, I'd need to check whether the trap is inherited by forks of the shell process and if not whether it is possible to enable it, and if so whether it is portable. I don't mind shaving shell yaks but I must put a limit somewhere :)

@ghostinushanka why do you think draining the pipe is better than ignoring it when we know for a fact that it doesn't matter for the normal execution of the script?

We have a cheap fix, and in the process I fixed one bug and simplified one function. 3 improvements for 1 github ticket, in my book that's a win. Again, thank you and @ooprala for bringing this to our attention, and thank you for providing a reproducer environment.

ghostinushanka commented 4 years ago

Since you've mentioned sed, yes I know we used the GNU extensions, and it was documented in the commets. The only reason to use the extended regex option is to have regex easier to read. Here is the version which works in both gnu-sed and bsd-sed and does not use any of the extensions.

# Store the lines starting with // VCL.SHOW in hold space
/^\/\/ VCL.SHOW/ {
    G
    h
}
# Replace the contents of pattern space of the last line
# with contents of the hold space and print the filename
# from the last (which was first read) occurence
${
    g
    s/.* [0-9]* [0-9]* \(.*\)\n/\1/p
}
> cat vcl-example.vcl
Text
// VCL.SHOW 0 1578 file with 3 spaces.vcl
More text
// VCL.SHOW 0 1578 file.vcl
Even more text
// VCL.SHOW 0 1578 file with TWOspaces.vcl
Final text
> sed -n -f vcl-processor.sed vcl-example.vcl
file with 3 spaces.vcl

why do you think draining the pipe is better than ignoring it when we know for a fact that it doesn't matter for the normal execution of the script?

We know for a fact that in this specific case there is no issue to the overall run, because the SIGPIPE is caused by the how the script is designed. If however there will be another instance of SIGPIPE (or any other error that is, since you dump all to /dev/null) caused by something different, muting the output will ultimately hide it and make debugging much harder.

Especially when we are dealing with large VCL files, when the file name is expected to be found very early.

I've been thinking, are there cases when the file name is actually not on the very first line of output from the varnishadm vcl.show -v? Because if there are none such, why not amend the varnishadm to print out directly the target filename instead of black magic in the control script?

dridi commented 4 years ago

The problem is that this script lives out of the varnish source tree, so if the output changes and adds a preamble in the future, we need to wait for the first file marker. If the file marker format changes, all bets are off...

We also know for a fact that any use of the varnishadm output is meant for extracting information, so unless the varnishadm command failed (and the output is forwarded to stderr) we know that downstream usage of the output is under control.

If you had a ton of loaded VCLs, which we have seen on occasion (and the reason why varnishreload -m exists), you could also run into a broken pipe when you are looking for the active VCL or a given label. Technically those functions would also benefit from exiting once they found what they were looking for, and a broken pipe would also not matter in this context (I will probably do that too).

Also, I'm a big fan of sed and bash and when portability is not a problem I will happily use GNU extensions. See my git sed command for example.

But between the sed script:

# Store the lines starting with // VCL.SHOW in hold space
/^\/\/ VCL.SHOW/ {
    G
    h
}
# Replace the contents of pattern space of the last line
# with contents of the hold space and print the filename
# from the last (which was first read) occurence
${
    g
    s/.* [0-9]* [0-9]* \(.*\)\n/\1/p
}

And the awk stunt:

awk '$1 == "//" && $2 == "VCL.SHOW" {print; exit}' | {
    # all this ceremony to handle blanks in FILE
    read -r DELIM VCL_SHOW INDEX SIZE FILE &&
    echo "$FILE"
}

I find the latter easier to follow and a bit more self-describing.

ghostinushanka commented 4 years ago

While I do not necessarily agree with the 'self-describing' part as the reader either has the knowledge of what commands do, or has to check the manual, I do agree that the sed syntax is elusive for most people. Taking hold of POSIX awk spec I've prepared a simple awk script that does the same as the sed one (produces the filename). And tested it against mawk (1.3.3, 1.3.4), bsd-awk (20070501) and gawk (5.0.1). IMO, compared to sed anyone with basic programming knowledge will have at least a rough idea what it does and how.

BEGIN {
    out=""
}
/^\/\/ VCL.SHOW/ {
    out = $0 out
}
END {
    sub(/.*\/\/ VCL.SHOW [0-9]* [0-9]* /, "", out)
    print out
}

And as I've mentioned to @gquintard I did measurements of this awk script (mawk 1.3.3; E3-1275 cpu) for 20M, 200M and 2G filesizes:

# time awk -f vcl-processor.awk examples/mega.vcl
fake file name 1.vcl

real    0m0.012s
user    0m0.008s
sys 0m0.004s
# time awk -f vcl-processor.awk examples/mega2.vcl
fake file name 1.vcl

real    0m0.094s
user    0m0.037s
sys 0m0.057s
# time awk -f vcl-processor.awk examples/giga.vcl
fake file name 1.vcl

real    0m1.296s
user    0m0.648s
sys 0m0.648s

The slowdown of ~100msec is visible for VCL size of 200Megabytes, 2Gig is just for reference. Is there anyone with the VLC larger than 100megabytes to actually care about processing of whole files?