cmd/preguide: decide on more scaleable approach to sanitisers

myitcv commented 4 years ago

We currently have a handful of command-oriented sanitisers hard-coded into the preguide project. For example, one sanitises the output from go test in order to normalise timings from:

$ go test ./cmd/preguide
ok      github.com/play-with-go/preguide/cmd/preguide   14.870s

to:

$ go test ./cmd/preguide
ok      github.com/play-with-go/preguide/cmd/preguide   0.042s

However:

this approach of putting all sanitisers into this project is clearly not scaleable
it doesn't allow a step to define what sanitisers should be applied (more generally of course it might be desirable to configure a default set of sanitisers for a series of guides, of a default set for a given scenario etc)

Writing sanitisers in CUE feels wrong. Writing them in Go feels better. The question then is how to extend preguide in this "pluggable" way.

Ideally we would want to be able to specify, ultimately at the step level, what sanitisers to apply. Much like the existing approach, we would have a derive step that, for a given statement, determines which sanitisers to apply, from the superset specified for that step. If we assume for one second we will go the Go route, then that specification can take the form of a list of references to a type via its import path:

Steps: use_module: en: preguide.#Command & {
    Sanitisers: ["example.com/blah.GoGetFixer"]
    Source: """
mkdir \(Defs.mod2)
cd \(Defs.mod2)
go mod init mod.com
go get play-with-go.dev/userguides/{{.REPO1}}
go run play-with-go.dev/userguides/{{.REPO1}}
"""
}

Note that the list of sanitisers to apply will always be applied after the sanitising of variables. For example, if we had the following actual output:

$ go get play-with-go.dev/userguides/mod1abcde12345
go: downloading play-with-go.dev/userguides/mod1abcde12345 v0.0.0-20200901194510-cc2d21bd1e55

where mod1abcde12345 is a prestep-generated repository name addressable via the variable `REPO1, then the per-step sanitisers would be applied to:

$ go get play-with-go.dev/userguides/{{.REPO1}}
go: downloading play-with-go.dev/userguides/{{.REPO1}} v0.0.0-20200901194510-cc2d21bd1e55

In the example step, example.com/blah.GoGetFixer is a reference to a function with the following signature that is used to derive a possibly-nil sanitiser.

import "mvdan.cc/sh/v3/syntax"

func(stmt *syntax.Stmt) func(envVars []string, output string) string

envVars is the list of variable name templates that resulted from the sanitisation of variables. In the case of the example above that list would be ["{{.REPO1}}"], and we would want a function, say, GoGetFixer to return a sanitiser that normalises the versions to something like:

$ go get play-with-go.dev/userguides/{{.REPO1}}
go: downloading play-with-go.dev/userguides/{{.REPO1}} v0.0.0-20060102150405-abcde12345

(the date and time here being the time package format reference, the commit sha being a well-defined constant for commits).

At this point we ask ourselves the question: do we just want a func(envVars []string, output string) string sanitiser, or do we optimise for the fact that many sanitisers will be line-based and also support func(envVars []string, output []string) []string?

A couple of options implementation-wise:

have preguide recompile and exec "itself" as it discovers it is missing sanitiser implementations
have the guide author write a main shim around a preguide implementation package, which includes a map from string sanitiser function references to the actual function

The second option is considerably simpler in implementation terms but slightly harder from a usage perspective (as the guide author writing a custom sanitiser), the first is simpler from a user perspective but considerably more complex from an implementation perspective.

For now we will continue to develop sanitisers in this repository, until such time as a pattern/winning solution clearly presents itself.

myitcv commented 4 years ago

Partially addressed in https://github.com/play-with-go/preguide/commit/9c18dd592b9be23f65018e13934359b1cfdb07dc

Because if it is possible to run a command to tell us what part of the output is random, then that is preferable to trying to guess from command output. e.g. go get is particularly tricky, because we potentially see lots of pseudo versions in various commands, and the sanitise step would need to be parameterised in some way to say which pseudo version should be sanitised (i.e. would need the module path before it... but the pseudo version might not always be preceded by the module path - it gets messy).

go test is, however, still a command that needs us to sanitise the output. Which is fine, because the output is well structured.

myitcv commented 2 years ago

Further thinking here in the context of go test. The main problem with go test is that timings can vary between runs:

--- a/_posts/2018-10-19-go-fundamentals_go115_en.markdown
+++ b/_posts/2018-10-19-go-fundamentals_go115_en.markdown
@@ -43,7 +43,7 @@ You should already have completed:
 This guide is running using:

 <pre data-command-src="Z28gdmVyc2lvbgo="><code class="language-.term1">$ go version
-go version go1.15.8 linux/amd64
+go version go1.15.15 linux/amd64
 </code></pre>

 ### Create a module that others can use
@@ -760,7 +760,7 @@ $ go test -v
 === RUN   TestHelloEmpty
 --- PASS: TestHelloEmpty (0.00s)
 PASS
-ok     &#123;&#123;&#123;.GREETINGS&#125;&#125;&#125;  0.002s
+ok     &#123;&#123;&#123;.GREETINGS&#125;&#125;&#125;  0.001s
 </code></pre>

 You will now break the `greetings.Hello` function to view a failing test. The `TestHelloName` test function
@@ -845,7 +845,7 @@ a lot of tests. The `TestHelloName` test should fail -- `TestHelloEmpty` still p
     greetings_test.go:15: Hello(&#34;Gladys&#34;) = &#34;Hail, %v! Well met!&#34;, &lt;nil&gt;, want match for `\bGladys\b`, &lt;nil&gt;
 FAIL
 exit status 1
-FAIL   &#123;&#123;&#123;.GREETINGS&#125;&#125;&#125;  0.002s
+FAIL   &#123;&#123;&#123;.GREETINGS&#125;&#125;&#125;  0.001s
 </code></pre>

 Let's restore `greetings.Hello` to a working state

Other than these timing differences, there is no difference between these two outputs.

One way to achieve this "semantic" diff would be to search for instances of timings like 0.002s at the end of a line, and replace them unique strings. If we did this in both the reference and the actual output, we would then have a semantic diff modulo timings.

However, there might be false positives in some test output.

So whilst it would be a sensible default behaviour, we would need to allow overriding in specific steps.

One way of achieving this would be to have a preguide defined template for commands that start with go test default to a single element list of patterns, that specify line-oriented patterns that should be replaced with unique strings prior to comparison. This would then allow guide authors to override the default pattern list.

myitcv commented 2 years ago

One other thing we need to consider here is how to minimise diffs after re-running guide generation. Indeed, in the case of there being zero semantic difference, there should be no diff for a given guide. Given the way things are currently implemented, this might be hard: we have log files and out guides. It might be that, in the case of zero semantic differences, we leave the log in place and the existing guide, modulo updating the checksum if necessary.

myitcv commented 2 years ago

One other point to consider is that for some commands we will want to specify custom regex-based replaced directives to sanitise output. e.g.

--- a/_posts/2020-08-13-installing-go_go115_en.markdown
+++ b/_posts/2020-08-13-installing-go_go115_en.markdown
@@ -36,12 +36,12 @@ Start in your home directory:

 Download the latest version of Go:

-<pre data-command-src="d2dldCAtcSBodHRwczovL2dvbGFuZy5vcmcvZGwvZ28xLjE1LjgubGludXgtYW1kNjQudGFyLmd6Cg=="><code class="language-.term1">$ wget -q https://golang.org/dl/go1.15.8.linux-amd64.tar.gz
+<pre data-command-src="d2dldCAtcSBodHRwczovL2dvbGFuZy5vcmcvZGwvZ28xLjE1LjE1LmxpbnV4LWFybTY0LnRhci5nego="><code class="language-.term1">$ wget -q https://golang.org/dl/go1.15.15.linux-arm64.tar.gz
 </code></pre>

 Extract and install:

-<pre data-command-src="c3VkbyB0YXIgLUMgL3Vzci9sb2NhbCAteHpmIGdvMS4xNS44LmxpbnV4LWFtZDY0LnRhci5nego="><code class="language-.term1">$ sudo tar -C /usr/local -xzf go1.15.8.linux-amd64.tar.gz
+<pre data-command-src="c3VkbyB0YXIgLUMgL3Vzci9sb2NhbCAteHpmIGdvMS4xNS4xNS5saW51eC1hcm02NC50YXIuZ3oK"><code class="language-.term1">$ sudo tar -C /usr/local -xzf go1.15.15.linux-arm64.tar.gz
 </code></pre>

 Add the install target to your profile `PATH`:

play-with-go / preguide

cmd/preguide: decide on more scaleable approach to sanitisers #73