Open myitcv opened 4 years ago
Partially addressed in https://github.com/play-with-go/preguide/commit/9c18dd592b9be23f65018e13934359b1cfdb07dc
Because if it is possible to run a command to tell us what part of the output is random, then that is preferable to trying to guess from command output. e.g. go get
is particularly tricky, because we potentially see lots of pseudo versions in various commands, and the sanitise step would need to be parameterised in some way to say which pseudo version should be sanitised (i.e. would need the module path before it... but the pseudo version might not always be preceded by the module path - it gets messy).
go test
is, however, still a command that needs us to sanitise the output. Which is fine, because the output is well structured.
Further thinking here in the context of go test
. The main problem with go test
is that timings can vary between runs:
--- a/_posts/2018-10-19-go-fundamentals_go115_en.markdown
+++ b/_posts/2018-10-19-go-fundamentals_go115_en.markdown
@@ -43,7 +43,7 @@ You should already have completed:
This guide is running using:
<pre data-command-src="Z28gdmVyc2lvbgo="><code class="language-.term1">$ go version
-go version go1.15.8 linux/amd64
+go version go1.15.15 linux/amd64
</code></pre>
### Create a module that others can use
@@ -760,7 +760,7 @@ $ go test -v
=== RUN TestHelloEmpty
--- PASS: TestHelloEmpty (0.00s)
PASS
-ok {{{.GREETINGS}}} 0.002s
+ok {{{.GREETINGS}}} 0.001s
</code></pre>
You will now break the `greetings.Hello` function to view a failing test. The `TestHelloName` test function
@@ -845,7 +845,7 @@ a lot of tests. The `TestHelloName` test should fail -- `TestHelloEmpty` still p
greetings_test.go:15: Hello("Gladys") = "Hail, %v! Well met!", <nil>, want match for `\bGladys\b`, <nil>
FAIL
exit status 1
-FAIL {{{.GREETINGS}}} 0.002s
+FAIL {{{.GREETINGS}}} 0.001s
</code></pre>
Let's restore `greetings.Hello` to a working state
Other than these timing differences, there is no difference between these two outputs.
One way to achieve this "semantic" diff would be to search for instances of timings like 0.002s
at the end of a line, and replace them unique strings. If we did this in both the reference and the actual output, we would then have a semantic diff modulo timings.
However, there might be false positives in some test output.
So whilst it would be a sensible default behaviour, we would need to allow overriding in specific steps.
One way of achieving this would be to have a preguide
defined template for commands that start with go test
default to a single element list of patterns, that specify line-oriented patterns that should be replaced with unique strings prior to comparison. This would then allow guide authors to override the default pattern list.
One other thing we need to consider here is how to minimise diffs after re-running guide generation. Indeed, in the case of there being zero semantic difference, there should be no diff for a given guide. Given the way things are currently implemented, this might be hard: we have log files and out guides. It might be that, in the case of zero semantic differences, we leave the log in place and the existing guide, modulo updating the checksum if necessary.
One other point to consider is that for some commands we will want to specify custom regex-based replaced directives to sanitise output. e.g.
--- a/_posts/2020-08-13-installing-go_go115_en.markdown
+++ b/_posts/2020-08-13-installing-go_go115_en.markdown
@@ -36,12 +36,12 @@ Start in your home directory:
Download the latest version of Go:
-<pre data-command-src="d2dldCAtcSBodHRwczovL2dvbGFuZy5vcmcvZGwvZ28xLjE1LjgubGludXgtYW1kNjQudGFyLmd6Cg=="><code class="language-.term1">$ wget -q https://golang.org/dl/go1.15.8.linux-amd64.tar.gz
+<pre data-command-src="d2dldCAtcSBodHRwczovL2dvbGFuZy5vcmcvZGwvZ28xLjE1LjE1LmxpbnV4LWFybTY0LnRhci5nego="><code class="language-.term1">$ wget -q https://golang.org/dl/go1.15.15.linux-arm64.tar.gz
</code></pre>
Extract and install:
-<pre data-command-src="c3VkbyB0YXIgLUMgL3Vzci9sb2NhbCAteHpmIGdvMS4xNS44LmxpbnV4LWFtZDY0LnRhci5nego="><code class="language-.term1">$ sudo tar -C /usr/local -xzf go1.15.8.linux-amd64.tar.gz
+<pre data-command-src="c3VkbyB0YXIgLUMgL3Vzci9sb2NhbCAteHpmIGdvMS4xNS4xNS5saW51eC1hcm02NC50YXIuZ3oK"><code class="language-.term1">$ sudo tar -C /usr/local -xzf go1.15.15.linux-arm64.tar.gz
</code></pre>
Add the install target to your profile `PATH`:
We currently have a handful of command-oriented sanitisers hard-coded into the
preguide
project. For example, one sanitises the output fromgo test
in order to normalise timings from:to:
However:
Writing sanitisers in CUE feels wrong. Writing them in Go feels better. The question then is how to extend
preguide
in this "pluggable" way.Ideally we would want to be able to specify, ultimately at the step level, what sanitisers to apply. Much like the existing approach, we would have a derive step that, for a given statement, determines which sanitisers to apply, from the superset specified for that step. If we assume for one second we will go the Go route, then that specification can take the form of a list of references to a type via its import path:
Note that the list of sanitisers to apply will always be applied after the sanitising of variables. For example, if we had the following actual output:
where
mod1abcde12345
is a prestep-generated repository name addressable via the variable `REPO1, then the per-step sanitisers would be applied to:In the example step,
example.com/blah.GoGetFixer
is a reference to a function with the following signature that is used to derive a possibly-nil
sanitiser.envVars
is the list of variable name templates that resulted from the sanitisation of variables. In the case of the example above that list would be["{{.REPO1}}"]
, and we would want a function, say,GoGetFixer
to return a sanitiser that normalises the versions to something like:(the date and time here being the
time
package format reference, the commit sha being a well-defined constant for commits).At this point we ask ourselves the question: do we just want a
func(envVars []string, output string) string
sanitiser, or do we optimise for the fact that many sanitisers will be line-based and also supportfunc(envVars []string, output []string) []string
?A couple of options implementation-wise:
preguide
recompile andexec
"itself" as it discovers it is missing sanitiser implementationsmain
shim around apreguide
implementation package, which includes a map from string sanitiser function references to the actual functionThe second option is considerably simpler in implementation terms but slightly harder from a usage perspective (as the guide author writing a custom sanitiser), the first is simpler from a user perspective but considerably more complex from an implementation perspective.
For now we will continue to develop sanitisers in this repository, until such time as a pattern/winning solution clearly presents itself.