Open thockin opened 3 years ago
Are you using ruleguard binary directly or via the golangci-lint? There is an issue that golagnci-lint makes ruleguard load rule file again and again for every package being checked. This will be fixed in future from the either side, but I'm not 100% sure which side it will be right now. For now, it creates both performance and potential dsl
package dependency issues (as we have to look it up for every package).
It's hard to have it "linked" into the binary as we use normal go/types
and go/ast
, etc for this. They load (import) the packages in the usual way. Go rule files are treated as normal Go code, part of your module.
There is a WIP to make embedding easier, so we can generate the code from the rule file so we don't have to parse/typecheck it ever again apart from the moment when we generate the code. But that would make it impossible to modify the rules without the binary recompilation. Depending on what you want to accomplish, it may or may not solve your issue.
I said "embedding" because that generated data can be copy/paster into the program and used as a ruleguard input, bypassing the parsing+typechecking (well, most of it).
I don't know how to provide these precompiled binaries for the users though. I can ship a separate binary that can build these analyzers into a binary if it sounds alright.
Note that for projects as big as k8s I wouldn't expect a tool to work perfectly well without some amount of hacking. It's interesting to hear about the issues though (like memory consumption, etc).
As a last resort, there is always a way to write your own main file that runs ruleguard in a way you would like it to behave. Strictly speaking, it doesn't need the dsl
package to be a part of your module, it should be just available at the moment we load the rules. Depending on the execution environment (go/analysis
framework, go-critic
or golangci-lint
) it may load it once or multiple times. It's possible to write a main that only loads rules once and then walks through the packages.
tl;dr: there is an option of making a binary that embeds all rules. It will introduce an extra step in the deployment scheme (one needs to build a binary themselves) plus it makes the rules less dynamic (you can't edit them and see the changes right away, but, at the same time, you don't need rules file artifact during the deployment anymore).
Another way would be a conservative usage where you only run a linter on the parts of the k8s project that are well-suited for it, skipping the harder bits.
We are using it through gocritic, through golangci-lint (though IMO it should be a first-order linter in golangci-lint) .
I was thinking about using embedding for the dsl library, too. What I don't want to do is build custom tools and binaries. We just get stuck maintaining these forever. Maybe I should up-level and describe the user-story I think we're missing:
Story 1: I create a rules.go file. I edit my golangci-lint config to enable ruleguard. I run golangci-lint and ruleguard runs. I did not have to depend on or even be aware of the DSL library - that's not my job. Alternately, I run a ruleguard binary and bypass gocritic and golangci-lint.
Story 2: I write a rules.go file. I run a docker image (either golangci-lint or a ruleguard image) and mount my rules.go at a known path, and ruleguard runs. That's all.
It's my intuition that most users of ruleguard don't need to change the DSL
On Sun, Aug 22, 2021 at 3:54 AM Iskander (Alex) Sharipov < @.***> wrote:
Another way would be a conservative usage where you only run a linter on the parts of the k8s project that are well-suited for it, skipping the harder bits.
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/quasilyte/go-ruleguard/issues/253#issuecomment-903249874, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABKWAVFPBHPD5NKGQ3JEXRLT6DJPPANCNFSM5CSDHVPQ . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&utm_campaign=notification-email .
Or if I am going to use ruleguard directly, maybe a flag that points to the DSL lib (ignoring modules)?
Coming back to this, I have set up a trivial demonstration of the problem(s): https://github.com/thockin/ruleguard-multi-modules
There are errors in both subdir/file.go
and submod/file.go
.
$ golangci-lint run
subdir/file.go:6:2: ruleguard: prefer mynet.ParseIPSloppy() (gocritic)
net.ParseIP("foobar")
^
but
$ golangci-lint run ./submod/
ERRO [linters context] typechecking error: main module (example.com/root) does not contain package example.com/root/submod
I know this is a problem with Go's own handling of multi-repo. Try the usual workaround:
$ cd submod/; golangci-lint run; cd - >/dev/null
No output.
$ cd submod/; golangci-lint -v run; cd - >/dev/null
INFO [config_reader] Config search paths: [./ /home/thockin/src/go/src/github.com/thockin/ruleguard-multi-modules/submod /home/thockin/src/go/src/github.com/thockin/ruleguard-multi-modules /home/thockin/src/go/src/github.com/thockin /home/thockin/src/go/src/github.com /home/thockin/src/go/src /home/thockin/src/go /home/thockin/src /home/thockin /home /]
INFO [config_reader] Used config file ../.golangci.yaml
INFO [lintersdb] Active 1 linters: [gocritic]
INFO [loader] Go packages loading at mode 575 (compiled_files|files|name|deps|exports_file|imports|types_sizes) took 70.889764ms
INFO [runner/filename_unadjuster] Pre-built 0 adjustments in 94.267µs
INFO [linters context/goanalysis] analyzers took 0s with no stages
INFO [runner] processing took 3.34µs with stages: max_same_issues: 551ns, nolint: 523ns, skip_dirs: 257ns, cgo: 241ns, skip_files: 217ns, max_from_linter: 217ns, filename_unadjuster: 212ns, diff: 143ns, path_prettifier: 135ns, identifier_marker: 131ns, uniq_by_line: 130ns, autogenerated_exclude: 129ns, exclude: 124ns, max_per_file_from_linter: 53ns, path_shortener: 51ns, source_code: 50ns, sort_results: 49ns, severity-rules: 44ns, exclude-rules: 42ns, path_prefixer: 41ns
INFO [runner] linters took 16.305039ms with stages: gocritic: 16.238387ms
INFO File cache stats: 0 entries of total size 0B
INFO Memory: 2 samples, avg is 71.7MB, max is 71.8MB
INFO Execution took 92.837517ms
It found the config, but not the rules file. But it didn't complain in any way, making me thing it actually passed, when it didn't. If I then hack go-critic to find the file, it STILL fails the same way - iternally go-critic gets (and golangci-lint swallows the logs)
could not import github.com/quasilyte/go-ruleguard/dsl (can't find import: "github.com/quasilyte/go-ruleguard/dsl")
tl;dr: I don't have a concrete solution for that yet. I see the problem, but it's a combination of both design choices and go/types
+ go/importer
clunkiness that is hard to overcome. There was embedded-dsl mechanism before Go modules, but it broke when Go modules became popular: it affected how GOPATH is interpreted, it made it unclear how and from where sources are loaded (and it also may depend on Go version, GO111MODULES value and so on).
A side note: golangci-lint upgraded go-critic to v0.6.1
which includes updated ruleguard.
DSL package is needed to:
rules.go
file. This step is unnecessary if you're never going to change the rules.go
file, so it's possible to create a custom binary that includes all necessary info.I tried to embed DSL sources in binary but then Go typechecking would fail due to the ways it works. It takes more research to know how we can circumvent that. I tried for a long time, with different approaches. I don't have any more spare mental powers to come back to this in the near future. It's one of the most frustrating things I worked on, this import/dependency thing. For the purposes of go-critic we solved it with precompilation, so rules.go
file can be transformed into something that doesn't require dsl
package. But dsl
package is still needed on a machine that will perform that transformation.
I thought that if it's an explicit Go-like artifact, it's OK to have it as a direct Go dependency. So unlike some other tools that are not deeply integrated into Go, we have something that works like a first-class citizen. Rules are (almost) Go code after all. You're using dsl
Go package to express them. When we use some Go library, it's required as a dependency.
Note that it's not necessary to copy/paste all rules from other sources. It's possible to use rules importing: https://github.com/quasilyte/go-ruleguard/blob/028d6511ab714492b6db752158dc7b2ccca5bbbd/rules.go#L1-L10
I sounds like the piece I was missing was that this uses Go's libs internally and the shortcomings in those libs become shortcomings of this tool. Is that right?
I believe that's the case at least partially, yes.
I got a new starting point on this, but you might not like it.
It seems that ruleguard uses go/build
(the "old way"). Go team recommends everyone switch to golang.org/x/tools/go/packages
since it is more powerful and flexible. That offers an "overlay" concept which would allow you to "fake" the DSL being present.
E.g. if I said ruleguard -dsl=/path/to/dsl/files
you could set up an overlay entry that faked those files being in a proper place to be found.
But first it would have to switch to the new lib. I found it easy to do in one of my own tools, but ruleguard is more complex than mine :)
I was looking towards using packages
, but, if I remember correctly, I had an issue that it doesn't provide the types.Importer
object. I need to "import()" a few things dynamically, for the rules that do m.Import()
explicitely or if they're referencing some stdlib package. The typechecking and that late importing should be performed by the same "importer object", otherwise we'll have incompatible packages like foo is incompatible with foo
(because they have different addresses as a consequence).
This "foo is not foo" issue forced me to forget about using embedded dsl package sources. (There should be a commit somewhere which removes embedded dsl, with some explanations hopefully.)
Maybe there is a way to circumwent that. Or maybe it's also possible to add some hacks kludges to make the current code work as expected (even if it requires copying some code from go/packages
).
I'm afraid that what ruleguard does is too unusual and this is why we're facing some unpleasant limitations and issues.
That being said, I'm interested in solving the issue.
OK. How about a MUCH dirtier trick?
I just PoC'ed this with a forced failure, and no dep on the dsl in my codebase.
diff --git a/ruleguard/engine.go b/ruleguard/engine.go
index e00706c..8335452 100644
--- a/ruleguard/engine.go
+++ b/ruleguard/engine.go
@@ -10,6 +10,7 @@ import (
"io"
"io/ioutil"
"os"
+ "os/exec"
"sort"
"strings"
"sync"
@@ -49,6 +50,16 @@ func (e *engine) Load(ctx *LoadContext, buildContext *build.Context, filename st
if err != nil {
return err
}
+ // BEGIN HACK
+ cwd, _ := os.Getwd()
+ tmp, _ := os.MkdirTemp("", "ruleguard.*")
+ os.Chdir(tmp)
+ defer os.Chdir(cwd)
+ cmd := exec.Command("go", "mod", "init", "ruleguard-tmp")
+ cmd.Run()
+ cmd = exec.Command("go", "get", "-d", "github.com/quasilyte/go-ruleguard/dsl")
+ cmd.Run()
+ // END HACK
imp := newGoImporter(e.state, goImporterConfig{
fset: ctx.Fset,
debugImports: ctx.DebugImports,
This makes a temp dir, initializes a dummy module, and deps on the DSL code. It parses my rules file from that dir, which lets go tooling resolve the DSL code. Then it changes back to the original dir to use the now-loaded DSL code on my to-be-checked code.
Obviously I did no error checking, but it works - it finds the forced error in my code.
I don't know what the best way to inject the DSL files would be. This works but depends on the latest tagged version. Ideally you would do something like:
Maybe move the DSL to a different repo so I could say something like ruleguard -rules rules.go -dsl=latest ./...
and that would do this hack, but it would make sense? Better if it had no need for network (e.g. I could run it on an airplane), IMO.
What do you think overall? Should I have my Go license revoked?
Should I have my Go license revoked?
Ahahah. Definitely not.
What do you think overall?
Ideally, some dsl package would be embedded, as you mentioned, so there is a default version (latest release) available even without go get
. But as I mentioned above, this scheme was not working for some issues that are related to how go/types
handle types equality during the typecheking.
I could try creating this embedded dsl experiment branch so it's easier to show and explain.
You would have to "emit" the embedded files into a form that Go thinks they are real - either as a vendor'ed module or as "local" code (and replace the imports on the fly?)
This PoC is dirty but it works because it just tries to meet Go's expectations. We don't need the rules.go and the code-to-be-checked in the same parse universe.
Ooof. I really hoped that it would be possible to avoid doing some write-like fs operations.
Modifying the go.mod file in place looks unfortunate too.
I wonder if we can get a piece of advice from some other experienced Go tool developers (someone who is proficient in go/analysis
+ other go/*
packages?). Another view on this problem could be useful.
Some wild ideas inspired by @storozhukBM.
go/types
to typecheck rule files (ones that import dsl). Instead, do checking in ad-hoc way. The API surface is not that complicated, it should be possible. This way, we don't really need to lookup the DSL package sources for the most cases (unless the user wants to use some different DSL version from the one that is bundled with the ruleguard).go/types
, but don't run it on the entire rules file. Do a preprocessing that will rewrite external types (from dsl) to a local types (using the embedded sources information) and then run a typechecking on a file that doesn't have any import statements.go/types
that exports 1-2 extra functions that would make it possible to overcome the limitations. We could also use go:linkname
, but that would add more unsafe package usages to the project, which would be unfortunate.This exact issue blocks us from introdusing custom ruleguard
rules (through golangci-lint + goritic). We maintain a centralized Docker image with golangci-lint and its configuration, that the org sees as "company's golden standard". All Go projects (lots of independent repositories) are required to have a CI check, that runs lint the codebase via this Docker image, in order to be compliant with the standard. It's fairly hard and confusing for the maintainers of these repositories, to have go-ruleguard/dsl
as a direct dependency.
Since this issues is a couple years old now, I wonder if there is already a well-known path overcome the problem?
When Go libraries like go/packages
or go/analysis
"load" the code, they want to find the dependencies inside the context. In modern Go, that context is a current module.
Since DSL is Go and it uses the definitions from some package, that package needs to be available during the analysis.
I thought that it was a neat thing: if you use a package to describe your rules, you might as well have it versioned (because DSL evolves over time, etc). But it means that in order to use it, a dependency must be installed. There is just no API in Go tool packages (like go/analysis
) to do anything too fancy. I can't even embed the entire DSL package and then treat it as a real package. It will cause the typechecker to complain.
It is possible to avoid typechecking the DSL code at the expense of parsing gibberish and then have no any real error message that would tell you what went wrong. I wanted to make these files a proper Go code where you can't pass string as int and so on. It's not a practical thing for me to re-invent a typechecker just for that in my opinion.
This issue is way more complicated than I can swallow with ~zero rewards for this project. I moved forward to some other things that at least give me something back (and I'm not talking about the money).
The Go team doesn't care about projects like ruleguard and its nature is quite alien to the way Go packages like go/analysis
were designed. Heck, they don't even have a safe way to tell types.Sizes.SizeOf()
without panicking because of random generic-types. It was much-much easier for the tool makers before the Go code resolving became this hard. I remember the times when we used go/types
+ go/ast
+ go/parser
and everything worked just fine (and 5-10 times faster). We just needed to traverse the GOROOT
correctly. Now it's a complete disaster. go/loader
came and became deprecated. Then it was go/packages
, which was OK, but now we have go/analysis
. After generics, everything broke again. Meh.
I could solve this issue if I has like 1-2 months dedicated to make it work. Unless someone can help me out, I'm afraid this issue can stay there forever.
In other words, if this issue is just about the users of the tool being confused due to an extra dependency in their go.mod
, it's not engaging enough for me. Unfortunately, these users exist out there and enjoy the things that just work. When things are not working as they would expect, then they come and ask for the changes (you get some rewards in return: nothing).
So, unless the issue is a blocker for the intended usage of this tool (the things I can actually test for and handle with my own hands), it's a no.
Maybe https://go.googlesource.com/proposal/+/refs/changes/55/495555/5/design/48429-go-tool-modules.md could make some things easier if we can make dsl to be a part of "tool" dependencies.
I admit that Kubernetes is not exactly "normal" in its repo setup, but it is VALID.
We have one repo with multiple go modules in various directories, but we treat it as a single codebase for the purpose of things like lint. It seems that github.com/quasilyte/go-ruleguard/dsl has to be a dependency of every module. It would be VASTLY simpler to just have it linked into the binary.
Is there a better answer?