quasilyte / go-ruleguard

Define and run pattern-based custom linting rules.
https://go-ruleguard.github.io/
BSD 3-Clause "New" or "Revised" License
795 stars 42 forks source link

The need to have the DSL at runtime breaks multi-module repos #253

Open thockin opened 3 years ago

thockin commented 3 years ago

I admit that Kubernetes is not exactly "normal" in its repo setup, but it is VALID.

We have one repo with multiple go modules in various directories, but we treat it as a single codebase for the purpose of things like lint. It seems that github.com/quasilyte/go-ruleguard/dsl has to be a dependency of every module. It would be VASTLY simpler to just have it linked into the binary.

Is there a better answer?

quasilyte commented 3 years ago

Are you using ruleguard binary directly or via the golangci-lint? There is an issue that golagnci-lint makes ruleguard load rule file again and again for every package being checked. This will be fixed in future from the either side, but I'm not 100% sure which side it will be right now. For now, it creates both performance and potential dsl package dependency issues (as we have to look it up for every package).

It's hard to have it "linked" into the binary as we use normal go/types and go/ast, etc for this. They load (import) the packages in the usual way. Go rule files are treated as normal Go code, part of your module.

There is a WIP to make embedding easier, so we can generate the code from the rule file so we don't have to parse/typecheck it ever again apart from the moment when we generate the code. But that would make it impossible to modify the rules without the binary recompilation. Depending on what you want to accomplish, it may or may not solve your issue.

I said "embedding" because that generated data can be copy/paster into the program and used as a ruleguard input, bypassing the parsing+typechecking (well, most of it).

I don't know how to provide these precompiled binaries for the users though. I can ship a separate binary that can build these analyzers into a binary if it sounds alright.

Note that for projects as big as k8s I wouldn't expect a tool to work perfectly well without some amount of hacking. It's interesting to hear about the issues though (like memory consumption, etc).

As a last resort, there is always a way to write your own main file that runs ruleguard in a way you would like it to behave. Strictly speaking, it doesn't need the dsl package to be a part of your module, it should be just available at the moment we load the rules. Depending on the execution environment (go/analysis framework, go-critic or golangci-lint) it may load it once or multiple times. It's possible to write a main that only loads rules once and then walks through the packages.

quasilyte commented 3 years ago

tl;dr: there is an option of making a binary that embeds all rules. It will introduce an extra step in the deployment scheme (one needs to build a binary themselves) plus it makes the rules less dynamic (you can't edit them and see the changes right away, but, at the same time, you don't need rules file artifact during the deployment anymore).

quasilyte commented 3 years ago

Another way would be a conservative usage where you only run a linter on the parts of the k8s project that are well-suited for it, skipping the harder bits.

thockin commented 3 years ago

We are using it through gocritic, through golangci-lint (though IMO it should be a first-order linter in golangci-lint) .

I was thinking about using embedding for the dsl library, too. What I don't want to do is build custom tools and binaries. We just get stuck maintaining these forever. Maybe I should up-level and describe the user-story I think we're missing:

Story 1: I create a rules.go file. I edit my golangci-lint config to enable ruleguard. I run golangci-lint and ruleguard runs. I did not have to depend on or even be aware of the DSL library - that's not my job. Alternately, I run a ruleguard binary and bypass gocritic and golangci-lint.

Story 2: I write a rules.go file. I run a docker image (either golangci-lint or a ruleguard image) and mount my rules.go at a known path, and ruleguard runs. That's all.

It's my intuition that most users of ruleguard don't need to change the DSL

On Sun, Aug 22, 2021 at 3:54 AM Iskander (Alex) Sharipov < @.***> wrote:

Another way would be a conservative usage where you only run a linter on the parts of the k8s project that are well-suited for it, skipping the harder bits.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/quasilyte/go-ruleguard/issues/253#issuecomment-903249874, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABKWAVFPBHPD5NKGQ3JEXRLT6DJPPANCNFSM5CSDHVPQ . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&utm_campaign=notification-email .

thockin commented 3 years ago

Or if I am going to use ruleguard directly, maybe a flag that points to the DSL lib (ignoring modules)?

thockin commented 3 years ago

Coming back to this, I have set up a trivial demonstration of the problem(s): https://github.com/thockin/ruleguard-multi-modules

There are errors in both subdir/file.go and submod/file.go.

$ golangci-lint run
subdir/file.go:6:2: ruleguard: prefer mynet.ParseIPSloppy() (gocritic)
    net.ParseIP("foobar")
    ^

but

$ golangci-lint run ./submod/
ERRO [linters context] typechecking error: main module (example.com/root) does not contain package example.com/root/submod

I know this is a problem with Go's own handling of multi-repo. Try the usual workaround:

$ cd submod/; golangci-lint run; cd - >/dev/null

No output.

$ cd submod/; golangci-lint -v  run; cd - >/dev/null
INFO [config_reader] Config search paths: [./ /home/thockin/src/go/src/github.com/thockin/ruleguard-multi-modules/submod /home/thockin/src/go/src/github.com/thockin/ruleguard-multi-modules /home/thockin/src/go/src/github.com/thockin /home/thockin/src/go/src/github.com /home/thockin/src/go/src /home/thockin/src/go /home/thockin/src /home/thockin /home /] 
INFO [config_reader] Used config file ../.golangci.yaml 
INFO [lintersdb] Active 1 linters: [gocritic]     
INFO [loader] Go packages loading at mode 575 (compiled_files|files|name|deps|exports_file|imports|types_sizes) took 70.889764ms 
INFO [runner/filename_unadjuster] Pre-built 0 adjustments in 94.267µs 
INFO [linters context/goanalysis] analyzers took 0s with no stages 
INFO [runner] processing took 3.34µs with stages: max_same_issues: 551ns, nolint: 523ns, skip_dirs: 257ns, cgo: 241ns, skip_files: 217ns, max_from_linter: 217ns, filename_unadjuster: 212ns, diff: 143ns, path_prettifier: 135ns, identifier_marker: 131ns, uniq_by_line: 130ns, autogenerated_exclude: 129ns, exclude: 124ns, max_per_file_from_linter: 53ns, path_shortener: 51ns, source_code: 50ns, sort_results: 49ns, severity-rules: 44ns, exclude-rules: 42ns, path_prefixer: 41ns 
INFO [runner] linters took 16.305039ms with stages: gocritic: 16.238387ms 
INFO File cache stats: 0 entries of total size 0B 
INFO Memory: 2 samples, avg is 71.7MB, max is 71.8MB 
INFO Execution took 92.837517ms                   

It found the config, but not the rules file. But it didn't complain in any way, making me thing it actually passed, when it didn't. If I then hack go-critic to find the file, it STILL fails the same way - iternally go-critic gets (and golangci-lint swallows the logs) could not import github.com/quasilyte/go-ruleguard/dsl (can't find import: "github.com/quasilyte/go-ruleguard/dsl")

quasilyte commented 3 years ago

tl;dr: I don't have a concrete solution for that yet. I see the problem, but it's a combination of both design choices and go/types + go/importer clunkiness that is hard to overcome. There was embedded-dsl mechanism before Go modules, but it broke when Go modules became popular: it affected how GOPATH is interpreted, it made it unclear how and from where sources are loaded (and it also may depend on Go version, GO111MODULES value and so on).

A side note: golangci-lint upgraded go-critic to v0.6.1 which includes updated ruleguard.


DSL package is needed to:

  1. typecheck and resolve the rules.go file. This step is unnecessary if you're never going to change the rules.go file, so it's possible to create a custom binary that includes all necessary info.
  2. To provide the editing experience. It allows go-to-definition, autocomplete and generally you can edit it as an ordinary Go file. Since it tries to look like an ordinary Go file, it requires DSL as a dependency.

I tried to embed DSL sources in binary but then Go typechecking would fail due to the ways it works. It takes more research to know how we can circumvent that. I tried for a long time, with different approaches. I don't have any more spare mental powers to come back to this in the near future. It's one of the most frustrating things I worked on, this import/dependency thing. For the purposes of go-critic we solved it with precompilation, so rules.go file can be transformed into something that doesn't require dsl package. But dsl package is still needed on a machine that will perform that transformation.

I thought that if it's an explicit Go-like artifact, it's OK to have it as a direct Go dependency. So unlike some other tools that are not deeply integrated into Go, we have something that works like a first-class citizen. Rules are (almost) Go code after all. You're using dsl Go package to express them. When we use some Go library, it's required as a dependency.

Note that it's not necessary to copy/paste all rules from other sources. It's possible to use rules importing: https://github.com/quasilyte/go-ruleguard/blob/028d6511ab714492b6db752158dc7b2ccca5bbbd/rules.go#L1-L10

thockin commented 3 years ago

I sounds like the piece I was missing was that this uses Go's libs internally and the shortcomings in those libs become shortcomings of this tool. Is that right?

quasilyte commented 3 years ago

I believe that's the case at least partially, yes.

thockin commented 2 years ago

I got a new starting point on this, but you might not like it.

It seems that ruleguard uses go/build (the "old way"). Go team recommends everyone switch to golang.org/x/tools/go/packages since it is more powerful and flexible. That offers an "overlay" concept which would allow you to "fake" the DSL being present.

E.g. if I said ruleguard -dsl=/path/to/dsl/files you could set up an overlay entry that faked those files being in a proper place to be found.

But first it would have to switch to the new lib. I found it easy to do in one of my own tools, but ruleguard is more complex than mine :)

quasilyte commented 2 years ago

I was looking towards using packages, but, if I remember correctly, I had an issue that it doesn't provide the types.Importer object. I need to "import()" a few things dynamically, for the rules that do m.Import() explicitely or if they're referencing some stdlib package. The typechecking and that late importing should be performed by the same "importer object", otherwise we'll have incompatible packages like foo is incompatible with foo (because they have different addresses as a consequence).

This "foo is not foo" issue forced me to forget about using embedded dsl package sources. (There should be a commit somewhere which removes embedded dsl, with some explanations hopefully.)

Maybe there is a way to circumwent that. Or maybe it's also possible to add some hacks kludges to make the current code work as expected (even if it requires copying some code from go/packages).

I'm afraid that what ruleguard does is too unusual and this is why we're facing some unpleasant limitations and issues.

That being said, I'm interested in solving the issue.

thockin commented 2 years ago

OK. How about a MUCH dirtier trick?

I just PoC'ed this with a forced failure, and no dep on the dsl in my codebase.

diff --git a/ruleguard/engine.go b/ruleguard/engine.go
index e00706c..8335452 100644
--- a/ruleguard/engine.go
+++ b/ruleguard/engine.go
@@ -10,6 +10,7 @@ import (
        "io"
        "io/ioutil"
        "os"
+       "os/exec"
        "sort"
        "strings"
        "sync"
@@ -49,6 +50,16 @@ func (e *engine) Load(ctx *LoadContext, buildContext *build.Context, filename st
        if err != nil {
                return err
        }
+       // BEGIN HACK
+       cwd, _ := os.Getwd()
+       tmp, _ := os.MkdirTemp("", "ruleguard.*")
+       os.Chdir(tmp)
+       defer os.Chdir(cwd)
+       cmd := exec.Command("go", "mod", "init", "ruleguard-tmp")
+       cmd.Run()
+       cmd = exec.Command("go", "get", "-d", "github.com/quasilyte/go-ruleguard/dsl")
+       cmd.Run()
+       // END HACK
        imp := newGoImporter(e.state, goImporterConfig{
                fset:         ctx.Fset,
                debugImports: ctx.DebugImports,

This makes a temp dir, initializes a dummy module, and deps on the DSL code. It parses my rules file from that dir, which lets go tooling resolve the DSL code. Then it changes back to the original dir to use the now-loaded DSL code on my to-be-checked code.

Obviously I did no error checking, but it works - it finds the forced error in my code.

I don't know what the best way to inject the DSL files would be. This works but depends on the latest tagged version. Ideally you would do something like:

Maybe move the DSL to a different repo so I could say something like ruleguard -rules rules.go -dsl=latest ./... and that would do this hack, but it would make sense? Better if it had no need for network (e.g. I could run it on an airplane), IMO.

What do you think overall? Should I have my Go license revoked?

quasilyte commented 2 years ago

Should I have my Go license revoked?

Ahahah. Definitely not.

What do you think overall?

Ideally, some dsl package would be embedded, as you mentioned, so there is a default version (latest release) available even without go get. But as I mentioned above, this scheme was not working for some issues that are related to how go/types handle types equality during the typecheking.

I could try creating this embedded dsl experiment branch so it's easier to show and explain.

thockin commented 2 years ago

You would have to "emit" the embedded files into a form that Go thinks they are real - either as a vendor'ed module or as "local" code (and replace the imports on the fly?)

This PoC is dirty but it works because it just tries to meet Go's expectations. We don't need the rules.go and the code-to-be-checked in the same parse universe.

quasilyte commented 2 years ago

Ooof. I really hoped that it would be possible to avoid doing some write-like fs operations. Modifying the go.mod file in place looks unfortunate too. I wonder if we can get a piece of advice from some other experienced Go tool developers (someone who is proficient in go/analysis + other go/* packages?). Another view on this problem could be useful.

quasilyte commented 2 years ago

Some wild ideas inspired by @storozhukBM.

narqo commented 1 year ago

This exact issue blocks us from introdusing custom ruleguard rules (through golangci-lint + goritic). We maintain a centralized Docker image with golangci-lint and its configuration, that the org sees as "company's golden standard". All Go projects (lots of independent repositories) are required to have a CI check, that runs lint the codebase via this Docker image, in order to be compliant with the standard. It's fairly hard and confusing for the maintainers of these repositories, to have go-ruleguard/dsl as a direct dependency.

Since this issues is a couple years old now, I wonder if there is already a well-known path overcome the problem?

quasilyte commented 1 year ago

When Go libraries like go/packages or go/analysis "load" the code, they want to find the dependencies inside the context. In modern Go, that context is a current module.

Since DSL is Go and it uses the definitions from some package, that package needs to be available during the analysis. I thought that it was a neat thing: if you use a package to describe your rules, you might as well have it versioned (because DSL evolves over time, etc). But it means that in order to use it, a dependency must be installed. There is just no API in Go tool packages (like go/analysis) to do anything too fancy. I can't even embed the entire DSL package and then treat it as a real package. It will cause the typechecker to complain.

It is possible to avoid typechecking the DSL code at the expense of parsing gibberish and then have no any real error message that would tell you what went wrong. I wanted to make these files a proper Go code where you can't pass string as int and so on. It's not a practical thing for me to re-invent a typechecker just for that in my opinion.

This issue is way more complicated than I can swallow with ~zero rewards for this project. I moved forward to some other things that at least give me something back (and I'm not talking about the money).

The Go team doesn't care about projects like ruleguard and its nature is quite alien to the way Go packages like go/analysis were designed. Heck, they don't even have a safe way to tell types.Sizes.SizeOf() without panicking because of random generic-types. It was much-much easier for the tool makers before the Go code resolving became this hard. I remember the times when we used go/types + go/ast + go/parser and everything worked just fine (and 5-10 times faster). We just needed to traverse the GOROOT correctly. Now it's a complete disaster. go/loader came and became deprecated. Then it was go/packages, which was OK, but now we have go/analysis. After generics, everything broke again. Meh.

I could solve this issue if I has like 1-2 months dedicated to make it work. Unless someone can help me out, I'm afraid this issue can stay there forever.

quasilyte commented 1 year ago

In other words, if this issue is just about the users of the tool being confused due to an extra dependency in their go.mod, it's not engaging enough for me. Unfortunately, these users exist out there and enjoy the things that just work. When things are not working as they would expect, then they come and ask for the changes (you get some rewards in return: nothing).

So, unless the issue is a blocker for the intended usage of this tool (the things I can actually test for and handle with my own hands), it's a no.

quasilyte commented 1 year ago

Maybe https://go.googlesource.com/proposal/+/refs/changes/55/495555/5/design/48429-go-tool-modules.md could make some things easier if we can make dsl to be a part of "tool" dependencies.