Open sbarzowski opened 3 years ago
I'm trying to lint a large number of files, and it is slow enough that integrating it into my development flow is unpleasant.
I noticed that using xargs with parallelism is much faster than passing all the files to a single jsonnet-lint invocation.
There's definitely low-hanging fruit for optimization here. I noticed that the code loops over every input file, so this could at least be parallelized by the tool itself.
In the meantime, here is my ugly command:
find -name '*.libsonnet' ! -path './vendor/*' -print0 | xargs -0 -t -I% -P$(nproc) jsonnet-lint -J vendor %
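For illustration, the in-tool parallelism suggested above could look roughly like the following worker pool. This is only a sketch: `lintFunc` and `lintParallel` are hypothetical names, and the fake lint function stands in for the real per-file work. It assumes each file can be linted independently (for example with its own VM), which this sketch does not establish for jsonnet-lint.

```go
package main

import (
	"fmt"
	"runtime"
	"sort"
	"sync"
)

// lintFunc is a hypothetical stand-in for linting one file.
// It returns the findings reported for that file.
type lintFunc func(path string) []string

// lintParallel runs lint on every path using at most `workers` goroutines
// and returns the findings keyed by path.
func lintParallel(paths []string, workers int, lint lintFunc) map[string][]string {
	if workers < 1 {
		workers = 1
	}
	jobs := make(chan string)
	results := make(map[string][]string, len(paths))
	var mu sync.Mutex // guards results
	var wg sync.WaitGroup

	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for p := range jobs {
				findings := lint(p)
				mu.Lock()
				results[p] = findings
				mu.Unlock()
			}
		}()
	}

	for _, p := range paths {
		jobs <- p
	}
	close(jobs)
	wg.Wait()
	return results
}

func main() {
	paths := []string{"a.libsonnet", "b.libsonnet", "c.libsonnet"}
	// Fake linter used only for this example.
	fake := func(path string) []string {
		if path == "b.libsonnet" {
			return []string{"unused variable: x"}
		}
		return nil
	}

	results := lintParallel(paths, runtime.NumCPU(), fake)

	// Print in a stable order, since workers finish in arbitrary order.
	sort.Strings(paths)
	for _, p := range paths {
		fmt.Printf("%s: %d finding(s)\n", p, len(results[p]))
	}
}
```

Collecting results in a map and printing them afterwards keeps the output order stable, which the xargs approach does not guarantee.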
When all tests run together, the root nodes for all input nodes are collected up front. Then, when each node is evaluated individually, the type check mechanism takes all of those roots into account, even though most of them are unnecessary for that node.
Moving the collection of root nodes into the main evaluation loop makes the multi-file test run as fast as expected (0.24s in my case):
diff --git a/linter/linter.go b/linter/linter.go
index 1437ba1..f99768f 100644
--- a/linter/linter.go
+++ b/linter/linter.go
@@ -45,14 +45,6 @@ type nodeWithLocation struct {
// Lint analyses a node and reports any issues it encounters to an error writer.
func lint(vm *jsonnet.VM, nodes []nodeWithLocation, errWriter *ErrorWriter) {
- roots := make(map[string]ast.Node)
- for _, node := range nodes {
- roots[node.path] = node.node
- }
- for _, node := range nodes {
- getImports(vm, node, roots, errWriter)
- }
-
variablesInFile := make(map[string]common.VariableInfo)
std := common.Variable{
@@ -65,16 +57,20 @@ func lint(vm *jsonnet.VM, nodes []nodeWithLocation, errWriter *ErrorWriter) {
return variables.FindVariables(node.node, variables.Environment{"std": &std, "$std": &std})
}
- for importedPath, rootNode := range roots {
- variablesInFile[importedPath] = *findVariables(nodeWithLocation{rootNode, importedPath})
- }
+ for _, node := range nodes {
+ roots := make(map[string]ast.Node)
+ roots[node.path] = node.node
+ getImports(vm, node, roots, errWriter)
- vars := make(map[string]map[ast.Node]*common.Variable)
- for importedPath, info := range variablesInFile {
- vars[importedPath] = info.VarAt
- }
+ for importedPath, rootNode := range roots {
+ variablesInFile[importedPath] = *findVariables(nodeWithLocation{rootNode, importedPath})
+ }
+
+ vars := make(map[string]map[ast.Node]*common.Variable)
+ for importedPath, info := range variablesInFile {
+ vars[importedPath] = info.VarAt
+ }
- for _, node := range nodes {
variableInfo := findVariables(node)
for _, v := range variableInfo.Variables {
=== RUN TestLinter
--- PASS: TestLinter (0.24s)
=== RUN TestLinter/passing_multiple_input_files
--- PASS: TestLinter/passing_multiple_input_files (0.23s)
PASS
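To make the cost difference concrete, here is a toy model of the two strategies. It is not the linter code: `importGraph`, `reachable`, `globalRootsCost`, and `perFileRootsCost` are hypothetical names, and "cost" simply counts how many root nodes are considered per input file, assuming the work scales with that count.

```go
package main

import "fmt"

// importGraph maps a file to the files it imports directly (toy data).
type importGraph map[string][]string

// reachable returns the file itself plus everything it transitively imports.
func reachable(g importGraph, file string) map[string]bool {
	seen := map[string]bool{}
	var visit func(string)
	visit = func(f string) {
		if seen[f] {
			return
		}
		seen[f] = true
		for _, dep := range g[f] {
			visit(dep)
		}
	}
	visit(file)
	return seen
}

// globalRootsCost models collecting the roots of all inputs up front:
// every input is then checked against the union of all roots.
func globalRootsCost(g importGraph, inputs []string) int {
	all := map[string]bool{}
	for _, in := range inputs {
		for f := range reachable(g, in) {
			all[f] = true
		}
	}
	return len(inputs) * len(all)
}

// perFileRootsCost models collecting roots inside the loop:
// every input is checked only against its own roots.
func perFileRootsCost(g importGraph, inputs []string) int {
	total := 0
	for _, in := range inputs {
		total += len(reachable(g, in))
	}
	return total
}

func main() {
	// Four inputs, each importing one private library.
	g := importGraph{
		"a": {"liba"},
		"b": {"libb"},
		"c": {"libc"},
		"d": {"libd"},
	}
	inputs := []string{"a", "b", "c", "d"}

	fmt.Println(globalRootsCost(g, inputs))  // prints 32
	fmt.Println(perFileRootsCost(g, inputs)) // prints 8
}
```

In this model the up-front strategy grows roughly with the square of the number of inputs, while the per-file strategy grows linearly, which is consistent with the slowdown described above.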
There is something weird going on:
I would expect it to run about as fast as running each file individually (in fact, there is room to go further by avoiding repeated common work).