Open srawlins opened 9 years ago
I've thought about this as well, and would be open to accepting a pull request. scss-lint
should be relatively straightforward to parallelize.
I would like the flag name to be called --concurrency/-C
and accept a numeric value or auto
to indicate scss-lint
should just pick a number that "makes sense" based on the number of cores available. I haven't done any research on this, but if you find that on average half the number of cores is optimal, let's go with that.
My other comment is that you can go further than splitting the linters into groups. If you can create a queue of all linter/file pairs that should be run, you can spawn concurreny
number of workers and have them all pull from the same queue, so each "job" is passed an instance of a linter and a file to lint. I'm willing to bet the cost of instantiating a linter per job would still be faster than trying to reuse the same linter instance across multiple jobs. In theory this should allow you to maximize parallelism.
Thanks for your thoughts. Look forward to seeing what you come up with!
I had a quick poke at trying to parallelize using the parallel
gem, which forks into separate processes under the hood, getting past MRI's GIL which limits parallelism via threads (it also saves us from having to make any of the code thread safe since workers are isolated in separate processes).
The patch is pretty straightforward:
diff --git a/lib/scss_lint/runner.rb b/lib/scss_lint/runner.rb
index 16f4b5a..5cde35f 100644
--- a/lib/scss_lint/runner.rb
+++ b/lib/scss_lint/runner.rb
@@ -1,3 +1,5 @@
+require 'parallel'
+
module SCSSLint
# Finds and aggregates all lints found by running the registered linters
# against a set of SCSS files.
@@ -15,9 +17,9 @@ module SCSSLint
# @param files [Array<Hash>] list of file object/path hashes
def run(files)
@files = files
- @files.each do |file|
+ @lints = ::Parallel.map(@files) do |file|
find_lints(file)
- end
+ end.flatten
end
private
@@ -26,29 +28,35 @@ module SCSSLint
# @option file [String] File object
# @option path [String] path to File (determines which Linter config to apply)
def find_lints(file)
- engine = Engine.new(file)
-
- @linters.each do |linter|
- begin
- run_linter(linter, engine, file[:path])
- rescue => error
- raise SCSSLint::Exceptions::LinterError,
- "#{linter.class} raised unexpected error linting file #{file[:path]}: " \
- "'#{error.message}'",
- error.backtrace
+ accumulated_lints = []
+
+ begin
+ engine = Engine.new(file)
+
+ @linters.each do |linter|
+ begin
+ run_linter(linter, engine, file[:path], accumulated_lints)
+ rescue => error
+ raise SCSSLint::Exceptions::LinterError,
+ "#{linter.class} raised unexpected error linting file #{file[:path]}: " \
+ "'#{error.message}'",
+ error.backtrace
+ end
end
+ rescue Sass::SyntaxError => ex
+ accumulated_lints << Lint.new(nil, ex.sass_filename, Location.new(ex.sass_line),
+ "Syntax Error: #{ex}", :error)
+ rescue FileEncodingError => ex
+ accumulated_lints << Lint.new(nil, file[:path], Location.new, ex.to_s, :error)
end
- rescue Sass::SyntaxError => ex
- @lints << Lint.new(nil, ex.sass_filename, Location.new(ex.sass_line),
- "Syntax Error: #{ex}", :error)
- rescue FileEncodingError => ex
- @lints << Lint.new(nil, file[:path], Location.new, ex.to_s, :error)
+
+ accumulated_lints
end
# For stubbing in tests.
- def run_linter(linter, engine, file_path)
+ def run_linter(linter, engine, file_path, accumulated_lints)
return if @config.excluded_file_for_linter?(file_path, linter)
- @lints += linter.run(engine, @config.linter_options(linter))
+ accumulated_lints.concat(linter.run(engine, @config.linter_options(linter)))
end
end
end
diff --git a/scss_lint.gemspec b/scss_lint.gemspec
index aa83e49..634136a 100644
--- a/scss_lint.gemspec
+++ b/scss_lint.gemspec
@@ -25,6 +25,7 @@ Gem::Specification.new do |s|
s.required_ruby_version = '>= 1.9.3'
+ s.add_dependency 'parallel', '~> 1.6'
s.add_dependency 'rainbow', '~> 2.0'
s.add_dependency 'sass', '~> 3.4.15'
This worked surprisingly well for one of our large repositories: execution time dropped from 5 seconds to 2 seconds.
However, running this against another one of our large legacy repositories results in a strange marshal data too short
error. I think it has something to do with the number of lints returned by the find_lints
method, since if I change that to return an empty array no error occurs. This means we might need to serialize the lints to a file and then deserialize them on the master process in order to work around the issue.
I'm looking into ways to make scss-lint faster. If I were being rigorous this would include some more data [1].
One idea I have that is playing well with one sample file is to multithread
SCSSLint::Runner#find_lints
. I've measured a speedup when splitting the list of@linters
into 2, 3, or 4 lists, and giving each section to a different thread. 4 was the best speedup, cutting time by about 25%. 6 and 8 threads were no better (probably because, with ~50 lints, sublists of 12 linters and 6 linters are not too different).Would the project be open to a PR with a
--workers
option (I'm open to other names)? A second idea is to partition the SCSS files into smaller groups, and multithread on those, since even MRI can profit from each Thread waiting on the file reads. I can open up a second issue for that idea.[1] Properly, I'd like to look at some large, real-world SCSS files (maybe Bootstrap 4 or some big public Rails apps...) and:
But I don't have plans to do this yet.