Closed jish closed 7 years ago
@jish I have experimented with the benchmark and no matter of the project size the difference was almost nonexistent
Other problem might be that on my current work project it takes ~31.5s to calculate the list, this is way to long, with:
BINARIES = [".img", ".iso", ".dmg", ".xz", ".tar", ".rar", ".zip", ".gz"].freeze
IMAGES = [".jpg", ".jpeg", ".png", ".gif", ".ico", ".pdf"].freeze
IGNORED_EXTENSIONS = BINARIES + IMAGES
SOURCE_FILES = [".md", ".rb", ".js", ".html", ".css", ".json", ".yml", ".erb", ".coffee", ".slim", ".liquid", ".sass", ".scss", ".feature"].freeze
it dropped to ~6.35s, still to long, but a lot better
I guess we could add a lot more to the ignored list and use plugins as a information which extensions need to be included in source_files (and dropped from ignored list)
"This looks good to me! One simplification I would consider: get rid of all the
source_file?
andbinary?
logic in favor of just having a list of file extensions you don't check."
In concept, I totally agree with you. Ideally, we would never have to run the binary test. Unfortunately, in the wild there are files that simply do not have extensions, and it is a much better user experience to approximate the correct behavior rather than accidentally blocking commits or throwing exceptions.
I am all for extending the list of known file extensions giving more consistent behavior and performance. But I think we should keep the binary test in place as a fallback in case new file extensions emerge that we had not anticipated.
cc: @jfly
We have a mechanism to determine if we should run our list of checks on the files being committed. Generally these checks are pretty simplistic. For example, does the file exist, or is it a directory?
Also, in order to avoid running checks on large binary files we run a heuristic against every file currently being committed. Unfortunately, this binary check is prone to errors and is now causing performance issues.
In order to avoid this binary file test, we do some quick filtering based on file extension. If the file is an image, we definitely do not want to run checks on it. If the file is a Ruby source code file, we definitely do want to run checks on it.
This will allow us to avoid running the binary test in most cases.
Fixes #255