realm / SwiftLint

A tool to enforce Swift style and conventions.
https://realm.github.io/SwiftLint
MIT License
18.68k stars 2.22k forks source link

Investigate cases in which cache is stale #1184

Closed ghost closed 7 years ago

ghost commented 7 years ago

We have just updated to 0.16 in our codebase and I'm working through the new rules. First up it's sorted_imports. For this, we have a build phase which runs ${PODS_ROOT}/SwiftLint/swiftlint as per the guidelines in the README file. I've worked through part of our codebase (it takes a while as we have well over a hundred thousand lines of Swift code) and sorted the imports. I then re-ran the linter, to clear out the issues I'd already fixed and make it easier to keep track of my progress.

At this point, the linter reported the remaining issues, but also reported several files I'd already fixed as having the issue. Manual inspection confirmed that the imports had already been sorted correctly.

I ignored the issue for the time being and continued through the real issues, eventually reaching the end. Rerunning left only the false positives. No matter how many times I re-ran the linter, it would keep these false positives.

However, changing our script in the build phase to ${PODS_ROOT}/SwiftLint/swiftlint line --no-cache resulted in the linter finding lots of files I hadn't got to, and removed all the false positives.

I have no idea what's going on behind the scenes, but the new cache added in 0.16 seems to be broken.

jpsim commented 7 years ago

It's quite possible that the approach used to determine if a file has changed is insufficient. We're currently hashing the file, so maybe the hash isn't affected by a re-ordering of lines?

marcelofabri commented 7 years ago

It seems that the hash changes when re-ordering lines:

  1> let str1 = "import Foo\nimport Bar" 
str1: String = "import Foo\nimport Bar"
  2> let str2 = "import Bar\nimport Foo" 
str2: String = "import Bar\nimport Foo"
  3> str1.hashValue
$R0: Int = -300990755179970577
  4> str2.hashValue
$R1: Int = 7298368255116804847
ghost commented 7 years ago

Any further progress on this one? We are waiting on 0.17 for a few fixes, and while we could go ahead with it currently, we would need to always disable the cache due to these problems. Obviously that's undesirable for everyone.

marcelofabri commented 7 years ago

Not that I'm aware of, but feel free to investigate it deeper! A repro case would be very welcome šŸ˜‰

jpsim commented 7 years ago

It seems that the hash changes when re-ordering lines

Sorry, it was just a guess. I'll update the issue title to not be so misleading šŸ˜‰

norio-nomura commented 7 years ago

It is a mistake to use hash to detect changes inString, because hash does not use the entire contents ofString.

An example of hash collision:

import Foundation

let first32Char = repeatElement("f", count: 32)
let middle32Char = repeatElement("m", count: 32)
let last32Char = repeatElement("l", count: 32)
let a = (first32Char + ["ab"] + middle32Char + ["ab"] + last32Char).joined()
let b = (first32Char + ["bc"] + middle32Char + ["bc"] + last32Char).joined()
a == b // false
// Foundation
a.hash // -4528083392938427260
b.hash // -4528083392938427260
// Swift.Hashable
a.hashValue // -8957072229412966235
b.hashValue // -8957072229412966235
jpsim commented 7 years ago

Maybe we should consider the last modified time of the file instead

marcelofabri commented 7 years ago

What about using MD5 as hash?

jpsim commented 7 years ago

I think MD5 or SipHash would lead to a more reliable result, but I still think last modified time might be most reliable. It's what tools like rsync use to determine whether or not a file has changed.

phoney commented 7 years ago

I ran into this with the 0.16.0 version on the type_name rule. I had a type configurableObject that was lowerCamelCase. I fixed it in a number of files to be upperCamedlCase but those errors continued to be reported.

I believe that most tools use the modification time for this. Make sure to do a != comparison rather than a > comparison between the saved mod time and the one on the file. There are oddball cases where the clock can be reset or a file can come from another computer and the dates are in different time zones where the != comparison works but the > comparison doesn't.

Is there a way to delete the cache to work around this issue?

marcelofabri commented 7 years ago

@phoney You can run swiftlint lint --no-cache

benasher44 commented 7 years ago

While working on #1191, I found that the current caching behavior might not be reliable across runs of swiftlint when there are multiple paths present. Currently, the path to the cache is determined based on one of the following, if the cache_path config option is not provided:

  1. The full path to the folder swiftlint is operating on (or the current working directory)
  2. The full path to the path passed to --path

I'm not sure what happens (in terms of automatic caching) when swiftlint is used from Xcode (i.e. multiple paths are passed to lint via environment variables). Maybe someone can help me out here?

In any case, the fallback logic used to determine a default cache path (based on the above numbered options I mentioned) falls apart a bit when allowing people to pass multiple paths to swiftlint as arguments. You could hash all of the paths passed to swiftlint combined, but if you omit a single path, then you'll bust the cache using that strategy (assuming you're meaning to lint the same project minus one path). Given this, my current thought is to remove automatic caching, and I have a few thoughts about why:

  1. I expect a CLI command like swiftlint lint to behave the same across multiple invocations. This isn't the case right now, since it has built-in (i.e. enabled by default / opt-out) automatic (determine cache key automatically) caching (after the first invocation, it leans on the cache).
  2. As mentioned earlier, the automatic caching logic seems to fall apart when passing multiple paths. At least right now, I can't think of any great ways to make this better that easily re-uses the same cache for the same project across multiple runs when passing multiple paths (assuming you might not pass the same paths each time but want to re-use the cache for your project).
  3. Caching is easy to enable in the configuration, and this feature is probably most useful for longer term projects where you'll likely be leaning heavily on a config file anyway.
  4. Because of 3, most CLI usages of swiftlint that don't use a config file probably don't benefit from caching, so it seems a bit wasteful to leave on by default.

I'd appreciate any comments filling in knowledge gaps here. Thanks!

benasher44 commented 7 years ago

Made a PR for this (#1267) to show what this entails and to push this discussion forward a bit :)

victorpimentel commented 7 years ago

I have tackled this by trying to fixing the cache, not by disabling it šŸ˜„ The change is a bit more intrusive that I had hoped, but in my tests it works well and the performance is really good.

Can someone test it in their own projects (backuping up first šŸ¤“ )? Thanks!

benasher44 commented 7 years ago

To clarify, I was told I could use this issue as an umbrella for general caching issues (i.e. disabling automatic caching won't solve the original issue in the description/title). I don't mean to suggest that caching should be disabled all together. I just think you should have to opt into it.

marcelofabri commented 7 years ago

I have mixed feelings about turning it opt-in. Maybe we can do that until we're sure that we don't have any more issues with it.

Just as a comparison, rubocop enables caching by default.

Another thing to have in mind (but I don't know if it's a real issue) is that when building from master, you get the same version number as the latest stable version, so cache isn't invalidated.

benasher44 commented 7 years ago

Hm that's a good point. Maybe the automatic caching should only be project-oriented? What I mean is that maybe it only makes sense if you're linting a whole folder or have provided a cache path? This basically goes back to disabling caching when specifying multiple specific files via CLI arguments, but it would be enabled if you pass a single folder (or nothing to lint the current directory).

marcelofabri commented 7 years ago

Yeah, I think that's a good idea.

victorpimentel commented 7 years ago

I'd leave this as a small learning warning for future references. Never use Foundation hashes for equality:

let cats: NSArray = ["Garfield", "Grumpy Cat", "Doraemon"]
let dogs: NSArray = ["Scooby-Doo", "Snoopy", "Pluto"]

cats.hash // 3
dogs.hash // 3

http://swiftlang.ng.bluemix.net/#/repl/58985cbae69a503cdf5221b9

Greensource commented 7 years ago

The cache file seem to be located into ~/Library/Application Support/SwiftLint. Isn't better if it will be located into derived data for example? Because today, when you clean or remove derivedData you think you start from scratch but it's not true for Swiftlint.

jpsim commented 7 years ago

In retrospect, Application Support is definitely the wrong place to be storing the cache.

In my work-in-progress branch to fix fundamental issues with our caching strategy (jp-modified-time), I've moved the cache location to the caches directory (ee9ce4a). In whatever shape the caching fixes end up in, I hope that's a change we can sneak in.

victorpimentel commented 7 years ago

Ok, I did another pull request with the proposed fix, I hope we can merge that back soon šŸ˜„

Mercieral commented 7 years ago

On a somewhat related note while we wait for the PR, how do I specify the linter to ignore cache with "--no-cache" in the xCode build phase? It gives me a Shell Script Invovation Error line 2: /Users/Aaron/project/Pods/SwiftLint/swiftlint --no-cache: No such file or directory when trying to use a build phase script of "${SRCROOT}/Pods/SwiftLint/swiftlint --no-cache"

marcelofabri commented 7 years ago

@Mercieral You need to use the lint command explicitly to use any options: swiftlint lint --no-cache

Mercieral commented 7 years ago

@marcelofabri Sorry I should have mentioned that I tried both. So using "${SRCROOT}/Pods/SwiftLint/swiftlint lint --no-cache" gives me /Users/Aaron/project/Pods/SwiftLint/swiftlint lint --no-cache: No such file or directory

victorpimentel commented 7 years ago

@Mercieral It seems that you don't have the swiftlint pod installed. You need to follow these instructions to integrate it with your projects:

https://github.com/realm/SwiftLint#using-cocoapods

Maybe you just need to do a pod install...

Mercieral commented 7 years ago

@victorpimentel the swiftlint pod is installed using the newest version of cocoapods (just checked yesterday). The problem is that I can run "${SRCROOT}/Pods/SwiftLint/swiftlint" in the xcode build phase script just fine and it works, and I can run "Pods/SwiftLint/swiftlint lint --no-cache" just fine from the command line, but running "${SRCROOT}/Pods/SwiftLint/swiftlint lint --no-cache" in the build script causes xcode not to be happy. Am I the only one this happens to?

marcelofabri commented 7 years ago

@Mercieral can you post your whole script here?

Mercieral commented 7 years ago

I found the issue, simple fact that I thought that build scripts needed to be in quotes... I can remove my comments in this issue so it doesn't pollute the discussion of the actual issue if you want.

ZevEisenberg commented 7 years ago

Off-topic, but I would still use quotes like this to guard against paths with spaces in them:

"${SRCROOT}/Pods/SwiftLint/swiftlint" lint --no-cache
marcelofabri commented 7 years ago

Closed by #1530

acecilia commented 2 years ago

Hi šŸ‘‹ I have some questions about cache that cant find answers for. Found this discussion, maybe you guys can help me out šŸ™ :

jpsim commented 2 years ago
  • Is there any way to remove the swiftlint cache that does not involve manually deleting the corresponding folder? Something like swiftlint cache delete?

No there's no way to clear the cache via SwiftLint. As far as I know, SwiftLint's caching mechanism has been robust for several years now, so there's no compelling reason to expose a way to clear the cache for end users.

However, disabling the cache is extremely useful when working on SwiftLint itself, since SwiftLint's version number is used to assume that if the version number doesn't change, SwiftLint's behavior hasn't changed, which is often not the case when making changes to SwiftLint itself.