swiftlang / swift

The Swift Programming Language
https://swift.org
Apache License 2.0
67.3k stars 10.34k forks source link

[SR-181] Optimization changes observed behavior of lazy-initialized global variable #42803

Open 05262b81-54a9-4fe1-bf6a-96f8042de10e opened 8 years ago

05262b81-54a9-4fe1-bf6a-96f8042de10e commented 8 years ago
Previous ID SR-181
Radar None
Original Reporter @lilyball
Type Bug

Attachment: Download

Environment Apple Swift version 2.1.1 (swiftlang-700.1.101.15 clang-700.1.81) Swift version 2.2-dev (LLVM 7bae82deaa, Clang 53d04af5ce, Swift 330adc6435)
Additional Detail from JIRA | | | |------------------|-----------------| |Votes | 1 | |Component/s | Compiler | |Labels | Bug, OptimizedOnly | |Assignee | None | |Priority | Medium | md5: 834f85142e1a8e063fe3a532e57ae755

Issue Description:

On the #swift-lang IRC room some code was provided that (accidentally) demonstrated lazy global initialization. But the behavior of the code changes under optimization. The code as written always mutates the currentlySanta global before accessing the currentlyRobot global. Without optimization, currentlyRobot (correctly) is initialized to the value of currentlySanta after mutation. Under optimization, currentlyRobot is initialized prior to the mutation of currentlySanta, even though this changes the behavior.

Note: this code has to be compiled as two separate files or else the globals are evaluated as top-level local properties instead.

// first file
private var grid = [[Int]](count: 4, repeatedValue: [Int](count: 4, repeatedValue: 0))
private var santa = true
private var currentlySanta = (x: 2, y: 2)
private var currentlyRobot = currentlySanta

private enum Direction : String {
    case North = "^"
    case South = "v"
    case East = ">"
    case West = "<"
}

private func givePresent() {
    if santa {
        grid[currentlySanta.x][currentlySanta.y] = 1
    } else {
        grid[currentlyRobot.x][currentlyRobot.y] = 1
    }
}

func solveThirdDay() -> Int {
    givePresent()

    for direction in "^v".characters { //input.characters {
        switch Direction(rawValue: String(direction))! {
        case .North:
            if santa {
                currentlySanta = (currentlySanta.x, currentlySanta.y + 1)
            } else {
                currentlyRobot = (currentlyRobot.x, currentlyRobot.y + 1)
            }
        case .South:
            if santa {
                currentlySanta = (currentlySanta.x, currentlySanta.y - 1)
            } else {
                currentlyRobot = (currentlyRobot.x, currentlyRobot.y - 1)
            }
        case .East:
            if santa {
                currentlySanta = (currentlySanta.x + 1, currentlySanta.y)
            } else {
                currentlyRobot = (currentlyRobot.x + 1, currentlyRobot.y)
            }
        case .West:
            if santa {
                currentlySanta = (currentlySanta.x - 1, currentlySanta.y)
            } else {
                currentlyRobot = (currentlyRobot.x - 1, currentlyRobot.y)
            }
        }

        print(currentlySanta)
        print(currentlyRobot)
        givePresent()
        santa = !santa
    }

    var numPresents = 0

    for x in 0..<grid.count {
        for y in 0..<grid[x].count {
            numPresents += grid[x][y]
        }
    }

    return numPresents
}
// second file
print(solveThirdDay())

Without optimization, this prints

(2, 3)
(2, 3)
(2, 3)
(2, 2)
2

(the first line is the value of currentlySanta after mutation, the second line is the value of currentlyRobot at the same point; lazy initialization means they should be the same)

Under optimization this prints:

(2, 3)
(2, 2)
(2, 3)
(2, 1)
3

Looking at the emitted SIL, without optimization currentlyRobot isn't initialized until it's actually accessed. Under optimization, the function solveThirdDay starts with the following:

define hidden i64 @_TF4main13solveThirdDayFT_Si() #&#8203;0 {
entry:
  %0 = alloca %VVSS17UnicodeScalarView5Index, align 8
  %1 = alloca %VVSS17UnicodeScalarView5Index, align 8
  %2 = load i64* @globalinit_33_006984AFE27EBAC936F04AA4BEC05330_token3, align 8
  %3 = icmp eq i64 %2, -1
  br i1 %3, label %once_done, label %once_not_done

once_not_done:                                    ; preds = %entry
  tail call void @swift_once(i64* @globalinit_33_006984AFE27EBAC936F04AA4BEC05330_token3, i8* bitcast (void ()* @globalinit_33_006984AFE27EBAC936F04AA4BEC05330_func3 to i8*))
  br label %once_done

once_done:                                        ; preds = %entry, %once_not_done
  %4 = load i64* @globalinit_33_006984AFE27EBAC936F04AA4BEC05330_token0, align 8
  %5 = icmp eq i64 %4, -1
  br i1 %5, label %once_done2, label %once_not_done1

(the global being initialized here is currentlyRobot, and it goes on to initialize grid right after but I skipped that part)

Aggressively initializing the variable like this allows it to avoid checking if the value was initialized during the later accesses in the same function, but since it changes observable behavior it doesn't seem like a safe optimization. Granted, I don't think we make any hard guarantees about how lazy global initialization works, except we do explicitly say

Global constants and variables are always computed lazily, in a similar manner to Lazy Stored Properties.

And absent any caveats, this suggests that users can expect it to be predictably lazy.

05262b81-54a9-4fe1-bf6a-96f8042de10e commented 8 years ago

Ironically, the author of the code actually didn't want lazy globals. He thought the unoptimized behavior was a bug (I think he forgot globals were lazy, or maybe never knew). I mention this just because it's amusing, and not because it's an argument in favor of aggressive initialization (our globals really need to have deterministic behavior to be sane, and since we're not going to aggressively initialize them at program startup, we need to preserve the lazy semantics).

jckarter commented 8 years ago

We don't guarantee that initializations happen in any particular order, only that the initialization will happen at some point before you load from the global property. The behavior here could be changed not only by the optimizer, but by any added code that mutates currentlySanta before the initialization of currentlyRobot, so saying "globals should always be lazy" doesn't give you any guarantee against the behavior change here. The correct thing to do is to initialize both variables from a common immutable constant:

let initialState = (x: 2, y: 2)
var currentlySanta = initialState
var currentlyRobot = initialState
05262b81-54a9-4fe1-bf6a-96f8042de10e commented 8 years ago

@jckarter I agree that's the correct solution for this code. My point here is just that we currently document globals (and statics) as being lazy-initialized in a manner similar to lazy stored properties, which are themselves documented as being calculated the first time they're accessed. Nowhere do we say that a global/static stored property may be initialized prior to the first access, and it's surprising to have the optimizer change the behavior here.

Besides, if we really do want to say the optimizer is allowed to initialize global stored properties prior to first access, then we may as well go all the way and statically initialize currentlyRobot to (x: 2, y: 2) like we do for currentlySanta under optimization. But if we do that, then we really need to change the documentation to state that global/static stored properties may be initialized prior to the first access. I just worry that in doing so, we remove the ability to rely on the lazy behavior, and if we do that, then we really should add support for the lazy keyword on global/static stored properties (in fact, trying to use lazy on a static stored property right now emits an error saying the property is already lazy). And of course people will still write code that depends on lazy-initialization without realizing it (because debug will still be predictably lazy), so even that doesn't fix anything, it just makes excuses for why the optimizer is free to change program behavior.

jckarter commented 8 years ago

You're right, we should clarify the documentation (and we should optimize currentlyRobot to a static initialization well). "Depending on lazy initialization" is impossible in general, since there's almost always code that could have run earlier than you and forced initializations in a different order. The optimizer is revealing problems that could have also been caused by additional normal code that touched the globals in a different order.

05262b81-54a9-4fe1-bf6a-96f8042de10e commented 8 years ago

My worry here is over code like the following:

struct FooManager {
    /// Path to the FooManager's data on disk.
    /// - Important: This must be set before accessing `sharedInstance`.
    static var path: String!
    /// A shared `FooManager` initialized with `FooManager.path`.
    static let sharedInstance = FooManager(path: FooManager.path)
    // ...
}

Basically, if initializing something depends on something else to have been configured first. Maybe not the best idea, but it's something that seems like it should work. It's conceptually no different than using a static function to return the shared instance, except it's slightly nicer to use.

I'd be happy if we changed the documentation to explicitly state that you cannot rely on global/static stored properties not being initialized before the first access, and added support for marking them lazy to provide that guarantee when it's useful. I think that still leaves the potential for writing bugs, but I assume that being able to optimize away a lot of those globalinit checks is actually worth it.

jckarter commented 8 years ago

That's an interesting case to consider. The semantics are geared toward the assumption that there's a dependency order evident in the execution of global initializations. Examples like yours where the lazy initialization is dependent on side effects outside of other global initializers are definitely problematic.

Dante-Broggi commented 6 years ago

Has this been examined recently?

swift-ci commented 4 years ago

Comment by Liu Liu (JIRA)

Hey, @jckarter, we recently observe the issue mentioned in @lilyball for this library: https://github.com/segmentio/analytics-ios/blob/master/Analytics/Classes/SEGAnalytics.m#L36 where their initialization depends on a particular order. When someone uses static let to capture `SEGAnalytics.shared()`, it could be reordered by compiler.

Personally I think this behavior is surprising, but OK. Do we want to change any documentation to reflect:

We don't guarantee that initializations happen in any particular order, only that the initialization will happen at some point before you load from the global property.

Thanks!

Edit: I attached a minimal reproducible code with SPM. swift build -c release will print nil while swift build -c debug will print Optional(Singleton)

jckarter commented 4 years ago

liuliu (JIRA User), @eeckstein had recently modified global optimization to be more conservative with regards to global optimization ordering. Do you see the issue in Xcode 12?

jckarter commented 4 years ago

liuliu (JIRA User) Your example may be evidence of a different problem. I would recommend filing a new bug. In your main code:

class SingletonManager {
  static let shared = SingletonManager()
  private var singleton: Singleton?
  private init() {
    singleton = Singleton.sharedInstance()
  }
  func printSomething(_ input: Int) {
    print("val \(input), \(singleton as Optional)")
  }
}

func setupSomething(val: String) {
  Singleton.setupTheSingleton()
  switch (val) {
  case "arg1":
    SingletonManager.shared.printSomething(10)
  default:
    SingletonManager.shared.printSomething(100)
  }
}

setupSomething(val: CommandLine.arguments[1])

Since `SingletonManager.shared` is accessed on all code paths, and is used by the call to `printSomething`, its initialization ought to be triggered in any case.

swift-ci commented 4 years ago

Comment by Liu Liu (JIRA)

@jckarter checked the example on Xcode 12 Beta 4. It seems no longer an issue. Should I still file a ticket or wait @eeckstein to confirm this is a problem solved?

eeckstein commented 4 years ago

This should be fixed. No need to file another ticket.