Open swift-ci opened 4 years ago
Comment by Xander Dunn (JIRA)
I ruled out the possibility that the above `Date` extensions are causing this. I replace their usage in my codebase with current `Date()` values and the issue persisted. There are now absolutely now references to locale in my codebase. I still saw the `NSLocale` retains exploding. My only remaining guess is that one of my project's dependencies might be doing something with locales.
Comment by Xander Dunn (JIRA)
Through further investigation on the Swift forums thread, I've figured out exactly where this occurs.
By adding an `abort` in the `swift_retain_hook` when the number of retains is greater than 1M, I get this stack trace:
(lldb) thread backtrace
* thread #​170, name = 'NIO-ELT-0-#13', stop reason = signal SIGABRT
* frame #​0: 0x00007fffad62ff47 libc.so.6`gsignal + 199
frame #​1: 0x00007fffad6318b1 libc.so.6`abort + 321
frame #​2: 0x00005555560b094f ModelApp`swift_retain_hook(object=0x00007ffe80002670) at retain.c:24:11
frame #​3: 0x00007ffff6667ece libFoundation.so`CFRetain + 14
frame #​4: 0x00007ffff669a77d libFoundation.so`__CreateCFDateFormatter + 157
frame #​5: 0x00007ffff636149f libFoundation.so`Foundation.NSDate.description.getter : Swift.String + 159
frame #​6: 0x00007ffff6102ba6 libFoundation.so`Foundation.Date.description.getter : Swift.String + 70
frame #​7: 0x00007ffff61031df libFoundation.so`protocol witness for Swift.CustomStringConvertible.description.getter : Swift.String in conformance Foundation.Date : Swift.CustomStringConvertible in Foundation + 15
frame #​8: 0x00007ffff7960cd7 libswiftCore.so`Swift.DefaultStringInterpolation.appendInterpolation<τ_0_0 where τ_0_0: Swift.CustomStringConvertible>(τ_0_0) -> () + 23
frame #​9: 0x0000555555feb7d7 ModelApp`closure #​1 in MyData.Callback(dataUpdates=3 values, self=0x0000555557a71be0) at MyData.swift:163:63
frame #​10: 0x0000555555ffffc6 ModelApp`closure #​1 in DataStream.handleResponseString(self=0x00007fff0825b850, string="{\"channel\": \"data\", \"type\": \"update\", \"data\": {\"time\": 1601744938.7753208, \"action\": \"update\"}}") at DataStream.swift:152:30
frame #​11: 0x0000555555805e6f ModelApp`thunk for @escaping @callee_guaranteed () -> (@error @owned Error) at <compiler-generated>:0
frame #​12: 0x0000555556000464 ModelApp`thunk for @escaping @callee_guaranteed () -> (@error @owned Error)partial apply at <compiler-generated>:0
frame #​13: 0x0000555555c384d6 ModelApp`closure #​1 in EventLoop.submit<Self>(promise=NIO.EventLoopPromise<()> @ 0x00007ffef17f35a8, task=0x0000555556000450 ModelApp`reabstraction thunk helper from @escaping @callee_guaranteed () -> (@error @owned Swift.Error) to @escaping @callee_guaranteed () -> (@out (), @error @owned Swift.Error)partial apply forwarder with unmangled suffix ".30" at <compiler-generated>) at EventLoop.swift:488:37
frame #​14: 0x0000555555cc6f1c ModelApp`thunk for @escaping @callee_guaranteed () -> () at <compiler-generated>:0
frame #​15: 0x0000555555ccb691 ModelApp`partial apply for thunk for @escaping @callee_guaranteed () -> () at <compiler-generated>:0
frame #​16: 0x0000555555cc6f3c ModelApp`thunk for @escaping @callee_guaranteed () -> (@out ()) at <compiler-generated>:0
frame #​17: 0x0000555555cc6f87 ModelApp`closure #​4 in SelectableEventLoop.run(task=0x0000555555cca070 ModelApp`partial apply forwarder for reabstraction thunk helper from @escaping @callee_guaranteed () -> (@out ()) to @escaping @callee_guaranteed () -> () at <compiler-generated>) at SelectableEventLoop.swift:454:25
frame #​18: 0x0000555555b6d00f ModelApp`thunk for @callee_guaranteed () -> (@error @owned Error) at <compiler-generated>:0
frame #​19: 0x0000555555cca034 ModelApp`thunk for @callee_guaranteed () -> (@error @owned Error)partial apply at <compiler-generated>:0
frame #​20: 0x0000555555cc0a62 ModelApp`withAutoReleasePool<T>(execute=0x0000555555cca020 ModelApp`reabstraction thunk helper from @callee_guaranteed () -> (@error @owned Swift.Error) to @escaping @callee_guaranteed () -> (@out (), @error @owned Swift.Error)partial apply forwarder with unmangled suffix ".9" at <compiler-generated>) at SelectableEventLoop.swift:26:16
frame #​21: 0x0000555555cc52cc ModelApp`SelectableEventLoop.run(self=0x00007ffed4001000) at SelectableEventLoop.swift:453:21
The offending line in my code is MyData.swift:163:
func updatesCallback(type: MyType) -> (([Update]) -> Void) {
return { updates in
let LinkedList = self.updates[type]!
// The below if statement and for loop entire code block are a debug check to see how often the lack of
// FIFO guarantee on async dispatch queues causes old data to be inserted after new data
if LinkedList.count > 0 && updates.count > 0 {
for update in updates {
let topUpdate = linkedList.suffix(1)![0]
let errorText = """
Expected FIFO execution of async dispatch queues, but found historical data being inserted onto \
linked list. Top of linked list has time \(update.time) \
(\(update.timeDouble)), \
while inserted update with time \(topUpdate.time) (\(topUpdate.timeDouble))
"""
if topUpdate.time > update.time {
self.logger.error(Logger.Message(stringLiteral: errorText))
assertionFailure(errorText)
}
}
}
linkedList.append(contentsOf: updates)
}
}
Line 163 is:
linked list. Top of linked list has time \(update.time) \
So this is the scenario causing the retain accretion:
Each time data is received over the network, JSON decode it into Update objects that contain Date properties.
Call a partially evaluated callback function (like `func updatesCallback(type: MyType) -> (([Update]) -> Void)`) with every Update.
In this function, put the Update's Date property into a string: let string = "(update.date)". Then do nothing with the variable string.
The fact that the data is coming over the network and decoded to objects is probably not important.
@swift-ci create
Comment by Xander Dunn (JIRA)
We now have a minimally reproducing code sample: https://github.com/xanderdunn/swift-retain-repro
To reproduce:
$ git clone https://github.com/xanderdunn/swift-retain-repro.git
$ cd swift-retain-repro
$ swift run
The code is set to print every retain on an object that has a retain count above 21,000.
You should see:
...
Fri Oct 9 17:02:57 2020: Foundation.NSLocale at 0x55703e0eff00 has 94036 retains!
Fri Oct 9 17:02:57 2020: Foundation.NSLocale at 0x55703e0eff00 has 94036 retains!
Fri Oct 9 17:02:57 2020: Foundation.NSLocale at 0x55703e0eff00 has 94037 retains!
Fri Oct 9 17:02:57 2020: Foundation.NSLocale at 0x55703e0eff00 has 94037 retains!
Fri Oct 9 17:02:57 2020: Foundation.NSLocale at 0x55703e0eff00 has 94038 retains!
Fri Oct 9 17:02:57 2020: Foundation.NSLocale at 0x55703e0eff00 has 94037 retains!
Fri Oct 9 17:02:57 2020: Foundation.NSLocale at 0x55703e0eff00 has 94038 retains!
Fri Oct 9 17:02:57 2020: Foundation.NSLocale at 0x55703e0eff00 has 94039 retains!
Fri Oct 9 17:02:57 2020: Foundation.NSLocale at 0x55703e0eff00 has 94040 retains!
Fri Oct 9 17:02:57 2020: Foundation.NSLocale at 0x55703e0eff00 has 94040 retains!
Fri Oct 9 17:02:57 2020: Foundation.NSLocale at 0x55703e0eff00 has 94041 retains!
Fri Oct 9 17:02:57 2020: Foundation.NSLocale at 0x55703e0eff00 has 94040 retains!
Fri Oct 9 17:02:57 2020: Foundation.NSLocale at 0x55703e0eff00 has 94041 retains!
Fri Oct 9 17:02:57 2020: Foundation.NSLocale at 0x55703e0eff00 has 94039 retains!
Fri Oct 9 17:02:57 2020: Foundation.NSLocale at 0x55703e0eff00 has 94040 retains!
Fri Oct 9 17:02:57 2020: Foundation.NSLocale at 0x55703e0eff00 has 94041 retains!
Fri Oct 9 17:02:57 2020: Foundation.NSLocale at 0x55703e0eff00 has 94042 retains!
Fri Oct 9 17:02:57 2020: Foundation.NSLocale at 0x55703e0eff00 has 94043 retains!
Fri Oct 9 17:02:57 2020: Foundation.NSLocale at 0x55703e0eff00 has 94044 retains!
Fri Oct 9 17:02:57 2020: Foundation.NSLocale at 0x55703e0eff00 has 94044 retains!
Fri Oct 9 17:02:57 2020: Foundation.NSLocale at 0x55703e0eff00 has 94045 retains!
Fri Oct 9 17:02:57 2020: Foundation.NSLocale at 0x55703e0eff00 has 94046 retains!
Fri Oct 9 17:02:57 2020: Foundation.NSLocale at 0x55703e0eff00 has 94047 retains!
Fri Oct 9 17:02:57 2020: Foundation.NSLocale at 0x55703e0eff00 has 94047 retains!
Fri Oct 9 17:02:57 2020: Foundation.NSLocale at 0x55703e0eff00 has 94048 retains!
Fri Oct 9 17:02:57 2020: Foundation.NSLocale at 0x55703e0eff00 has 94048 retains!
Fri Oct 9 17:02:57 2020: Foundation.NSLocale at 0x55703e0eff00 has 94049 retains!
It will then intentionally abort() when the retain count reaches 1 million.
The line that causes it is defining let errorText at:
MyData.swift:dataCallback(dataFeeder:):39
Note that it is necessary that my custom data structure BoundedLIFOLinkedList is used to reproduce this bug. When I replaced it with a simple Array, the retain accretion no longer occurred. I'm not using any pointers or unsafe methods in BoundedLIFOLinkedList, so I'm not sure how it is causing it.
Unlike the initial report, this code does not rely on TensorFlow and I was able to reproduce this bug on Ubuntu 18.04 with Swift dev trunk 2020-10-07:
$ which swift
/home/xander/swift-DEVELOPMENT-SNAPSHOT-2020-10-07-a-ubuntu18.04/usr/bin/swift
$ swift --version
Swift version 5.3-dev (LLVM 240312aa7333e90, Swift 15bf0478ad7c47c)
Target: x86_64-unknown-linux-gnu
@millenomi I think this one has been fixed with 5.6.1, correct?
CC @tomerd / @millenomi
Environment
Ubuntu 18.04, Swift dev trunk 2020-10-7.Additional Detail from JIRA
| | | |------------------|-----------------| |Votes | 0 | |Component/s | Compiler | |Labels | Bug | |Assignee | None | |Priority | Medium | md5: 0a5122f6434173e9d7246942a900e0c5Issue Description:
It was requested that I file a bug report from my Swift Forums post here. I have a long-running server using SwiftNIO that ran for 3.5 days and then failed with the following:
After hooking into `_swift_retain`, I was able to identify that the object whose retain count is accreting is NSLocale:
10 minutes after the program started, retains on the object are 1M+:
I have no references to NSLocale in my code. The only occurrences of locale are the following:
I am not sure why there is an NSLocale object that is accruing so many retains. There is no obvious memory leak in my application. Memory usage increased only 1.5 percentage points of 32GB of RAM over 3.5 days running time. This is expected given an increase in the number of server connections and data throughput.