Closed sobri909 closed 6 years ago
I'm guessing the most efficient solution would be to hand the job over to SQLite, and fetch the datetimes as strftime('%s')
then run them through Date(timeIntervalSince1970:)
. But I couldn't see a way to tidily fit that in. And given that SQLite is seemingly returning datetimes as strings by default (or am I wrong on that?), there might be a good reason why using unix timestamps isn't desired.
Hello @sobri909!
Thanks for this feedback.
given that SQLite is seemingly returning datetimes as strings by default
Not exactly.
SQLite has no concept of dates. It can store int64, doubles, strings, blobs, and nulls. It will interpret certain string and numeric values as dates when told to do so (with the strftime function, for example).
GRDB stores Swift Date as strings, not timestamps, because it was the best default format. But you can still store dates as timestamps. GRDB natively decodes Dates from timestamps with Date(timeIntervalSince1970:)
:
let date = Date() // 13 Apr 2018 at 13:58
let row = try DatabaseQueue().inDatabase { db in
try Row.fetchOne(db, """
SELECT ? AS str, ? AS timestamp
""", arguments: [date, date.timeIntervalSince1970])!
}
row["str"] // "2018-04-13 11:58:32.117"
row["str"] as Date // 13 Apr 2018 at 13:58
row["timestamp"] // 1523620729.02897
row["timestamp"] as Date // 13 Apr 2018 at 13:58
Back to Scanner :-)
We use a Scanner in order to convert date and date components from strings of various formats:
You could surely let SQLite turn those strings into timestamps, and have GRDB decode doubles into date with Date(timeIntervalSince1970:)
, as seen above.
But what about fixing this Scanner slob instead?
SQLite has no concept of dates. It can store int64, doubles, strings, blobs, and nulls.
Yeah I had a sift through the SQLite docs, and it looks like they store DATETIME as REAL? Which is all fine with me. But it seems to be coming back from the database as TEXT by default, so there's that extra conversion cost.
? AS timestamp
But then I can't be lazy and return everything with SELECT *
. Too much work 😂
But what about fixing this Scanner slob instead?
Definitely! I really want to have a go at doing something with vsscanf()
, but have run out of time today. It's the first day of Songkran (Thai new year), so I'm being forced to stop working early. Sigh.
If I get a chance I'll have a play around with it tomorrow though. From the sounds of that blog post, it's even more efficient than using a cached DateFormatter. But the trick will be making it work safely with all the different date string formats.
Yeah I had a sift through the SQLite docs, and it looks like they store DATETIME as REAL?
Precisely speaking, DATETIME
creates an column of numeric affinity.
But it seems to be coming back from the database as TEXT by default.
Yes: numeric affinity does not convert strings into numeric values. Since GRBD stores dates as strings by default (as explained above), SQLite stores strings and fetches strings like "2018-04-13 11:58:32.117".
so there's that extra conversion cost.
When one uses default string storage of dates, yes.
But the trick will be making it work safely with all the different date string formats.
Yeah. Everything happens in DatabaseDateComponents.fromDatabaseValue
.
It's the first day of Songkran (Thai new year), so I'm being forced to stop working early
🕺 Chok dee 🎉 !!
Aah now I understand. I didn't realise SQLite lets you decide what storage type to use for dates. But reading the SQLite docs here, I see what you mean now.
SQLite does not have a storage class set aside for storing dates and/or times. Instead, the built-in Date And Time Functions of SQLite are capable of storing dates and times as TEXT, REAL, or INTEGER values: ... Applications can chose to store dates and times in any of these formats and freely convert between formats using the built-in date and time functions.
I'd assumed that that choice was SQLite's alone. SQLite's internals are different from what I'm used to, so there's some surprises!
Anyway, your reasoning behind choosing TEXT seems perfectly sane to me, so I won't change that default in my project. I'll instead have a go at using the vsscanf()
parser from here and wedging it into DatabaseDateComponents
. If I have any success, I'll do a PR 😄
Lovely!
I've finished up the initial vsscanf()
based parser: SQLiteDateParser. (It's temporarily hooked up to DatabaseDateComponents
like thus). But haven't had a chance to properly quantify its performance yet.
It's definitely solving the memory blowout problem. And appears to be very fast. But I'm curious to see some numbers on exactly how fast it is, in comparison. The memory issue was the one I needed to solve, but knowing whether it's faster/slower, and by how much, is important too! So I'll get some proper numbers together tomorrow.
That's really great, @sobri909. http://jordansmith.io/performant-date-parsing/ makes it sound like vsscanf will bring much improvement 👍
You are already ready for a PR on the development branch. We'll remove useScanfStrategy
, of course. And I'll certainly challenge you on the mutex, and on the allocations that look like we should at least have a look at withUnsafeMutablePointer(to:_:)
.
And I'll certainly challenge you on the mutex
Heh. The mutex is necessary for thread safety, because it's reusing the same static pointers for each call, for memory / performance reasons.
Without the mutex, competing threads will clobber each other and get each other's partial dates. (In testing, before I added the mutex, I was seeing dates coming back with mismatched dates/times from competing threads).
I initially wanted to use os_unfair_lock()
, but that didn't become available until iOS 10, and in retrospect would've been a bad choice anyway, because access to the parser really should favour fairness over performance. And the performance gain of unfair locks would be negligible for this in most cases anyway, I think.
, and on the allocations that look like we should at least have a look at
withUnsafeMutablePointer(to:_:)
.
Interesting! I wasn't familiar with that method, so did some reading of the docs just now. It sounds like it can't be used in this case because of "Do not store or return the pointer for later use."
Although I'm not fully understanding the purpose of the method yet, so I'm probably misinterpreting that sentence.
After cleaning up some other unrelated performance bottlenecks this morning I was able to do some semi reliable performance testing. I did two runs with the vsscanf()
strategy and one with the Scanner
strategy. The results look good! So I'll clean out the temporary code and make a PR.
[totalRowsTimed: 224459, rowTimings.mean: 0.001223]
Memory consumed: 143 MB
[totalRowsTimed: 137277, rowTimings.mean: 0.002699]
Memory consumed: 101 MB
[totalRowsTimed: 29357, rowTimings.mean: 0.005303]
Memory consumed: 627 MB (terminated)
The first vsscanf()
run wasn't fair, because a lot of its queries were simpler, so the mean row fetch/processing times were lower due to less time spent in SQLite.
The second vsscanf()
run and the Scanner
run both performed effectively the same series of queries for the same rows, so are a more fair comparison.
The Scanner
run exhibited the memory consumption problem that got all this started, and was terminated by iOS before it finished.
The Scanner
run had per row timings almost twice as long as vsscanf()
. Although because it couldn't complete its run, the timings aren't a completely fair comparison.
I should add that I did those timing runs in an unreleased version of Arc App, not in a controlled example project. So the timings and memory consumption for both strategies were padded by other things the app was doing.
(The timed queries were all against the LocomotionSample table, which has two DATETIME fields.)
I also suspect the third strategy of using a bunch of cached DateFormatters would perform reasonably well too. When I tried a cached DateFormatter last week it completely solved the memory consumption problem. Although I didn't do any timings to see whether it was more time efficient or not.
But using that strategy would be a pain in the arse, because it would require keeping a separate cached formatter for every supported date/time string format (ie 10 separate DateFormatters). Ugly!
Follow up in #334
Hey 😄 I've finally found some time to get back to database code. Yay!
So anyway, I've found an interesting one while dealing with very large result sets that include
DATETIME
fields.I've got some result sets that can get up into the tens or hundreds of thousands (the one I'm looking at right now is ~40k rows), and I need to walk the entire sequence in a single run, as efficiently as possible. But while walking through the results I've been seeing memory use balloon to over 600MB (at which point iOS kills the app).
So I figured the problem was all those model instances in memory. But turns out it wasn't. The model instances themselves are only taking up a handful of megabytes. The real memory eater is a "leak" in
Scanner
(as used here inDatabaseDateComponents
).Turns out those scanners are pooping out mallocs like crazy, and expecting someone else to clean up after them. Once the sequence has finished iterating, the mallocs do eventually get cleaned up. But if the result set is large enough it won't get that far, and will instead get killed off by iOS. For my 40k rows result set, I'm seeing around 600MB of wasteful mallocs, then iOS terminating the process before iteration completes.
Basically
Scanner
is a slob.I'm currently successfully working around it by using a cached DateFormatter:
But that obviously isn't a portable solution, because of the hardcoded
dateFormat
.While hunting for better solutions I stumbled onto this blog post, which makes use of
vsscanf()
, which looks super fun! But I haven't had a chance to experiment with it yet.