groue / GRDB.swift

A toolkit for SQLite databases, with a focus on application development
MIT License
6.7k stars 687 forks source link

Core Data coming out faster than GRDB.swift? #981

Closed christianselig closed 3 years ago

christianselig commented 3 years ago

Environment

GRDB flavor(s): GRDB vanilla GRDB version: 5.8.0 Installation method: (CocoaPods, SPM, manual?) SPM Xcode version: 12.5 Swift version: 5.4 Platform(s) running GRDB: (iOS, macOS, watchOS?) macOS M1 compiling, iPhone 6S Plus on iOS 13.5.1 test device macOS version running Xcode: 11.3.1

Introduction

Hi! I just wanted to check if this seems reasonable enough. Long story short, I wanted to compare between Core Data and GRDB for fetch performance. I assumed GRDB would be faster, but that's not what I'm getting so far. If I insert ~5,000 items into both, and then try to do a "prefix" style search in each (LIKE in GRDB, BEGINSWITH in Core Data), and limit it to 7 items, Core Data takes approximately 0.0046 seconds on an iPhone 6S Plus, whereas GRDB takes about 0.0123 seconds in Debug mode, or 0.0089 seconds in Release mode. (Core Data still being about 2x as fast)

Is this normal? It's not to say GRDB is slow at all, just curious.

Code is as follows:

Data Set Used

Approximately 5,000 Reddit subreddits

https://gist.github.com/christianselig/e3e1cf5fd808ad52d408a6270b581525

Core Data

Model called "Subreddit" with "name" (String) and subscribers (Int64). Inserted as follows:

func insertAll() {
    let items = itemsFromJSON()

    for item in items {
        let subreddit = Subreddit(context: container.viewContext)
        subreddit.name = item
        subreddit.subscribers = 3466293
    }

    saveContext()
}

For the search I do:

func doRequest() {
    let startTime = CFAbsoluteTimeGetCurrent()

    let request = Subreddit.createFetchRequest()
    request.predicate = NSPredicate(format: "name BEGINSWITH 'ask'")
    let sort = NSSortDescriptor(key: "subscribers", ascending: true)
    request.sortDescriptors = [sort]
    request.fetchLimit = 7

    do {
        let subreddits = try container.viewContext.fetch(request)
        let timeElapsed = CFAbsoluteTimeGetCurrent() - startTime
        print("Time elapsed: \(timeElapsed) s.")
        print(subreddits.count) // 7
    } catch {
        print(error)
    }
}

GRDB

Followed a combination of this tutorial and the demo UIKit project in this library.

Persistence.swift

import UIKit
import GRDB

var dbQueue: DatabaseQueue!

class DatabaseManager {
    static var migrator: DatabaseMigrator {
        var migrator = DatabaseMigrator()

        #if DEBUG
        migrator.eraseDatabaseOnSchemaChange = true
        #endif

        migrator.registerMigration("createSubreddit") { db in
            try db.create(table: "subreddit") { t in
                t.column("id", .text).primaryKey()
                t.column("subscribers", .integer).notNull()
            }
        }

        return migrator
    }

    static func setup(for application: UIApplication) throws {
        let fileManager = FileManager()
        let folderURL = try fileManager
            .url(for: .applicationSupportDirectory, in: .userDomainMask, appropriateFor: nil, create: true)
            .appendingPathComponent("database", isDirectory: true)
        try fileManager.createDirectory(at: folderURL, withIntermediateDirectories: true)

        let dbURL = folderURL.appendingPathComponent("db.sqlite")

        dbQueue = try DatabaseQueue(path: dbURL.path)
        try migrator.migrate(dbQueue)
    }
}

struct Subreddit: Identifiable, Hashable {
    var id: String
    var subscribers: Int64
}

extension Subreddit: Codable, FetchableRecord, MutablePersistableRecord {
    // Define database columns from CodingKeys
    fileprivate enum Columns {
        static let id = Column(CodingKeys.id)
        static let subscribers = Column(CodingKeys.subscribers)
    }

    static func askMatches() -> QueryInterfaceRequest<Subreddit> {
        return Subreddit.filter(Columns.id.like("ask%"))
    }
}

Inserted as:

let items = self.itemsFromJSON()

try! dbQueue.inTransaction(.deferred, { db in
    for item in items {
        var subreddit = Subreddit(id: item, subscribers: 3466293)
        try! subreddit.insert(db)
    }

    return .commit
})

Fetch is as follows:

let startTime = CFAbsoluteTimeGetCurrent()

try! dbQueue.read({ db in
    let matches = try! Subreddit.askMatches().limit(7).fetchAll(db)
    let timeElapsed = CFAbsoluteTimeGetCurrent() - startTime
    print("Time elapsed: \(timeElapsed) s.")

    print(matches.count)
})

Result

Results are above, where Core Data comes out faster. GRDB responds remarkably quickly, and I like it a lot, I'm just admittedly quite new to SQLite and database stuff in general, and I figured where GRDB sat a little closer to the metal it would be faster, so I thought I'd inquire in the likely case that in my newbiness I was doing something silly!

christianselig commented 3 years ago

On the flip side, it's much faster with larger data sets! https://twitter.com/ChristianSelig/status/1399421904604237824

Jasperav commented 3 years ago

I guess it wont fix the problem, but nice to know: measuring time is hard: https://stackoverflow.com/a/65860947/7715250, I would recommend putting the work in the measure block.

I tried to run your demo project, but I noticed the GRDB files are missing that you added here. If you can add them to your test project and put everything in measure blocks, it will be much quicker to reproduce (maybe the issue has gone away when putting stuff in the measure block)

christianselig commented 3 years ago

@Jasperav Do you mean my demo Core Data project from my tweet?

Here's a version showing GRDB: https://github.com/christianselig/GRDBTest

Re: measure, that's a fair point. But I think running them back to back multiple times and having one always be a clear winner for my code should illustrate my point well enough.

groue commented 3 years ago

Hello folks,

I measure GRDB performance regularly, in a few scenarios: https://github.com/groue/GRDB.swift/wiki/Performance.

The key is to perform measurements in the release configuration.

groue commented 3 years ago

Also, see https://github.com/groue/GRDB.swift/issues/926 for more understanding of where GRDB spends time, and how to avoid it.

As you said, @christianselig, getting closer to the SQLite metal helps performance. GRDB tries to make it possible to go very close to the metal, while not ruining everything when one uses high-level apis. But it's important to understand what's happening under the hood.

Most time is spent performing I/O, copying database values, and decoding. Make sure you use transactions in order to let SQLite optimise disk access. The less you have GRDB copy values, and the less you have GRDB look up for strings, the faster your code will run. See also the "optimized for fetching performance" example at https://github.com/groue/GRDB.swift/blob/master/README.md#examples-of-record-definitions

christianselig commented 3 years ago

Thanks so much for the response @groue!

So I'm taking the measurements in the Release config, and I changed my model to use row access rather than subclass Codable:

struct Subreddit: Encodable {
    var id: String
    var subscribers: Int64
}

extension Subreddit: TableRecord {
    enum Columns {
        static let id = Column(CodingKeys.id)
        static let subscribers = Column(CodingKeys.subscribers)
    }

    static let databaseSelection: [SQLSelectable] = [
        Columns.id,
        Columns.subscribers]
}

extension Subreddit: FetchableRecord, MutablePersistableRecord {
    init(row: Row) {
        id = row[0]
        subscribers = row[1]
    }

    static func askMatches() -> QueryInterfaceRequest<Subreddit> {
        return Subreddit.filter(Columns.id.like("ask%"))
    }
}

However I'm still not seeing much of an appreciable difference in speed for a single fetch, which is what I'm focusing on in this case. Core Data is still winning. And where it's for a single fetch of 7 items, I'm not sure transactions would help here, would they? (I'm using it for the thousands of inserts at setup, but not focused on performance there as I'll probably just use a prepopulated sqlite file)

Is this expected behavior? Again, totally understandable if it is, this is still very fast, I just was surprised Core Data came out a little faster for this fetching case.

groue commented 3 years ago

I can't really tell. You may run Instruments, so that we get an idea of what's GRDB and SQLite are doing. You may also get a good CoreData score because it happens it uses its RAM cache and does not have to hit the file system.

I can't guess, I can't tell.

On top of that:

0.0046 seconds ... 0.0123

Those figures are interesting when you compare them. However, when you only consider their absolute value, they're quite small, and I'm questioning the relevance of the investigation. Your app is only fetching a few lines, and I don't see any database crunching where performance would really matter. What do you think?

christianselig commented 3 years ago

That's fair! The second figure (12 ms) is hovering right around the 16 ms time figure for how long you have to render a frame (for instance while scrolling) before there's a visual hitch to the user. 4 ms is fast enough that even while scrolling I could likely pull the figure out of the database on the main thread (say, figuring out if an attribute of a subreddit and initializing a label if need be) without much fear of dropping frames. I was secretly hoping GRDB might improve that Core Data figure by maybe another 2-3x, bringing it down to perhaps a single millisecond!

To put it another way, it would be approaching similar speeds to an in-memory store (in practical use) and would alleviate a lot of cognitive load with storing it in a database versus, say, UserDefaults.

groue commented 3 years ago

I was secretly hoping GRDB might improve that Core Data figure by maybe another 2-3x, bringing it down to perhaps a single millisecond!

I have been concentrated on large requests, not on small ones, so there surely exist opportunities for enhancement. This requires instrumentation in order to spot the potential improvements.

Besides, GRDB currently does not use any cache of any form (except the one built in SQLite). So this may give Core Data an advantage in some situations. No reflection on what a cache could be for GRDB, its impact, and scope, has been performed yet.

To put it another way, it would be approaching similar speeds to an in-memory store (in practical use) and would alleviate a lot of cognitive load with storing it in a database versus, say, UserDefaults.

A disk storage as fast as RAM sounds quite magical. But, if you aim at the speed on an in-memory store, did you try using one? You can copy a database on disk into memory, and see what it gives:

let diskDB = try DatabaseQueue(path: "...")
let memoryDB = DatabaseQueue()
try diskDB.backup(to: memoryDB)
try memoryDB.read { db in ... }
christianselig commented 3 years ago

Unfortunately this database is big enough that I'd rather not keep it in memory, was one of the reasons I'm finally putting on my adult pants and graduating from plists to proper databases 😛

groue commented 3 years ago

All right. My intuition is that the performance is dominated by SQLite, since you removed most of Codable and GRDB layers by using numerical row indexes. But only Instruments can tell if this intuition is correct (and I can not do that for you) You may thus look for indexes (I think I've read somewhere that SQLite can index prefix LIKE queries).

I don't quite know how I can help further without much more actionable input.

christianselig commented 3 years ago

I can definitely look into that! My expertise definitely isn't super high in SQL land but I'm happy to learn when I get a chance.

I'm happy to provide more in the way of sample projects if that would help you, otherwise I'm also happy to close this until I get a chance to look at it further myself!

groue commented 3 years ago

I would not like that you spend time without much goal :-)

In your situation, I would profile my app, have Instruments strictly focus on the part of the app that frequently accesses the database, and look who (app / GRDB / SQLite / Swift runtime) spends time doing what.

Everything in profiling can be be questioned, based on actual profiling data. Do you need a better SQLite tuning? Is GRDB doing something useless? Is the app asking GRDB to do something useless? Is there bad memory management which triggers a lot of ARC overhead?

If you're not sure you can do that, well it's a good opportunity to learn. Sweating the last bits of performance is not easy, I know.

Running very short requests twice as slow as Core Data, considering it can "cheat" with its ram cache, does not sound alarming to me.

If you need a ram cache, you may want to implement one, given this topic has not been explored in GRDB. There is surely nothing that prevents you from doing it. I'm not against adding a general-purpose cache in GRDB, but it happens that I never needed it yet, and nobody has provided the smell of the beginning of an analysis of the topic. Solving particular problems is always easier than solving general ones: your experience enhancing your app is precious, and you'll always be welcome sharing what you have learned.

But you'll only know if a RAM cache may help after you've performed instrumentation of your app.

christianselig commented 3 years ago

Perfect, you've given me much to think about! Thank you greatly!

groue commented 3 years ago

You're welcome, happy GRDB! Maybe you think I do not sound very helpful... But I'm listening, very carefully. Sometimes it takes time until one or two issues created by one or two users start making sense, and we can imagine a way to enhance the library for all users (not just a few) 😅

christianselig commented 3 years ago

Oh my gosh don't be silly, you've been super helpful and I completely understand! I appreciate you taking the time to respond so thoroughly, I know OSS can be a thankless job at times.