[Question] Store.items empty at startup

mergesort / Boutique

✨ A magical persistence library (and so much more) for state-driven iOS and Mac apps ✨

https://build.ms/boutique/docs

MIT License

899 stars 43 forks source link

[Question] Store.items empty at startup #16

Closed samalone closed 2 years ago

samalone commented 2 years ago

I've started experimenting with Boutique after seeing Brent Simmons tweet about it...

I'm trying to implement a RemoteImage class for use with SwiftUI that caches the image data in Boutique and loads missing data from a URL in the background. It's all working, but I'm finding that even when the image data is stored in Boutique's disk storage, on app startup the Store.items field is empty. This causes my code to create new RemoteImage objects and re-download the URL data, even though the data is in the cache.

Is there some way to ensure that Store.items is valid at startup?

mergesort commented 2 years ago

Hey @samalone, glad you're checking out Boutique! I just added a warning to Boutique noting that it's not a good idea to put images into the Store. There are two reasons for this, one of which I suspect you're hitting on. The first is that storing images in Boutique will balloon your app's memory, because of the images being put into the in-memory Store, so instead of being able to hold thousands of smaller objects, you're probably only able to hold 100-200ish images, depending on the image size and your device/how much memory it has.

I realize that the demo app I built for MVCS is doing this, and I've added a warning there to match. It was something I hadn't considered when I was building the demo app which is a miss on my part, so I also plan on building an example that matches the usage expectations of Boutique better, one that doesn't store Images into Boutique. I've also considered seeing if there's a way to use a property wrapper to signal to Boutique that this is an object that should never be loaded into memory, but that's for another day.

Now to the computational performance you mentioned, can I ask what happens if you try to use a smaller object like a Codable struct, is it available on app launch? The is synchronous so it should load relatively quickly on launch, but I want to make sure it does and you're able to access store.items on launch.

As a side note for the image use case I can recommend using [Bodega](https://github.com/mergesort/Bodega], the library that. It's not as slick of a demo as I put out there for Boutique working with images, but it shouldn't be much code to build something that reads the downloaded files for the caching RemoteImage class you're trying to build.

Thanks again for trying out Boutique, hope this helps!

samalone commented 2 years ago

I wasn't worried too much about memory use, though I see your point. I'm working on a podcast player and the images are relatively small and tend to be re-used a lot. Although particular episodes can have their own art, they typically repeat a small number of images. I can see that I should probably use Bodega or some other solution for image caching that doesn't store the entire cache in memory.

I guess what I'm worried about is the part of Store(storagePath: URL, cacheIdentifier: KeyPath<Item, String>) where it says:

        Task { @MainActor in
            self.items = await self.allPersistedItems()
        }

If I understand correctly, that queues a task on the MainActor that won't execute until the main thread yields. So regardless of how fast allPersistedItems() is, the items array won't be populated until after the app's startup code runs.

That means the app has to be prepared for previously stored items to be missing, which may work for some apps but could cause trouble if other objects in the app are storing the IDs of the missing items and expect them to be there.

It would be nice to have a way to ensure the items are loaded. Perhaps an async init function would help? (I haven't thought this through entirely.)

samalone commented 2 years ago

It occurs to me that there might be some synergy between this startup issue and the discussion about using a SortedDictionary to index stored objects.

The only reason my startup code needs the items array initialized is that it needs to look up particular objects, not that it needs all objects.

So what if the items array was never actually constructed unless the app requested it? That would serve two goals:

The Store would not need to load all items from disk into memory unless requested, which would solve the memory issue.
The Store could provide a subscript operator to load a particular item, which would solve the startup issue.

mergesort commented 2 years ago

There's also a parallel discussion happening on #14, which I plan to start to chipping away at over the next couple of days. It's important for me to be able to accommodate all of these use cases and considerations that have been proposed over the last couple of days, and I'm glad to! I just need some time to think it over everything together, and consult with some other people who have good intuition for this.

Perhaps an async init function would help?

One of the potential changes is indeed an async init, that's actually the first thing I plan on trying out to see how it changes things.

The only reason my startup code needs the items array initialized is that it needs to look up particular objects, not that it needs all objects.

Your use case for items being available immediately makes sense, and that's definitely something I want to address. One concern I have with using a Dictionary in a way that directly is exposed to a user is that it would break the API, and would change the mental model of subscribing to an array in SwiftUI. (At least if I understand what you're suggesting). Does the approach you were thinking of account for, or is there possibly a way to do all of this with an array being exposed to the user, that way it can be plugged into people's apps in the same way?

So what if the items array was never actually constructed unless the app requested it?

Sorry if I'm missing something (it's been a long couple of days), how would this look in practice?

samalone commented 2 years ago

Swift 5.5 allows gettable properties to be async, so items could become an async computed property. The getter function could iterate over the filesystem, check the internal dictionary to see if items have already been loaded, and load just the items that are missing from the dictionary. Internally, you can set a flag when all items from disk have been loaded, and this getter function can simply return the items from the dictionary if they've already been loaded. This means you only need to examine the filesystem the first time the getter is called.

Another possibility would be to rely more on the @Stored property wrapper to provide the array of items, moving this out of the Store class itself.

In fact, if you move to a dictionary, it would be nice to have the @Stored property wrapper optionally take a where function that controls which keys should be loaded. This would allow a controller to only load the items that are related to another object.

There are probably other ways of solving this problem using Combine or AsyncStream, where a publisher exists but doesn't start loading until someone subscribes to the publisher.

mergesort commented 2 years ago

Swift 5.5 allows gettable properties to be async, so items could become an async computed property.

Unfortunately the first option isn't an option as computed properties can't have property wrappers attached to them, it was actually the first thing I tried when I was just starting to work on Boutique.

You can see that by trying to write some code of this shape, and seeing that it produces this error. Property wrapper cannot be applied to a computed property, which makes sense.

@Published public var images: [RemoteImage] {
    get async {
        self.imagesDictionary.values.elements
    }
}

In fact, if you move to a dictionary, it would be nice to have the @Stored property wrapper optionally take a where function that controls which keys should be loaded.

I'm in the middle of prototyping using a private OrderedDictionary for performance/uniquing reasons right now, but I don't really wish to expose that data structure to the caller. I do like the idea of deciding what loads initially as you're proposing but then the memory and disk stores will be out of sync. In your version would the two ever rectify? If so, how, and when would that happen?

There are probably other ways of solving this problem using Combine or AsyncStream, where a publisher exists but doesn't start loading until someone subscribes to the publisher.

I'm also subsequently prototyping converting my Combine code to AsyncStream so I can guarantee the order of execution, but that's not going nearly as well since I'm not very familiar with AsyncStream. 😅

samalone commented 2 years ago

Yes, but all that @Published does is perform an objectWillChange.send() before changing the value. It's not hard to call that when the internal state changes, which will have the same effect as @Published.

mergesort commented 2 years ago

Well it does one more thing, @Published exposes an $items Publisher on items which is a very important part of the API that Boutique offers. I could expose a separate Publisher, but I'd also prefer not to because I really think the fact there's only one property exposed is a nice selling point of Boutique, and it would be a breaking change as well.

samalone commented 2 years ago

I agree that you should not expose your private dictionary directly to the caller. It is easy enough to provide a custom subscript operator.

In terms of maintaining consistency between the memory store and the disk store, I think you just need to loosen the relationship between the two. Rather than the memory store being a copy of the disk store, think of it as a write-through cache. The only time the in-memory cache would match the disk store would be if the caller issued a command to load all objects, (such as by calling the items getter).

samalone commented 2 years ago

Well it does one more thing, @Published exposes an $items Publisher on items which is a very important part of the API that Boutique offers. I could expose a separate Publisher, but I'd also prefer not to because I really think the fact there's only one property exposed is a nice selling point of Boutique, and it would be a breaking change as well.

OK, fair enough. But I think maintaining backward compatibility is going to be a heavy constraint on the development of the library. You may want to start thinking about what improvements you can make in version 1.x and what improvements will require breaking changes in 2.x.

To me, the issue is with the items array. If it is a complete copy of the disk store, then you are requiring the entire disk store to be loaded into memory. And if it is stored in a property and published rather than computed on-demand, then you are loading the entire disk store at the launch of the app. Those requirements are going to rule out a lot of use-cases of the library.

But maybe those constraints match your vision of Boutique. Another library could always offer a different set of choices.

mergesort commented 2 years ago

@samalone I think I've found a way to bridge the gap between what you're asking for to make Stores launch quickly, and what I want in keeping an API I consider essential to ensuring that Boutique is simple and accessible to beginners.

I'm going to start a separate Discussion on it in the afternoon after I take care of some personal matters, because I also want to involve a couple of other people who have been working on related patches and giving other API and performance-related suggestions.

I hope you know that I really appreciate this input, it's helped me think about and solidify some of my core principles around Boutique, and it makes me genuinely happy to see people show appreciation for and want to use something I put a lot of care into.

P.S. I'm going to convert this to a Discussion, just for organization purposes.