spezifisch / stmps

Subsonic Terminal Music Player S
GNU General Public License v3.0
16 stars 6 forks source link

[Feature] Implement image cache max-sizing #62

Open xxxserxxx opened 2 weeks ago

xxxserxxx commented 2 weeks ago

The current album art cache can grow indefinitely. This feature would add cache management, such that a configurable maximum number of cache items are held at any time.

Two algorithms occur to me:

LRU

When the cache size exceeds the maximum, the algorithm should purge images for songs that firstly are not in the current queue, and secondly, are the oldest images in the cache.

Unused

Simply, when images are no longer referenced by any song in the queue, remove the image from the cache.

I think the second is more simple to implement (we would not need to consider time stamps), and would probably be sufficient. There's no reason to expect that LRU would prevent re-fetches.

spezifisch commented 2 weeks ago

You are referring to an on-disk cache, right?

I'll add this to milestone v1.0.0 as I don't want users to someday find we have been eating all their data. :-)

xxxserxxx commented 2 weeks ago

You are referring to an on-disk cache, right?

No. The connection structure now has a memory cache for images it pulls from the server. I explained in the pull that it's cached there because the http call needs to interpret the mime-type response -- I could cache the data elsewhere, but it was more straightforward to cache it in the connection. In any case, it's a memory cache -- I'm not caching anything to disk on the client side.

What I would like to do -- based on the recent discussion about the performance issues you're seeing -- is:

  1. Run a persistent image loader that's responsible for loading images in the background
  2. Keep the current GetAlbumArt function, except change it to return the "logo" on cache misses
  3. Have the concurrent thread update the UI whenever a new image is finished being fetched, and refresh the image view
  4. Have the image loader also clear the cache of any images that aren't being used by items in the queue

I don't bellieve an LRU will result in fewer cache misses, because I doubt user behavior will be that stochastically predictable. I suspect users will either load several songs from an album and then move on, or be entirely random; in either case, LRU won't provide much benefit, and LRUs require a bit of code complexity and additional data storage (for time stamps).

spezifisch commented 2 weeks ago

In any case, it's a memory cache -- I'm not caching anything to disk on the client side.

Thinking of that, I haven't tried running stmps on my Raspberry Pi with 512 MB RAM yet. Anyway, the other music players I know of (Clementine/Strawberry, Supersonic) all use image caches on disk, i.e.:

$ ls ~/.cache/supersonic/*
artistimages  covers

$ ls -1 ~/.cache/supersonic/**/*.jpg | wc -l
9687

$ du -hs ~/.cache/supersonic
78M ~/.cache/supersonic

Granted, we don't need to cache 10k cover files - or should we? We could effectively get rid of a lot of network requests, which would be relevant if I were connecting to my Subsonic server remotely through a VPN.

Just food for discussion, I think.

xxxserxxx commented 2 weeks ago

What's your chat platform of choice? IRC? Matrix? I don't do Discord. We should have a chat room, though, even if we're not often there at the same time.

Moving the cache to disk would certainly save memory, and allow larger cache sizes. On almost-embedded devices, we still probably don't want to be disk-caching large amounts of data, as they tend to also be constrained on persistent storage as well. But we could cache more, and put less load.

I will point out that keeping local persistent data caches in sync with server data tends to get complex pretty quickly, with logic necessary to evaluate whether the cache is out of date. Consider when a user changes the album art on the server: AFAICT, the Subsonic API provides no Last-Modified attributes for assets; you literally have to download the asset and compare it to what you have, to invalidate the cached item. This could be worked around by a manual "refresh" operation, but then we're just basically clearing the local cache and re-downloading everything.

I'm not saying it's a bad idea; it may even be mandatory if we want to provide support for micro-computers with highly constrained resources. An alternative would be to allow the user to disable album art loading entirely -- which could be accomplished by letting them set the max-cache-size configurable option on the as-yet-to-be-implemented cache cleaning feature to 0. If 0, never load images.

I'll go on record as saying I'm bearish on disk caching. In one of my other projects, which is a mobile app and uses persistent caching heavily, the caching logic (storage, in/validation, refreshing) constitutes a large percentage of the code base, is surprisingly challenging to have function correctly, and requires good API support for timestamps -- which I believe is lacking in the Subsonic API.

spezifisch commented 2 weeks ago

What's your chat platform of choice? IRC? Matrix? I don't do Discord. We should have a chat room, though, even if we're not often there at the same time.

I don't really like chats for non-realtime development to be honest. It's yet another window/tab to keep open. I think Github's discussion/issue features could be adequate enough for now. We're then losing in the points of vendor lock-in but gain some project transparency, and not another account is required to participate. Just my 2 Cents.

I'll go on record as saying I'm bearish on disk caching. In one of my other projects, which is a mobile app and uses persistent caching heavily, the caching logic (storage, in/validation, refreshing) constitutes a large percentage of the code base, is surprisingly challenging to have function correctly, and requires good API support for timestamps -- which I believe is lacking in the Subsonic API.

Ok, so the problem here is complexity. Is guess then it's a bad idea for me to say that we then should probably offer both and make it configurable.

So to summarize:

xxxserxxx commented 2 weeks ago

I don't really like chats for non-realtime development to be honest.

No sweat. I haven't quite come to terms with github's discussions; I can't seem to get it integrated, and often lose track of threads. Ticket comments work fine.

Ok, so the problem here is complexity. Is guess then it's a bad idea for me to say that we then should probably offer both and make it configurable.

I mean, I think it's fine to think we should do a thing; I'm just cautious about the complexity. If it's necessary, it's necessary. I do think we should discuss the trade-offs, though, and options like simply allowing users to disable images. It's simple to implement and maintain, completely sidesteps any network latency and memory issues that might be encountered on micro-computers, and has only the consequence that a feature of questionable value isn't available. I mean, it's not like cover art in a TUI tool is going to be anyone's mandatory requirement.

So to summarize:

So my assumption that constrained devices are going to be constrained in storage is probably a bad one.

Is it OK if we put persistent caching in a future ticket? I really think we're going to encounter some hard limitations when it comes to cache invalidation -- limitations that we have already with an in-memory cache, but which disappear with quit/load, and which I was already planing on further restricting with cache invalidation based on queue contents. We'll have to nail down assumptions and expected behaviors before implementing a persistent cache.

spezifisch commented 2 weeks ago

Is it OK if we put persistent caching in a future ticket?

That will work. Though for persistent caching I would probably use https://github.com/gokyle/filecache and a simple added-on pruning logic (maybe check atime and delete oldest -- iff the user doesn't have noatime in their mount options).

xxxserxxx commented 2 weeks ago

That library is a good idea. The troublesome part will be ensuring that the cached images are in sync with the server. I wonder if Navidrome (and gonic, and all of the others) change the asset ID if the user replaces the file on disk & rescans the library? Like, the image will always be called "cover.jpg" or "album.jpg" or something like that, so the name won't change; but if the data stored in the file is changed, will Navidrome give it a new ID? If that's a reliable behavior, we could use that to detect, invalidate, and re-fetch cached assets.

xxxserxxx commented 1 week ago

I believe this is addressed by #66, and can be closed when that's merged. Implementing a persistent cache should be easier with #66, which abstracts the cache behavior, and which could be replaced with filecache; I'm going to create a separate ticket for that.