Open xxxserxxx opened 4 weeks ago
I have a couple of suspicions about the cause of this.
The first relates to something I noticed when working on background loading of playlists: if you make a bunch of API calls at the same time, both Gonic and Navidrome seem to get confused and one or more call will fail. I can sometimes replicate this by starting stmps connected to gonic -- which has worse server-side playlist caching than Navidrome and can take dozens of seconds to complete -- and while playlists are loading go to the search tab and run a bunch of searches. This will often cause the playlist loading goroutine to encounter a server call error, and fail to load all the playlists.
My second suspicion is that the BrowserPage may not be correctly updating the UI after it makes changes, meaning the pane doesn't get refreshed after it gets new contents. It may be that this hasn't been seen before simply because nobody's torture-tested it as aggressively; if this is what's happening, the cause is that the code is in the middle of updating the album/song column contents when the user moves to a new artist, and there's a race that the UI loses.
Update New, third, suspicion: sometimes when I R
refresh, the artist column jumps. Perhaps the indexing in the BrowserPage.artistList
gets out of sync.
I can't reliably reproduce this I notice it happening pretty reliably when navigating into an album and then back up to the parent. I think I now have an idea about where to introduce trace logging. As I said -- there are no errors in the logs when this happens, so I think it's going to be a matter of adding more logging code, checking edge cases, making sure we're updating things in the right goroutines, and maybe adding some extra callers to Redraw()
.
Also, I think given the fact that both Gonic and Navidrome exhibit the API call issue, the problem is probably in stmps. It may be that our timeouts are too short, and by overloading the server with multiple requests, we're getting call failures. It may be that we simply need to add some retry code -- most of these failures are transient, so retrying once or twice will probably usually end up succeeding.
I think this last point is independent of the ticket to be worked on separately; even if it's contributing to this issue, it's also happening in other areas of the program.
Possibly related bug, to reproduce:
Edit: fix order.
~I think I know what it is, and it's related to the PR for #11 "sort by disk number".~ It is not. See the comment 4 down. It's related to directory structure.
The sort code is sorting the artist entities, but is not updating the related cached artistIdList
. Whether or not that's the sum total of the cause, the code needs to be addressed because we have two data structures that are not encapsulated, and one can be changed without affecting the other. In this specific case, by sort, because sort sorts in-place.
I'll push a fix on a branch against main, but we should re-think some of these "keeping two arrays that are hard-linked by index, but without encapsulated access". stmps does this in several places.
Something that's really odd I've seen while debugging this: we have album IDs in the BrowserPage.artistIdList
, which doesn't seem right. It's easier to see with gonic, because gonic indexes things by type: artists all get IDs like "ar-NNNN" and albums get "al-MMMM", and I'm seeing "al-MMMM" in the artistIdList
. It's possible that it has something to do with the vague distinction Subsonic makes where it mixes in the concept of "directories" with artists and albums, but still.
Huh. I don't know, man. Maybe it isn't related; sorting the entities does change the order of the entity list, but I haven't yet found some path where a change in the order of those entity lists would affect the GUI interaction. I'm still looking at it. And, now that I'm thinking about it, I didn't add new sorts; I just added the sort-by-disk-first code to the existing sort function. I'm back to being clueless.
Also, we are calling sort() a lot. I'm almost positive, multiple times on the same structures. We sort Entities when we get the response back; we sort Entities when the user selects something; we sort Entities before we add them to the queue... we call sort.Sort
on Directory.Entities
6 times, and another twice in SearchPage
(one of which might be justified as being on the results of a new server query, but is probably not as it's probably being done by the API library). This needs to be looked at, IMO.
But, no... the issue isn't with the sorting; it's with the response cache. For some reason, the connection.directoryCache
is having empty structures put in it -- so it's not registering a cache miss, but is instead returning empty results.
Sorry about the constant comments here, but there's a lot of weirdness in stmps, related (I think) to the ambiguity of the Subsonic API and how different servers implement things.
So, what's happening is indeed the artist/album/directory confusion. Say we have artist X, who on the server is just a directory with songs underneath -- no subdirectory for albums. stmps fetches this in the list of artists, but when we navigate to the artist, it also tries to fetch it as an album. When it gets back no album name (because the Subsonic server is not looking at the metadata, but the directory structure), stmps overwrites the valid SubsonicResponse
entry in the cache with an empty one. Then, the next time we try to fetch it, there's a cache entry, but it's the zero value. Hitting R
to refresh re-fetches and re-caches the data, but if we scroll around it gets overwritten again.
This can be observed by adding a logger message to the SubsonicConnection.GetMusicDirectory()
and SubsonicConnection.GetAlbum()
functions, and then navigating to an artist who's laid out on the server as I described. I haven't checked specifically with Navidrome, but since I was seeing similar behavior, I suspect the same thing is happening.
So what to do? Well, for one, don't overwrite valid, non-zero cache entries with zero entries. I'm working on that now. There may be more to dig into here. Also odd is that the API does a lot of pointer reference/dereferencing that seems unnecessary. The cache is non-pointers, but all of the API calls are looking for pointers. So the code does funny things like: fetch a pointer from the connection, store it in the cache as a non-pointer, and then later fetch the non-pointer and return a pointer to it. This is causing a lot of unnecessary memory allocation all entirely behind the scenes -- the Subsonic library is basically doing a lot of object copying behind the scenes, and then returning pointers to the things.
It's replicable, but not in a way that's easy to describe. It seems that the song pane stops updating after some actions. I've seen it when stmps is connected to either gonic and Navidrome, so it isn't a server-dependent bug.
What happens is:
R
(refresh) will fix it temporarily.