Closed: adamdmoss closed this issue 7 years ago
Need to understand where it takes long to load - is it in Remote? or in iTunes?
Ah no, it's the playlist-processing part of forked-daapd startup that is particularly slow (about 30 minutes). Once the playlists have been processed by forked-daapd, access with 'Remote' is almost instant.
Alright. I think most of the time is spent on db read/writes, and it likely wouldn't be too difficult to optimize on that.
@chme, what do you think? Maybe a few of the reads could be put together, and maybe also using transaction chunks would improve it a lot?
Yes, transactions should help in this case. Reducing the read operations seems to be difficult (after a very short look into the code). One possible improvement could be to skip processing unchanged playlists (I guess they rarely change). Scanning a playlist file always deletes all playlist items from the db and re-adds them while processing the file. If a playlist is unchanged (mtime < db_timestamp), we could skip reading the file (with special handling for streams and a check for missing playlist items).
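The mtime check described above could be sketched roughly like this (illustrative Python; forked-daapd itself is C, and the `db_timestamp` parameter and function name are made up for the example):

```python
import os

def playlist_needs_rescan(path, db_timestamp):
    """Return True if the playlist file changed since the last scan.

    db_timestamp is the mtime recorded in the database when the
    playlist was last processed (0 if it was never scanned).
    """
    try:
        mtime = os.stat(path).st_mtime
    except FileNotFoundError:
        # File is gone; that is a cleanup case, not a rescan case.
        return False
    # Unchanged playlist (mtime <= db_timestamp): skip the expensive
    # delete-and-re-add cycle. As noted above, streams and missing
    # playlist items would still need special handling.
    return mtime > db_timestamp
```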
I took a stab at improving the playlist scan times in this branch: https://github.com/ejurgensen/forked-daapd/tree/improve_plscan1
@adamdmoss do you have the opportunity to build from the branch and test it? I only have small playlists to test on.
@chme, I also did the skipping of unchanged playlists like you suggested, but I may have forgotten the "check for missing playlist items". Did you mean cleaning the playlistitems table for references to items that are no longer present in the files table?
Thanks @ejurgensen! Good news: the skipping of unchanged playlists works well. Maybe not so good news: it still takes the same amount of time to load changed playlists (30 min, tested by 'touch'ing the files).
Thanks for the quick test. I don't understand why the database transactions don't shave off at least some of the scan time. I will try to make a large playlist using cut+paste and check what the bottleneck is.
I've tested with a playlist of 1548 items (on an RPi2):
So that is a totally different result than yours, and I have no idea why. On my setup, it seems clear that the bottleneck is database disk read/write, and that the transactions significantly reduce that.
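The effect of batching writes in a single transaction can be checked with a small standalone experiment (Python's stdlib sqlite3 is used here for brevity; the `playlistitems` table is a stand-in, not forked-daapd's actual schema):

```python
import sqlite3
import time

def insert_items(db_path, n, one_transaction):
    """Insert n rows, either inside one transaction or autocommitting
    each statement, and return the elapsed time in seconds."""
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE IF NOT EXISTS playlistitems (id INTEGER, path TEXT)")
    start = time.time()
    if one_transaction:
        # One BEGIN ... COMMIT around all inserts: one sync to disk.
        with con:
            for i in range(n):
                con.execute("INSERT INTO playlistitems VALUES (?, ?)",
                            (i, f"/music/{i}.mp3"))
    else:
        # Autocommit mode: every INSERT is its own transaction,
        # which on a slow medium (like an SD card) syncs each time.
        con.isolation_level = None
        for i in range(n):
            con.execute("INSERT INTO playlistitems VALUES (?, ?)",
                        (i, f"/music/{i}.mp3"))
    elapsed = time.time() - start
    con.close()
    return elapsed
```

On real spinning or flash storage the autocommit variant is typically much slower, which matches the observation that the bottleneck is database disk read/write.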
Could you try placing your db in /run and see what kind of scan times you get?
I too noticed that many playlists slow down startup. I thought that was because my playlists contained many "dead links", i.e. many entries without a corresponding song in the database. Is there some kind of timeout when scanning playlist entries for metadata?
The playlist scanner does not look for metadata (except for URLs/radio stations, where there is a timeout); it only attempts to locate the files referenced by the playlist, so a timeout isn't relevant there. In many cases it is just one or two database lookups, followed by a write if the item was found.
@ejurgensen yes, I meant deleting playlist items that are no longer in the files table (or are pending to be purged). One way to do it would be to add a delete query to the purge cruft function.
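That cleanup step could be a single delete query run during purge. A toy sketch (the table and column names here are guesses modeled loosely on forked-daapd's schema, demonstrated against an in-memory database):

```python
import sqlite3

def purge_orphaned_playlistitems(con):
    """Delete playlist items whose file is no longer in the files table."""
    con.execute(
        "DELETE FROM playlistitems "
        "WHERE filepath NOT IN (SELECT path FROM files)"
    )
    con.commit()

# Invented minimal schema, purely for illustration:
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE files (path TEXT PRIMARY KEY)")
con.execute("CREATE TABLE playlistitems (playlistid INTEGER, filepath TEXT)")
con.execute("INSERT INTO files VALUES ('/music/a.mp3')")
con.executemany("INSERT INTO playlistitems VALUES (1, ?)",
                [("/music/a.mp3",), ("/music/deleted.mp3",)])
purge_orphaned_playlistitems(con)  # removes the dangling entry
```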
A playlist entry that does not point to a valid file results in a lot of unnecessary queries. This is because forked-daapd tries to accept playlists created on a different computer (with different paths). If we only accepted valid playlist entries, this could be optimized. Another potential optimization would be to only scan URLs that are not already in the files table. (This assumes that a URL never needs to be rescanned.)
> A playlist entry that does not point to a valid file will result in a lot of unnecessary queries.
That will only result in one query.
You only get more than one additional query if a valid entry's file name matches multiple files in different directories because they are similarly named (e.g. 'foo/1.mp3' and 'bar/1.mp3'). However, the current search may not work well with indices, since it matches by the end of the string.
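The point about end-of-string matching and indices is easy to verify with EXPLAIN QUERY PLAN: a LIKE pattern with a leading wildcard forces a full table scan, while an exact match can use the index. A minimal sqlite3 illustration (the one-column schema is invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE files (path TEXT)")
con.execute("CREATE INDEX idx_path ON files (path)")

def plan(query):
    """Return SQLite's query-plan detail text for a query."""
    return " ".join(row[3] for row in con.execute("EXPLAIN QUERY PLAN " + query))

# Matching by the end of the string cannot use idx_path -> full scan:
scan = plan("SELECT * FROM files WHERE path LIKE '%/1.mp3'")

# An exact path match can use the index -> indexed search:
search = plan("SELECT * FROM files WHERE path = '/foo/1.mp3'")
```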
I will add playlistitems cleanup to purge like you suggest, and it is also a good idea not to rescan URLs. I think the latter was already there, but I mistakenly removed it in this branch.
@adamdmoss I've now done some more thorough improvements. The scan time for my 1548-item playlist is now down to just 1-3 seconds, even when the db is on the SD card! The improvement comes mainly from better use of indices. It would be great if you would try again - the branch is still: https://github.com/ejurgensen/forked-daapd/tree/improve_plscan1
Optimising is actually quite enjoyable, but one thing led to another and I ended up changing a bunch of other stuff too. Hopefully my lack of restraint didn't introduce new bugs. If you encounter something, please let me know.
Any chance you will be able to test it during the next few days, @adamdmoss? I'm of course curious about the performance, but I would also like it tested before I merge it.
Well, it's merged into master now, so I'll close the issue. If you get to try it, please share your findings.
I have about a dozen playlists, and four of them have thousands of playlist entries.
For example, one of the larger playlists has 7612 tracks and takes 7m42s to load. A playlist with 1395 tracks takes 1m22s to load, which is approximately the same number of entries processed per second so at least it's a linear cost. :)
(My hardware is a raspberry pi 3.)
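The two timings above do work out to roughly the same throughput, which is what makes the cost look linear:

```python
# Figures from the report above: 7612 tracks in 7m42s, 1395 tracks in 1m22s.
rate_large = 7612 / (7 * 60 + 42)  # about 16.5 tracks/second
rate_small = 1395 / (1 * 60 + 22)  # about 17.0 tracks/second
```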
This isn't a showstopper, just a mild annoyance. It takes about 30m for forked-daapd to scan my static playlists (on top of the 7m it takes to scan the library).