owntone / owntone-server

Linux/FreeBSD DAAP (iTunes) and MPD audio server with support for AirPlay 1 and 2 speakers (multiroom), Apple Remote (and compatibles), Chromecast, Spotify and internet radio.
https://owntone.github.io/owntone-server
GNU General Public License v2.0

Initial processing of large static playlists is slow #419

Closed adamdmoss closed 7 years ago

adamdmoss commented 7 years ago

I have about a dozen playlists, and four of them have thousands of playlist entries.

For example, one of the larger playlists has 7612 tracks and takes 7m42s to load. A playlist with 1395 tracks takes 1m22s to load, which is approximately the same number of entries processed per second so at least it's a linear cost. :)

(My hardware is a raspberry pi 3.)

This isn't a showstopper, just a mild annoyance. It takes about 30m for forked-daapd to scan my static playlists (on top of the 7m it takes to scan the library).

ejurgensen commented 7 years ago

Need to understand where it takes long to load - is it in Remote? or in iTunes?

adamdmoss commented 7 years ago

Ah no, it's the playlist-processing part of forked-daapd startup which is particularly slow (about 30m). Once the playlists are processed by forked-daapd, access with 'Remote' is almost instant.

ejurgensen commented 7 years ago

Alright. I think most of the time is spent on db read/writes, and it likely wouldn't be too difficult to optimize on that.

@chme, what do you think? Maybe a few of the reads could be put together, and maybe also using transaction chunks would improve it a lot?
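
As a rough sketch of the transaction idea (using Python's stdlib sqlite3 in place of the server's C code; the `playlistitems` schema here is simplified and hypothetical): wrapping all the inserts for one playlist in a single transaction avoids paying a commit (and fsync) per item.

```python
import sqlite3

def add_items(conn, playlist_id, paths):
    # One BEGIN ... COMMIT around all inserts, instead of one implicit
    # transaction (and one fsync) per INSERT statement.
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO playlistitems (playlistid, filepath) VALUES (?, ?)",
            [(playlist_id, p) for p in paths],
        )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE playlistitems (playlistid INTEGER, filepath TEXT)")
add_items(conn, 1, ["a.mp3", "b.mp3", "c.mp3"])
print(conn.execute("SELECT COUNT(*) FROM playlistitems").fetchone()[0])  # → 3
```

On an sd card, where a single fsync is expensive, cutting thousands of per-statement commits down to one per playlist is where the saving would come from.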

chme commented 7 years ago

Yes, transactions should help in this case. Reducing the read operations seems difficult (after a very short look into the code). One possible improvement would be to skip processing unchanged playlists (I guess they rarely change). Scanning a playlist file currently deletes all of its items from the db and re-adds them while processing the file. If a playlist is unchanged (mtime < db_timestamp), we could skip reading the file (with special handling for streams and a check for missing playlist items).
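
The mtime check described above could look roughly like this (a sketch in Python rather than the server's C; `playlist_needs_rescan` is a hypothetical helper, not an actual forked-daapd function):

```python
import os
import tempfile
import time

def playlist_needs_rescan(path, db_timestamp):
    # Hypothetical helper: if the playlist file has not been modified
    # since the timestamp stored in the db, its items are still valid
    # and reading the file can be skipped entirely.
    return os.path.getmtime(path) > db_timestamp

# Demo: a freshly written playlist vs. a db timestamp taken just afterwards.
with tempfile.NamedTemporaryFile(suffix=".m3u", delete=False) as f:
    f.write(b"#EXTM3U\n")
    path = f.name

db_timestamp = time.time() + 1  # pretend the db scan finished a moment later
needs = playlist_needs_rescan(path, db_timestamp)
print(needs)  # → False: mtime < db_timestamp, so skip the file
os.unlink(path)
```

This is also why `touch`ing a playlist file (as done later in this thread) forces a rescan: it bumps the mtime past the stored timestamp.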

ejurgensen commented 7 years ago

I took a stab at improving the playlist scan times in this branch: https://github.com/ejurgensen/forked-daapd/tree/improve_plscan1

@adamdmoss do you have the opportunity to build from the branch and test it? I only have small playlists to test on.

@chme, I also did the skipping of unchanged playlists like you suggested, but I may have forgotten the "check for missing playlist items". Did you mean cleaning the playlistitems table for references to items that are no longer present in the files table?

adamdmoss commented 7 years ago

Thanks @ejurgensen! Good news: the skipping of unchanged playlists works well. Maybe not so good news: it still takes the same amount of time (30 min) to load changed playlists (tested by 'touch'ing the files).

ejurgensen commented 7 years ago

Thanks for the quick test. I don't understand why the database transactions don't shave off at least some of the scan time. I will try to make a large playlist using cut+paste and check what the bottleneck is.

ejurgensen commented 7 years ago

I've tested with a playlist of 1548 items (on a RPi2):

  1. Without transactions + db is on the sd card: 68 seconds
  2. With transactions + db is on the sd card: 18 seconds
  3. Without transactions + db is in memory (/run): 19 seconds
  4. With transactions + db is in memory (/run): 16 seconds

So that is a totally different result than yours, and I have no idea why. On my setup, it seems clear that the bottleneck is database disk read/write, and that the transactions significantly reduce that.

Could you try placing your db in /run and see what kind of scan times you get?
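
For anyone wanting to reproduce this test: the database location can be changed in the configuration file. A sketch, assuming the `db_path` option in the `general` section of forked-daapd.conf and that /run is a tmpfs on the system:

```
# /etc/forked-daapd.conf
general {
    # Place the database on tmpfs so scans are not bound by sd card I/O.
    db_path = "/run/forked-daapd/songs3.db"
}
```

Note that /run is cleared on reboot, so the library would be rescanned from scratch on every boot; this is for benchmarking, not a permanent setup.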

LordMyschkin commented 7 years ago

I too noticed that many playlists slow down startup. I thought that was because my playlists contained many "dead links", i.e. many entries without a corresponding song in the database. Is there some kind of timeout when scanning playlist entries for metadata?

ejurgensen commented 7 years ago

The playlist scanner does not look for metadata (except for urls/radio stations, where there is a timeout), it only attempts to locate the files referenced by the playlist. A timeout isn't relevant for that. In many cases it is just one or two database lookups followed by a write, if the item was found.

chme commented 7 years ago

@ejurgensen yes, I meant deleting playlist items that are no longer in the files table (or are pending to be purged). One way to do it would be to add a delete query to the purge cruft function.
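
The extra query proposed for the purge step would be a single DELETE against items whose file is gone. A minimal sketch (Python's sqlite3 standing in for the server's C code; the table and column names are simplified assumptions, not the actual schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE files (path TEXT PRIMARY KEY);
CREATE TABLE playlistitems (playlistid INTEGER, filepath TEXT);
INSERT INTO files VALUES ('music/a.mp3');
INSERT INTO playlistitems VALUES (1, 'music/a.mp3'), (1, 'music/gone.mp3');
""")

# Purge-cruft step: drop playlist items whose file is no longer
# present in the files table.
conn.execute(
    "DELETE FROM playlistitems "
    "WHERE filepath NOT IN (SELECT path FROM files)"
)
print(conn.execute("SELECT filepath FROM playlistitems").fetchall())
# → [('music/a.mp3',)]
```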

A playlist entry that does not point to a valid file will result in a lot of unnecessary queries. This is because forked-daapd tries to accept playlists created on a different computer (with different paths). If we only accepted valid playlist entries, this could be optimized. Another potential optimization would be to only rescan URLs if they are not already in the files table. (This assumes that there is never a need to rescan a URL.)

ejurgensen commented 7 years ago

A playlist entry that does not point to a valid file will result in a lot of unnecessary queries.

That will only result in one query.

You only get more than one query if a valid entry's file name matches multiple files in different directories because they are similarly named (e.g. 'foo/1.mp3' and 'bar/1.mp3'). However, the current search may not work well with indices, since it matches by the end of the string.
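
The indexing point can be sketched like this (Python's sqlite3, with a made-up two-column schema): a `LIKE` pattern with a leading wildcard cannot use an ordinary B-tree index and forces a full table scan, whereas an exact match on a separate, indexed filename column can use the index while returning the same candidate rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE files (path TEXT, fname TEXT);
CREATE INDEX idx_fname ON files (fname);
INSERT INTO files VALUES ('foo/1.mp3', '1.mp3'), ('bar/1.mp3', '1.mp3');
""")

# Suffix match on the full path: the leading '%' forces a table scan.
scan = conn.execute(
    "SELECT path FROM files WHERE path LIKE ?", ("%/1.mp3",)
).fetchall()

# Exact match on an indexed filename column: an index search,
# yielding the same candidates to disambiguate afterwards.
search = conn.execute(
    "SELECT path FROM files WHERE fname = ?", ("1.mp3",)
).fetchall()

print(sorted(scan) == sorted(search))  # → True
```

Storing (and indexing) the basename separately is one common workaround for exactly this kind of suffix lookup.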

I will add playlistitems cleanup in purge like you suggest, and it is also a good idea to not rescan URLs. I think the latter was already there, but I mistakenly removed it in this branch.

ejurgensen commented 7 years ago

@adamdmoss I've now done some more thorough improvements. The scan time for my 1548 item playlist is now down to just 1-3 seconds, also when the db is on the sd card! The significant improvement comes mainly from better use of indices. It would be great if you would try again - the branch is still: https://github.com/ejurgensen/forked-daapd/tree/improve_plscan1

Optimising is actually quite enjoyable, but one thing led to another and I ended up changing a bunch of other stuff too. Hopefully my lack of restraint didn't produce new bugs. If you encounter something, please let me know.

ejurgensen commented 7 years ago

Any chance you will be able to test it during the next few days, @adamdmoss? I'm of course curious about the performance, but I would also like it tested before I merge it.

ejurgensen commented 7 years ago

Well, it's merged into master now, so I'll close the issue. If you get to try it, please share your findings.