nukeop / nuclear

Streaming music player that finds free music for you
https://nuclear.js.org/
GNU Affero General Public License v3.0

Local file problems with incomplete mp3 tags #331

Closed Dola-Shuvi closed 4 years ago

Dola-Shuvi commented 5 years ago

Expected behavior: Either ignores untagged files or uses the filename as an alternative to the title tag.


Steps to reproduce:

  1. Add any folder with untagged mp3 files to the library.
  2. Scan the folder for files.
  3. Attempt to play undefined song.

Possible solution: Check whether the title tag exists and, if it does not, use the filename as the title.
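For illustration only, a minimal sketch of that fallback (readTags is a hypothetical tag-reading helper, not something in Nuclear):

  const path = require('path');

  // readTags is a stand-in for whatever tag parser the scanner uses;
  // it is assumed to resolve to e.g. { title, artist } or null.
  async function buildTrackEntry(filePath, readTags) {
    const tags = await readTags(filePath);
    return {
      path: filePath,
      // Fall back to the filename (without extension) when there is no title tag.
      name: (tags && tags.title) || path.basename(filePath, path.extname(filePath)),
      artist: (tags && tags.artist) || 'unknown'
    };
  }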


Screenshots: undefined song without mp3 tags; no undefined song with mp3 tags; devtools output when trying to play the undefined song.

nukeop commented 5 years ago

Thanks for reporting this. One way of solving this could be using the filename (or the entire path) to distinguish untagged files. We could also think about using MusicBrainz to automatically tag music for convenience.

@charjac for reference

Dola-Shuvi commented 5 years ago

2daa65d1fe32c059f1dd14d8e6ef3e01907c0cd4 seems to have broken playback of local files for me. It seems no valid streams are passed to Nuclear, regardless of whether the files are properly tagged or not. This line seems to be the cause, as it passes undefined as its response. https://github.com/nukeop/nuclear/blob/2daa65d1fe32c059f1dd14d8e6ef3e01907c0cd4/server/http/api/localfile.js#L87

Attachments: devtools output; network capture.

Files without any tags do show their names properly in the library now, however.

c-jacquin commented 5 years ago

I will investigate this later today, thx for reporting.

c-jacquin commented 5 years ago

OK, I see where the problem is. I will push a fix today.

c-jacquin commented 5 years ago

#395 should fix this (the good old NullPointerException).

Dola-Shuvi commented 5 years ago

https://github.com/nukeop/nuclear/blob/2daa65d1fe32c059f1dd14d8e6ef3e01907c0cd4/server/local-files.js#L59 This should be data.recordings and data.recordings.length or it will never pass the condition.
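In other words, something along these lines (a sketch of the suggested guard, not the exact upstream code):

  // Only index into recordings if the lookup actually returned results.
  if (data && data.recordings && data.recordings.length) {
    meta.name = data.recordings[0].title;
  }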

c-jacquin commented 5 years ago

Oops... thanks, I just pushed the fix.

Dola-Shuvi commented 5 years ago

I just tested the pull request with my local library and Nuclear crashes after roughly 30 seconds of scanning it. My config.json file grew from under 1 MB to 173 MB due to the cache being dumped into it. Perhaps a different way of storing the cache has to be explored.

nukeop commented 5 years ago

what's the approximate size of your library?

Dola-Shuvi commented 5 years ago

Around 1300 tracks. This is probably way above anything that was tested.

nukeop commented 5 years ago

I'm gonna try some stress testing with my 30GB library. We could think of ways to speed this up. Fun engineering challenge.

Dola-Shuvi commented 5 years ago

Perhaps something like level, combined with filtering out tracks that already contain the data Nuclear needs to work. Using a database like level would also make it possible to easily remove the metadata of tracks that are no longer in the library. Subsequent scans on startup could also be shortened by simply checking whether a track is already in the database and skipping it if so.
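As a rough illustration of that idea (a sketch only, assuming the level package; metadataDb and getOrScanMetadata are hypothetical names):

  const level = require('level');

  // Hypothetical on-disk cache keyed by absolute file path.
  const metadataDb = level('./local-metadata-cache', { valueEncoding: 'json' });

  async function getOrScanMetadata(filePath, scanFile) {
    try {
      // Skip the expensive scan if this track is already cached.
      return await metadataDb.get(filePath);
    } catch (err) {
      // Cache miss (or read error): scan the file and store the result.
      const meta = await scanFile(filePath);
      await metadataDb.put(filePath, meta);
      return meta;
    }
  }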

Dola-Shuvi commented 5 years ago

#395 seems to provide a massive speedup when metadata is stored, compared to before. Getting to that point, however, seems very difficult, as Nuclear still crashes when loading too many tracks. Somewhere the cache is still getting dumped into config.json. I believe a new electronStore needs to be created in store.js.

Dola-Shuvi commented 5 years ago

Quick update on this issue:

I experimented a bit with the type of metadata that is saved in the cache. It turns out that most of the data is used to store images as Base64. When I commented out the image element in https://github.com/nukeop/nuclear/blob/75e14ec94a3cb2dd8fa2562fae78f36ddd57c939/server/local-files.js#L26 my config.json shrank from ~140 megabytes to ~2 megabytes with my full 1300-track library. I think there should be a way to avoid storing the thumbnail data and only keep it in memory. I tried multiple solutions, but it always ended up being written to disk. Perhaps someone more experienced in JS than me can easily solve the issue.
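One possible shape for that (a sketch only, assuming electron-store; persistTracks and thumbnailCache are invented names, not existing code):

  const Store = require('electron-store');

  // Hypothetical split: persisted metadata on disk, thumbnails only in memory.
  const store = new Store({ name: 'local-library-cache' });
  const thumbnailCache = new Map();

  function persistTracks(tracks) {
    const slim = tracks.map(({ image, ...rest }) => {
      if (image) {
        // Keep the Base64 thumbnail in memory only; never write it to config.json.
        thumbnailCache.set(rest.path, image);
      }
      return rest;
    });
    store.set('localMeta', slim);
  }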

nukeop commented 5 years ago

I think this will be best served by a binary format, or alternatively by storing it in SQLite or a similar database. Thumbnails can be loaded on the fly from disk anyway.

Dola-Shuvi commented 5 years ago

Another update:

I experimented with LibraryViewContainer and got it to somewhat work. There is one major freeze (~5 seconds with my 1300-track library) whenever the LibraryView is opened, and images only load when the library gets sorted or is opened a second time. Refreshing the library causes the images to unload again.

Perhaps someone can use this very ugly hack as a basis to write a better fix.

EDIT: Rewrote the hack properly. It loads much faster now but sadly still crashes on my full library. Looks like this might be a NodeJS limitation.

nukeop commented 5 years ago

We can use virtualized components to prevent the freeze, I think. I'll give it some more thought later today, and analyze your code.

Venryx commented 4 years ago

Local file playback also fails on my full library. The problem for me is that apparently it is just too large for Nuclear to process in a reasonable timeframe (I waited for over 10 minutes, then gave up).

The main problem appears to be that Nuclear is trying to run "fpcalc" on every file in the library before the files are even played, queued, or visible. I suppose this is so that it can identify duplicates and thus only list them once?

Anyway, one way or another we need to remove the need for fpcalc to run on every file before the local library is playable: it takes far too long in large libraries, making the program unusable.

Can the purpose of fpcalc being run on library-scan be explained so we can think of solutions?

Venryx commented 4 years ago

Ah, okay, it appears from acousticId.js that the fingerprinting is used to obtain metadata about the song, presumably for then retrieving the album image and similar data for songs where it isn't embedded.

However, there needs to be a way to access your local library prior to this processing. I'll try cutting out the fpcalc operations, and see if I hit other issues with my local library.

Dola-Shuvi commented 4 years ago

As you have already figured out, fpcalc is used to fingerprint the local audio files and get their metadata, to prevent the library from breaking if files are untagged. Nuclear then saves the metadata, so it shouldn't try to scan them all again. I can currently load a library with 1260 songs without any issues, and there is barely any wait time when running Nuclear. Could you perhaps share how many songs you are trying to load in your library?

Venryx commented 4 years ago

Good news: When I cut out this for block in local-files.js, my local library loads almost instantly! I can then play local files.

However, there are other "soft" issues hit with a local library of that size (7,240 tracks):

1) The UI for the song list is slow to scroll, because all of the tracks are immediately rendered to it. (EDIT: Actually, scrolling is okay before playing a song, but once a song is playing it slows to a crawl. Could it be that Nuclear is re-rendering the entire song list each time the track-play progress bar is updated!?)

As mentioned above, using a virtualized list would help with that. I've used and recommend react-list. Does it seem like a fine choice? If so, I can work on implementing it.

2) ~Some of the tracks show up with the name "unknown", despite the filename containing the correct track title. In cases where the metadata doesn't contain the track name, Nuclear should just use the filename as the display name.~ (EDIT: Nevermind, I was looking at the "Artist" column instead of the track-name column. Also, I had accidentally commented out the filename-fallback when I commented out the fingerprinting code)

Dola-Shuvi commented 4 years ago

As you can see in this code block at line 78 https://github.com/nukeop/nuclear/blob/b7a2bfcd2cbf0c24fb3240a79cebc079073c85e3/packages/app/server/local-files.js#L72-L79 if Nuclear can't find any metadata for the file, it falls back to using the filename. The unknown track title causes some severe issues when trying to play files, so commenting out that part is not recommended.

Venryx commented 4 years ago

@Mongogamer I don't see how your library fingerprints all your local audio files anywhere close to quickly enough. For me, it appears to take about a second to fingerprint each audio file, as seen by watching the process launches in Process Hacker 2.

Part of this is probably that my audio library is stored on a hard drive rather than a solid-state drive. It makes sense that this would slow the process down, but there still shouldn't be that much of a time difference.

~Actually, on looking closer, it appears the fpcalc process is not itself what's taking so long. Rather, the processing that occurs between each fpcalc process appears to be what's slow. And it appears to be IO-bottlenecked, not CPU-bottlenecked:~ (EDIT: Never mind, this doesn't make sense, since commenting out that section made the library load quickly. Will look closer...)

Notice that the CPU usage is low, but the IO rate is ~1.2 MB/s for two of the electron processes -- and it stays that way throughout the scanning process, until it hits this error:

 main › Error: ERROR: Empty fingerprint

    at eval (webpack:///./server/lib/fpcalc.js?:84:28)
    at ConcatStream.eval (webpack:///./node_modules/concat-stream/index.js?:37:43)
    at ConcatStream.emit (events.js:187:15)
    at finishMaybe (webpack:///./node_modules/concat-stream/node_modules/readable-stream/lib/_stream_writable.js?:620:14)
    at endWritable (webpack:///./node_modules/concat-stream/node_modules/readable-stream/lib/_stream_writable.js?:629:3)
    at ConcatStream.Writable.end (webpack:///./node_modules/concat-stream/node_modules/readable-stream/lib/_stream_writable.js?:567:22)
    at Socket.onend (_stream_readable.js:628:10)
    at Object.onceWrapper (events.js:273:13)
    at Socket.emit (events.js:187:15)
    at endReadableNT (_stream_readable.js:1092:12)

Dola-Shuvi commented 4 years ago

My library is stored on a hard drive as well, so this is most likely not the limiting factor. Two possible solutions come to mind: one more of a temporary stopgap, the other a possible permanent improvement. Try adding and scanning library folders individually instead of all at once. If this doesn't help, then the process of scanning files might have to be parallelized.

Venryx commented 4 years ago

Okay, so I tried updating the for-loop to the following:

  // Kick off all the fingerprint lookups concurrently instead of awaiting them one by one.
  let promises = formattedMetas.map(async meta => {
    if (!meta.name) {
      let data;
      try {
        [data] = await fetchAcousticId(meta.path);
      } catch (ex) {
        // A failed fingerprint (e.g. "Empty fingerprint") should not abort the whole scan.
        console.error(ex);
      }
      if (data && data.recordings && data.recordings.length) {
        meta.name = data.recordings[0].title;
        meta.artist.name =
          data.recordings[0].artists[0].name || 'unknown';
      } else {
        // Fall back to the filename when no metadata could be fetched.
        meta.name = path.basename(
          meta.path.split('.').shift()
        );
      }
    }
  });
  await Promise.all(promises);

This greatly improves the speed of the library scanning. It still takes about 3 minutes, but at least it completes.

Note that the try-catch is necessary to keep the "Empty fingerprint" error from stopping the whole process, and the await Promise.all allows the scanning of the files to occur in parallel. (parallel IO scanning is not that great on a hard-drive, but apparently it's still better than sequential!)

Anyway, assuming the changes above are added, there is still the problem of a 3-minute scan occurring with no visual feedback to the user. Thus, regardless of other considerations, someone should update the "scanning" status indicator to include a %-complete indicator.

@nukeop I'm willing to work on this once it's decided what sort of indicator should be used. ("X%" label, a pie-chart that fills in as it completes, an "XDone/XTotal" label, etc.)

Venryx commented 4 years ago

To summarize, these are the changes that I think are needed:

  1. Have a try-catch prevent a failed fingerprint attempt from stopping the entire library-scan operation.
  2. Make the fingerprinting occur in parallel, to greatly speed it up (~3 minutes for my library, instead of the 10+ otherwise).
  3. Have the UI show the library-scanning process.
  4. Have the song list use a virtual list so that it renders much faster.
  5. Have the song list not get re-rendered whenever the track-progress indicator gets re-rendered. (I'm assuming this is the reason for the very slow song-list scrolling when a track is playing, anyway.)

Steps 1 and 2 are accomplished by the previous post's code block. Steps 3 through 5 I'm willing to implement once I get the go-ahead from @nukeop, along with details on what virtual-list library to use and what UI format to use for the scan-progress indicator.

nukeop commented 4 years ago

Yeaaah I know, the UI and performance for the local library need to be worked on. I have finished the plugins functionality, so this will likely be my next area of focus. I was a MediaMonkey user years ago and its library management was excellent. I will try to make Nuclear's as powerful as possible, so it's actually competitive as a normal desktop music player. If you have any wishes or features from other players that you'd like to see in Nuclear, let me know.

Venryx commented 4 years ago

@nukeop Sounds good.

Which of the five changes would be good for me to work on? (I don't want to work on something if you prefer a different approach/library, or would rather implement it yourself.)

Steps 1 and 2 are basically done. (I just need to decide whether/how to log fingerprinting errors, and probably implement a limiter so that the parallel fingerprinting only processes X files at the same time.)
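A possible shape for such a limiter (a sketch only; limitConcurrency is a hypothetical helper, not something already in the codebase):

  // Run the given async worker over items with at most `limit` of them in flight at once.
  async function limitConcurrency(items, limit, worker) {
    const results = [];
    let index = 0;
    const runner = async () => {
      while (index < items.length) {
        const i = index++;
        results[i] = await worker(items[i]);
      }
    };
    // Start `limit` runners that pull items off a shared cursor.
    await Promise.all(Array.from({ length: Math.min(limit, items.length) }, runner));
    return results;
  }

  // Usage (hypothetical): fingerprint at most 4 files at a time.
  // await limitConcurrency(formattedMetas, 4, meta => fetchAcousticId(meta.path));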

Step 3 depends on your input on the UI styling.

And steps 4 and 5 need your input on what library to use, and how to solve the performance issue while a track is playing.

nukeop commented 4 years ago

It would be great if you could take care of 1, 2, and 3. We could use this component for showing progress: https://react.semantic-ui.com/modules/progress/ You'd just need to style it appropriately using the usual palette.
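For instance, a minimal sketch of wiring scan progress into that component (ScanProgress and its scannedCount/totalCount props are assumptions, not existing store fields):

  import React from 'react';
  import { Progress } from 'semantic-ui-react';

  // Shows the library scan as a percentage while it runs.
  const ScanProgress = ({ scannedCount, totalCount }) =>
    totalCount > 0 ? (
      <Progress
        percent={Math.round((scannedCount / totalCount) * 100)}
        progress
        size='small'
      />
    ) : null;

  export default ScanProgress;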

As for 4 and 5 I want to implement something more sophisticated than just a plain list, but I need to experiment a little first. I'm also thinking about caching the results in the memory, but that will probably be left for later.

Venryx commented 4 years ago

Noticed a problem while improving on step 2: the Acoustic ID service apparently only allows 3 requests per second, and this limit applies globally to everyone using Nuclear, since there's only one API key.

It's not reasonable (IMO) to require users to wait a third of a second for each track in their library to be sent to the Acoustic ID servers for identification, especially since this is flaky and can break if other users are scanning their libraries in Nuclear at the same time.

Currently, since I use concurrent processing, almost all the requests just fail. That's actually fine, since the Acoustic ID data is not necessary to obtain a filename for the tracks; however, it essentially makes the Acoustic ID integration useless.

Anyway, I'm leaving the Acoustic ID code in for now, but it's not very useful at the moment due to how many tracks it simply fails on (quota exceeded). My preference is to either scrap the Acoustic ID integration, or leave it in as an option but have it "lazy load" the Acoustic ID data instead of trying to do it all during the initial library scan (which is required to even start using the local library).

Venryx commented 4 years ago

Okay, I've gotten steps 1 through 3 finished and created a pull request for it here: https://github.com/nukeop/nuclear/pull/531

Venryx commented 4 years ago

As for 4 and 5 I want to implement something more sophisticated than just a plain list, but I need to experiment a little first. I'm also thinking about caching the results in the memory, but that will probably be left for later.

What sort of UI do you have in mind?

I'm eager to start using Nuclear as my main player, but the performance of the local library UI is bad enough that I can't really do so yet. I don't need it to be that full-featured right now (for now I'm going to have my plugin take care of most of my needs), but it's hard to work around core performance issues.

Do you have a single library-page UI-layout you plan to support, or will there be multiple?

If multiple, will one of the views be a simple list view? If so, then perhaps I could implement that simple list view now (using a virtual list library like react-list), until you've experimented and figured out a good solution for the other viewing option(s).

(However, perhaps what you have in mind overlaps too much with a simple list, in which case I may still be able to help build the initial version, if some more details are given.)

nukeop commented 4 years ago

Long term, I want the library view to be every bit as convenient and robust as typical desktop players, and better than Spotify. I will be implementing at least the following features:

Would you like to try implementing any of these?

Venryx commented 4 years ago

I think feature/layout 4 (a virtualized list of all tracks in a library) could be accomplished right now just by modifying the local-library UI to use a virtual-list component (like the one mentioned above, react-list). I'm using it in my project currently; it works well and is easy to integrate.
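As a rough idea of what that integration could look like (a sketch, assuming a tracks prop and reusing an existing row component; not the actual LibraryView code):

  import React from 'react';
  import ReactList from 'react-list';
  // The import path is an assumption; reuse whatever row component the library view already has.
  import TrackRow from './TrackRow';

  // Only the rows currently scrolled into view get rendered.
  const VirtualTrackList = ({ tracks }) => (
    <ReactList
      length={tracks.length}
      type='uniform'
      itemRenderer={(index, key) => (
        <TrackRow key={key} track={tracks[index]} />
      )}
    />
  );

  export default VirtualTrackList;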

If I try integrating it and it resolves the performance issues, would that be sufficient to be merged? Or would other changes also be needed?

nukeop commented 4 years ago

Yes, sure. At the very least it would address some immediate problems, so it would be a great help.

Venryx commented 4 years ago

Okay, I'll get to work on that then!

Venryx commented 4 years ago

Okay, the main performance problems actually seem to come from the fact that playing a track continually causes one of the hooks in LibraryView to fire. It's firing so frequently that I think it fires every time the track progress bar needs updating, which of course kills performance when the library has 2000 items.

To fix it, I should be able to just track down which hook and dependency is changing each time, then use useMemo or something similar to keep its reference stable (since the data actually accessed by LibraryView is presumably the same between those fractions of a second).
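Roughly this pattern (illustrative only; SortedLibrary and its props are invented, not the actual LibraryView code):

  import React, { useMemo } from 'react';

  const SortedLibrary = ({ tracks }) => {
    // Recompute the sorted list only when `tracks` itself changes, not on every
    // store-driven re-render (e.g. playback-progress updates).
    const sortedTracks = useMemo(
      () => [...tracks].sort((a, b) => a.name.localeCompare(b.name)),
      [tracks]
    );
    return sortedTracks.map(track => <div key={track.path}>{track.name}</div>);
  };

  export default SortedLibrary;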

Venryx commented 4 years ago

So at this line, it appears that the "UPDATE_PLAYBACK_PROGRESS" action is being dispatched, based on song progression.

In my experience, it's not a good idea to use the Redux store for something that updates as frequently as a song's progress! In this case, there are at least dozens of components subscribed to the store, so it's triggering hundreds if not thousands of re-evaluations of those store accessors per second.

For some reason, the LibraryView is also not being properly memoized, so it performs substantial processing each time the action is dispatched. I could either try to fix the LibraryView component's memoization, or store the song-progress state in some other way that has a lower performance footprint.

Is either route fine, or do you have a preference? (and/or insight to help me understand how to resolve the issue)

Venryx commented 4 years ago

Never mind, issue 1 is not actually a problem of LibraryView not being memoized. The problem is that, even with memoization, having it rendered causes slowdown, because every one of the 2000 tracks is a component that is also subscribed to the store through react-redux's @connect decorator.

They're memoized as well; however, there is still some processing involved in the shallowEquals check within the connect wrapper, and when you have 2000 tracks getting called dozens or hundreds of times per second, the performance impact apparently adds up!

Thankfully, this means the fix (or "bandaid", anyway) should be as simple as implementing that virtual list as mentioned before: tracks not in the current view will not be rendered, so they won't add to the number of store subscribers or the shallow-equals overhead.

nukeop commented 4 years ago

I think it's fine to store the progress there, since only a few components access that subtree of the store; the main problem is that heavy actions are dispatched along with those updates. I'm not sure how it's handled at the moment because I didn't write it, but if possible it should be isolated in its own action and memoized/throttled. Great that you got to the bottom of this so fast.

Venryx commented 4 years ago

heavy actions are dispatched along with those updates

I tried to open the Redux DevTools to see exactly what actions were being dispatched; however, they keep crashing. Interestingly, the crash doesn't only happen when opening the library; it happens even on a fresh launch while sitting on the home page. (screenshot here)

That makes it hard for me to see what actions need to be cleaned up. Do you know what the names of the heavy actions are?

If not, it's possible it's just the sheer number of "UPDATE_PLAYBACK_PROGRESS" actions currently being dispatched; even subscribers that don't access that part of the store still have some overhead from having their store-accessor code run, plus the shallowEquals call afterward. I suspect that's actually the explanation of the slowdown, though I can't confirm at the moment. (If it is, it can be fixed either by not storing song progress in the store, or by using list virtualization to keep the number of subscribers to a sane level -- it's probably worth doing both, actually, but I can implement just the second for now.)

nukeop commented 4 years ago

I have no idea, honestly. I'd have to do some profiling to be sure. The first low-hanging-fruit optimization that comes to mind is to throttle playback-progress updates to 1 per second (since that's the smallest unit we're displaying anyway).
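For example, something along these lines (a sketch using lodash's throttle; dispatch and updatePlaybackProgress here are stand-ins for the real dispatch and action creator):

  import { throttle } from 'lodash';

  // Dispatch at most one progress update per second, no matter how often
  // the player emits its progress event.
  const dispatchProgressThrottled = throttle(
    progress => dispatch(updatePlaybackProgress(progress)),
    1000
  );

  const onPlaying = ({ position, duration }) => {
    dispatchProgressThrottled((position / duration) * 100);
  };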

Track rows in library view and everywhere else shouldn't need access to playback state. I'd look at the track row container and see what parts of the store it's accessing, that's where the culprit should be found.

We could look at defining better (more conservative) selectors for components that are rendered many times, as well as implementing shouldComponentUpdate where possible.

Venryx commented 4 years ago

I'm not sure how it's handled at the moment because I didn't write it, but if possible it should be isolated in its own action and memoized/throttled.

I looked at the code path, and currently it's not memoized/throttled at all. Having it dispatch actions only 10 times per second, say, would certainly help (assuming the Sound component from react-hifi natively emits the onPlaying event more frequently than that), though I'd argue it would still cause unnecessary performance drops for something that is only "partially fixed" by throttling the update rate (which also reduces the smoothness of the end-user visual experience -- it seems preferable to have a non-Redux way of passing the updates to the seek-bar component, IMO).

Venryx commented 4 years ago

Track rows in library view and everywhere else shouldn't need access to playback state. I'd look at the track row container and see what parts of the store it's accessing, that's where the culprit should be found.

But that's the thing: they aren't accessing playback state. That doesn't mean their store subscription has no overhead, though. Any store subscription has overhead, even if the component doesn't actually access content from the changing parts of the Redux store tree. (The react-redux connector function itself has a small overhead -- and when you multiply hundreds of dispatches per second by thousands of subscribers, that small overhead becomes noticeable.)

Here is the mapStateToProps function of TrackRow, for example:

function mapStateToProps (state, { track }) {
  return {
    streamProviders: track.local
      ? state.plugin.plugins.streamProviders.filter(({ sourceName }) => {
        return sourceName === 'Local';
      })
      : state.plugin.plugins.streamProviders,
    settings: state.settings,
    favoriteTracks: state.favorites.tracks
  };
}

No access to state.player (which is the only thing updated by "UPDATE_PLAYBACK_PROGRESS"), and yet the React profiler tool shows the render-time filled with renders of "TrackRow (memo)" (the higher order component that subscribes to the redux store).

We could look at defining better (more conservative) selectors for components that are rendered many times, as well as implementing shouldComponentUpdate where possible.

That wouldn't help in this case. shouldComponentUpdate is already applied to the TrackRow component by the TrackRow @connect higher-order component (or rather, the HOC has shouldComponentUpdate applied and simply doesn't re-render TrackRow unless the result of mapStateToProps has shallowly changed from the previous result). This is where the don't-render-TrackRow-itself optimization is applied, but the check itself has some overhead, which is what is adding up.

Venryx commented 4 years ago

By the way, I've had to grapple with Redux performance issues in the past, which is why I was able to diagnose the source of the performance issue here fairly quickly (well, a possible source, anyway -- maybe there are even heavier actions being dispatched multiple times per second!).

In my project, I solved the performance issues by:

  1. Not using the Redux store for things that change very frequently.
  2. Limiting the number of subscribers to the store (e.g. subscribing just the list component and passing a shared array of values to each of the children, who then access their "slice" of the data -- for "simple" child components, this is easy to do and helps performance a lot; see the sketch below).
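To illustrate the second point (a sketch only; the component names and store path are invented for the example):

  import React from 'react';
  import { useSelector } from 'react-redux';

  // One store subscription at the list level...
  const TrackList = () => {
    const tracks = useSelector(state => state.local.tracks); // store path is an assumption
    return tracks.map(track => (
      // ...while each row stays a plain, unconnected component.
      <PlainTrackRow key={track.path} track={track} />
    ));
  };

  const PlainTrackRow = ({ track }) => (
    <div className='track-row'>{track.name}</div>
  );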

Those two optimizations gave tremendous improvements to my UI's responsiveness, and they didn't actually take much away from Redux's productivity gains. Redux is great for sharing state globally in a safe way, however it does have some overhead.

I've found it's best to use it for the 95% of data that isn't updated super-frequently, and then use more traditional/imperative pathways for transmitting the remaining 5% of frequently updated data.

Anyway, like I said, we can "get by" for now by implementing some of the other optimizations (a virtual-list component, throttling the song-progress dispatches, not giving every TrackRow its own Redux subscription, etc.), but ultimately I still think it's a good idea to move the super-frequent track-seek data out of Redux.

In the meantime, I will work on that first optimization of using a virtual-list within the LibraryView component.

nukeop commented 4 years ago

I'm going to bed now, but later this week and next I'll definitely think about implementing your ideas. Maybe performance optimization needs to be turned into a separate issue. We've never really considered performance, but now that it's becoming a hindrance, we should focus on that, I guess.