Implement music metadata for tag/metadata parsing

lennyanders commented 3 years ago

I did not add other file types for now, but i got the metadata parsing working with music-metadata and a buffer polyfill.

netlify[bot] commented 3 years ago

✔️ Deploy Preview for heuristic-ptolemy-02be89 ready!

🔨 Explore the source changes: 3df92cd5301f09d5fe4094be4fcea159fdb23b98

🔍 Inspect the deploy log: https://app.netlify.com/sites/heuristic-ptolemy-02be89/deploys/611d0c64de0fc200079aa04d

😎 Browse the preview: https://deploy-preview-6--heuristic-ptolemy-02be89.netlify.app/

lennyanders commented 3 years ago

I now also tested with more file types (['aac' ,'mp3' ,'ogg' ,'wav' ,'flac' ,'m4a']) and my PC firstly almost crashed, and then the tab got destroyed. Probably because of too much memory/CPU consumption. In my own test project, I did the tag scanning one after another to avoid that. I guess you use something like Promise.all, which I did firstly too. I didn't look that deep into your code for now.

I have 3554 audio files (I have 3556 for real, but the Chrome directory API for some reason leaves two out ¯_(ツ)_/¯) btw, and most of them are FLAC, so I guess for most people this shouldn't be a problem.

Borewit commented 3 years ago

I have 3554 audio files (I have 3556 for real, but the Chrome directory API for some reason leaves two out ¯(ツ)/¯) btw, and most of them are FLAC, so I guess for most people this shouldn't be a problem.

Make sure parsing is not done in parallel.

FLAC should be not problem. In case you are tranfering the data via HTTP, this can be optimed via a solution like @tokenizer/s3 which is only retrieving the parts reqired and not the entire file. But it makes things more complex.

minht11 commented 3 years ago

@lennyanders I never tested with that many files at most I used 200 😄.

In this code block I am parsing all files at once and waiting for the result with Promise.all https://github.com/minht11/local-music-pwa/blob/f3889d63677f6253cc0952d5c16cd0ab0f39b6bb/src/helpers/tracks-file-parser/worker/tracks-file-parser-worker.ts#L90-L107

An obvious solution would be to just use for await loop instead.

lennyanders commented 3 years ago

Should I update the PR to parse the files in a simple loop and scan more file formats? I already tested a similar thing, and it worked with my music library. It took long and still ate a lot of CPU, but it worked.

minht11 commented 3 years ago

I think parsing slowly is preferable to crashing, in the future it we should test using more than one worker, that might help a bit, depending on the device.

Anyway yeah you should update PR with file formats and loop change, thanks again.

Borewit commented 3 years ago

Should I update the PR to parse the files in a simple loop and scan more file formats?

Initializing music-metadata times the number of files in your memory is asking for lots of memory. You need to keep that under control, in principal no problem with running a few times in parallel. The overhead caused by that will result in increased CPU usage. So yeah, I always recommend to for simple serial loop.

If that takes very long, the data access is likely very inefficient. Try to loop with Node.js through your 3554 files, using parseFile to get an impression what the parsing speed should look like.

lennyanders commented 3 years ago

Updated it to parse more file formats and these in sequence. It takes around 10 seconds for my collection which I think is ok, but in Node it's more than 3 times faster. After the file parsing, the PWA is unresponsive for a few seconds. The first load also takes way longer, so I guess it's indexed db's fault, which sadly is really slow and blocks the main thread (I think even when called in a worker). I guess this is a hard problem to solve, and we should live with it like that for now. The problem also should also first be noticeable if you load more than 1000 songs, I think. With my 600-ish mp3 files, I had no problem previously.

minht11 commented 3 years ago

@lennyanders @Borewit Thank you both, for spending your time on this.

@lennyanders I'll open an issue on data loading performance where we can discuss this further.

Borewit commented 3 years ago

Updated it to parse more file formats and these in sequence. It takes around 10 seconds for my collection which I think is ok, but in Node it's more than 3 times faster. After the file parsing, the PWA is unresponsive for a few seconds. The first load also takes way longer, so I guess it's indexed db's fault, which sadly is really slow and blocks the main thread (I think even when called in a worker). I guess this is a hard problem to solve, and we should live with it like that for now. The problem also should also first be noticeable if you load more than 1000 songs, I think. With my 600-ish mp3 files, I had no problem previously.

That all sounds realistic. Having tracks in a DB is not ideal for parsing. It could potentially be accelerated by implementing a dedicated tokenizer, which would require partial chunk requests from the binary storage. Not guaranteed to be faster.

lennyanders commented 3 years ago

I tried it like this, and it didn't get faster:

import { parseFromTokenizer } from 'music-metadata'
import { fromBuffer } from 'strtok3'

const fileBuffer = await new Response(file).arrayBuffer()
const fileUint8 = new Uint8Array(fileBuffer)
const tokenizer = await fromBuffer(fileUint8, {
  mimeType: file.type,
  path: file.name,
  size: file.size,
})
const tags = await parseFromTokenizer(tokenizer, { duration: true })

Also: the tracks aren't in the database, only FileSystemFileHandle's are stored in the database (at least in chromium browsers)

Borewit commented 3 years ago

What I meant was utilizing partial / random access via the tokenizer, aiming to prevent to read the entire file. But it looks like this is the only way provided FileSystemFileHandle. You would think that get's you in trouble playing an 8 GB video file.

Borewit commented 3 years ago

This is something what could speed up reading: Use Blob.stream() instead of Blob.arrayBuffer() (File is subclass from Blob).

I actually have not done that myself in music-metadata-browser .parseBlob().

Update: improvement: Borewit/music-metadata-browser#597

Be aware that music-metadata has no support for web streams (ReadableStream) yet, only music-metadata-browser has it with parseReadableStream().

lennyanders commented 3 years ago

Sadly, it doesn't work for me. Somewhere, the code calls Stream.call(this), but the Stream object doesn't exist. Do I need a polyfill for that?

Another problem I have with music-metadata-browser that it somewhere in the code-base searches for Buffer on global. I can get around that by letting my tooling replace every global with globalThis but it would be great if I haven't had to do this.

Apart from that, I absolutely love your music parser.

Borewit commented 3 years ago

Sadly, it doesn't work for me. Somewhere, the code calls Stream.call(this), but the Stream object doesn't exist. Do I need a polyfill for that?

The web stream returned by Blob.stream() must be converted to a Node.js stream. I wrote my own module for that: readable-web-to-node-stream, at the time I could not find any other way. So this is exactly what parseReadableStream() does. This actually includes most of the magic music-metadata-browser does on top of music-metadata.

Since very recent web stream are supported in Node.js, so this will become more straight forward in the future.

Another problem I have with music-metadata-browser that it somewhere in the code-base searches for Buffer on global. I can get around that by letting my tooling replace every global with globalThis but it would be great if I haven't had to do this.

I am not sure how I can improve that. The aim of music-metadata-browser is to take as much pain away for the user, to use music-metadata in a WebPack like configuration. Yet the way is being dealt with polyfilling Node.js dependency is slowly changing, from automatic polyfilling, to ES Modules. I will look into ESM.

minht11 / local-music-pwa

Implement music metadata for tag/metadata parsing #6