tidalcycles / Clean-Samples

Like Dirt-Samples, but cleaned up
GNU General Public License v3.0

metadata fields #12

Open charlieroberts opened 3 years ago

charlieroberts commented 3 years ago

I have a proof-of-concept node.js script that parses the clean-samples quark and then lets you select which repos you'd like to download. Having the download size of each repo would be nice to help users make informed choices about what they're grabbing.

Which made me think that perhaps we should be adding more metadata in general, and perhaps most of this could be automatically added by the Python script so that it wouldn't be a burden on users adding sample banks. I would suggest as a possible starting point:

  1. Filesize
  2. Number of channels
  3. Sample rate
  4. Bit depth
  5. Duration

This might enable more selective download scripts in the future, e.g. "get all 16-bit mono samples under 0.5 seconds in duration from repos by yaxu". Is there a reason not to add more metadata?
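
To make this concrete, here's a hypothetical sketch of the kind of entry the Python script could emit per sample; the field names and values are illustrative only, not an agreed format:

```python
import json

# Hypothetical per-sample metadata entry (illustrative fields/values only).
entry = {
    "file": "repetition/a.wav",
    "filesize": 88244,     # bytes
    "channels": 1,
    "samplerate": 44100,   # Hz
    "bitdepth": 16,
    "duration": 0.5,       # seconds
}
print(json.dumps(entry, indent=2))
```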

charlieroberts commented 3 years ago

In a similar vein, perhaps the "dependencies" field of the top-level quark should contain a bit more information about each repo? Or is the expectation that the typical use case will be to download all sample banks? As the collection gets larger (which will be great!) this could become problematic...

yaxu commented 3 years ago

As I mused before, we're conflating two different issues: how to replace the monolithic Dirt-Samples pack to welcome new users to Tidal and friends, and how to browse and share samples more generally. This repo is really about the first issue, but the second issue is more interesting and I'm very happy to jump into that here regardless!

Yes, there's no real reason not to add this metadata and more, e.g. cue points (marking onsets in a breakbeat), spectral centroid, zero crossings etc. would also be very useful to have. I think it would be good to take a moment to find an existing standard to base this on though - do you know of one? When I look around for audio file metadata formats they seem more concerned with whether the sound file is 'christian rock' or 'experimental' than with objective DSP measurements. I'm sure the MIR community will have something though..
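
As a rough illustration of two of those measurements, assuming a mono signal already loaded as a float array (the exact definitions would need to follow whatever standard we settle on):

```python
import numpy as np

def zero_crossings(x: np.ndarray) -> int:
    # Count sign changes across the whole (mono, float) signal.
    return int(np.sum(np.signbit(x[:-1]) != np.signbit(x[1:])))

def spectral_centroid(x: np.ndarray, samplerate: int) -> float:
    # Magnitude-weighted mean frequency over the whole file, in Hz.
    mags = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / samplerate)
    return float(np.sum(freqs * mags) / np.sum(mags))
```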

Things like 'number of channels' are a bit different from 'spectral centroid', because the former is a property of the sound file itself, closer to metadata than data. But it's still useful to have in the metadata file. Again, it would be good to look at an existing standard, as channel count alone doesn't tell us enough for some cases: e.g. 8 channels could be octophonic in a ring, the corners of a cube, or 7.1.

I think you're pointing out flaws in the quark system, really. The quark database is just a list of GitHub links, so when an end user clicks refresh, GitHub gets polled for everything in the list. This takes a long time, and no information is shown about a quark until you install it. It works OK for installing samples with SuperDirt, but it doesn't meet your purpose, Charlie, of browsing samples ahead of installation.

So some design points/questions:

  1. The metadata files should be useful independently of the samples they refer to. That is, the metadata should include a stable URL to the location of the samples, like https://github.com/tidalcycles/sounds-repetition. Then the sound filename repetition/a.wav would be relative to that.
  2. The metadata files could then be cached and served from a single location, either as separate files in one folder or merged into a single file. This could then serve as an index that is updated periodically (daily? hourly?)
  3. Should the samples then be cached and served centrally as well? Unsure about that... Maybe not.
  4. Where do we do audio analysis? The Python script is intended to be usable, but also to act as a reference implementation that defines the format, not the only interface to it. There are Python libraries for analysis too, but with SuperCollider to hand and already used by many live coding systems, it probably makes sense to start by doing audio analysis there, adding extra metadata to files created with the Python script. Regardless, getting Python to read basic audio file attributes like the number of channels is an easy win; see the sketch after this list.
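
For that easy win in point 4: for WAV files, Python's built-in wave module already reads these attributes straight from the header, no extra library needed. A minimal sketch (something like the soundfile library would be needed for other formats):

```python
import os
import wave

def basic_attributes(path: str) -> dict:
    # Read the basic attributes straight from a WAV file's header.
    with wave.open(path, "rb") as w:
        channels = w.getnchannels()
        samplerate = w.getframerate()
        bitdepth = w.getsampwidth() * 8   # bytes per sample -> bits
        frames = w.getnframes()
    return {
        "filesize": os.path.getsize(path),  # bytes
        "channels": channels,
        "samplerate": samplerate,           # Hz
        "bitdepth": bitdepth,
        "duration": frames / samplerate,    # seconds
    }
```
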
telephon commented 3 years ago

No information is shown about a quark until you install it.

What about having one single TidalSamplesMetaData Quark? That could also have the URLs for the samples.

telephon commented 3 years ago

4. Where do we do audio analysis?

There is a Quark: https://github.com/musikinformatik/SoundFileAnalysis/. You can write your analysis functions in SuperCollider, make the analysis reproducible, or add analysis on the fly.

charlieroberts commented 3 years ago

Agreed there are multiple issues being conflated here. Maybe the quark could be auto-generated into its own repo, and this repo could be dedicated to more metadata? Or maybe it doesn't really matter, and all this text is light enough that it's not a concern.

I was trying to look through the AudioCommons work, but it's hard to find a "standard" specification on their website or from a quick look through the related publications. But here's some of the AudioCommons analysis that Freesound uses.

Including transients for beat-slicing sounds fantastic, and more detailed info on spatialization would also be great, although I imagine this could get very specific (six-channel diamond vs. six-channel cube, etc.). Perhaps something like SpatDif could work, or at least serve as a model? Could also be overkill.

charlieroberts commented 3 years ago

What the hell is a six-channel cube? Forgive my addled maths :)

charlieroberts commented 3 years ago

OK, dug a bit more through some of the AudioCommons "deliverables" (as opposed to the corresponding peer-reviewed publications) and it looks like the analysis that Freesound includes is most of the AudioCommons "spec". Here's a document that goes into detail on how each of these is measured.

Perhaps we stick with these for analysis purposes, but extend with more descriptors that are appropriate to our use case?

yaxu commented 3 years ago

Struggling to work out what the AudioCommons is... Do you mean using their ontology, or their analysis tools as well?

charlieroberts commented 3 years ago

I guess the specific part of their ontology that relates to analysis. The entire ontology is quite large. But maybe there are other parts also worth adopting.

charlieroberts commented 3 years ago

Paging @axambo as she was involved in the AudioCommons project and might have some thoughts...

yaxu commented 3 years ago

Hi @axambo! I guess I'm unsure how this relates to Freesound... Does Freesound automatically do this analysis? Or is this something we can do with SuperCollider and then upload/update metadata in Freesound?

charlieroberts commented 3 years ago

My impression is that Freesound automatically does this analysis. So, it would potentially be a duplication of "work" (computation time / energy) if we plan to try and upload everything to Freesound (maybe we should formalize how this would work if we're sure we want to do it?). And, given that we might be using the quark @telephon mentioned, the results might be slightly different from those Freesound provides.

But it does provide a common set of analysis descriptors to use as a template. I'm not attached to them; even in this brief discussion @yaxu has already pointed out two properties that don't seem to be addressed (transients/slices and spatial configuration) which would have much more value for our typical use cases than a boominess coefficient. Although making a performance of only boomy sounds seems like a fun challenge :)

yaxu commented 3 years ago

I see. It took a bit of digging in the API (the docs are for an old version of the API), but yep, the analysis is all there: https://freesound.org/apiv2/sounds/565580/analysis/

I guess it takes a little while to run after an upload.
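
For reference, fetching that analysis programmatically could look something like the sketch below; this assumes a Freesound APIv2 key passed as a token query parameter, so the exact auth details should be checked against the current API docs:

```python
import json
import urllib.request

def fetch_analysis(sound_id: int, api_key: str) -> dict:
    # Fetch the full analysis blob for one Freesound sound.
    # Assumes APIv2 token authentication via ?token=... (check the docs).
    url = (f"https://freesound.org/apiv2/sounds/{sound_id}/analysis/"
           f"?token={api_key}")
    with urllib.request.urlopen(url) as resp:
        return json.loads(resp.read().decode("utf-8"))

# e.g. fetch_analysis(565580, "YOUR_API_KEY")
```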

Do these tags actually relate to the AudioCommons stuff? There's no ac: prefix.

charlieroberts commented 3 years ago

Ack, yes, sorry, I misunderstood how the analysis was stored in the database. The AudioCommons descriptors are filters that you can use to search Freesound... I'm not sure how the analysis you pointed to is used behind the scenes to enable these. I guess once you have all that low-level analysis it's reasonably fast to calculate the higher-level descriptions? Seems like it's actually a one-to-one correspondence in many cases (albeit with different names).

I think we're more interested in the high-level AC descriptors than the low-level analysis anyways.