tidalcycles / Dirt-Samples

Set of samples used in Dirt

Rationalising / reorganising / cataloguing the samples #15

Closed jkbd closed 3 years ago

jkbd commented 5 years ago

I ran rmlint --types=duplicates -p and found 69 byte-identical duplicates among the files, 2.2 MiB in total. Some of them are even in the same folder, for example:

ls 'invaders/007_16.wav' # considered "original"
rm 'invaders/014_6.wav'
rm 'space/007_16.wav'
rm 'space/014_6.wav'

or

ls 'mash/0.wav'
rm 'mash/1.wav'

I suggest deleting them.
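For anyone who wants to reproduce the check without rmlint, the idea can be sketched in a few lines of Python. This is only an illustration of "byte-identical" detection via content hashes, not how rmlint works internally; the function name `find_duplicates` is made up for this example, and the file that sorts first in each group is arbitrarily treated as the "original".

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_duplicates(root):
    """Group files under root by SHA-256 digest; return groups with more than one file."""
    by_digest = defaultdict(list)
    # Sorting makes the result deterministic: the first path of each
    # group can then be treated as the copy to keep.
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            by_digest[digest].append(path)
    return [paths for paths in by_digest.values() if len(paths) > 1]
```

Calling `find_duplicates("Dirt-Samples")` on a local checkout would return lists of paths, each list holding files with identical content. Nothing is deleted; that decision stays with the person running it.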

yaxu commented 5 years ago

Thanks @jkbd! Really these samples are just the contents of my

Removing them would in many cases change things for people who have compositions using these samples. Perhaps it's not worth removing those for the sake of 2.2 MB?

Full list below

    ls '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/drum/000_drum1.wav'
    rm '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/gabba/001_1.wav'
    ls '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/glitch2/000_BD.wav'
    rm '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/glitch/000_BD.wav'
    ls '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/glitch2/001_CB.wav'
    rm '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/glitch/001_CB.wav'
    ls '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/glitch2/002_FX.wav'
    rm '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/glitch/002_FX.wav'
    ls '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/glitch2/003_HH.wav'
    rm '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/glitch/003_HH.wav'
    ls '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/glitch2/004_OH.wav'
    rm '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/glitch/004_OH.wav'
    ls '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/glitch/005_P1.wav'
    rm '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/glitch2/005_P1.wav'
    ls '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/glitch/006_P2.wav'
    rm '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/glitch2/006_P2.wav'
    ls '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/glitch/007_SN.wav'
    rm '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/glitch2/007_SN.wav'
    ls '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/gabba/002_2.wav'
    rm '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/industrial/001_02.wav'
    ls '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/house/000_BD.wav'
    rm '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/jazz/000_BD.wav'
    ls '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/house/003_HH.wav'
    rm '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/jazz/003_HH.wav'
    ls '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/noise/000_noise.wav'
    rm '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/noise2/007_7.wav'
    ls '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/blip/000_blipp01.wav'
    rm '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/sid/003_blipp01.wav'
    ls '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/blip/001_blipp02.wav'
    rm '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/sid/004_blipp02.wav'
    ls '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/invaders/000_0.wav'
    rm '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/space/000_0.wav'
    ls '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/invaders/001_1.wav'
    rm '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/space/001_1.wav'
    ls '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/invaders/002_11.wav'
    rm '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/space/002_11.wav'
    ls '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/invaders/004_13.wav'
    rm '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/space/004_13.wav'
    ls '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/invaders/005_14.wav'
    rm '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/invaders/012_4.wav'
    rm '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/space/005_14.wav'
    rm '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/space/012_4.wav'
    ls '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/invaders/011_3.wav'
    rm '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/space/011_3.wav'
    ls '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/invaders/006_15.wav'
    rm '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/invaders/013_5.wav'
    rm '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/space/006_15.wav'
    rm '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/space/013_5.wav'
    ls '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/invaders/007_16.wav'
    rm '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/invaders/014_6.wav'
    rm '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/space/007_16.wav'
    rm '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/space/014_6.wav'
    ls '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/invaders/008_17.wav'
    rm '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/invaders/015_7.wav'
    rm '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/space/008_17.wav'
    rm '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/space/015_7.wav'
    ls '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/dr_few/006_021.WAV'
    rm '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/techno/000_0.wav'
    ls '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/gabba/003_3.wav'
    rm '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/industrial/004_05.wav'
    rm '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/techno/002_3.wav'
    ls '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/industrial/009_10.wav'
    rm '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/techno/003_4.wav'
    ls '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/can/003_12.wav'
    rm '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/techno/004_5.wav'
    ls '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/house/007_SN.wav'
    rm '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/jazz/007_SN.wav'
    ls '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/bd/BT0A0A7.wav'
    rm '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/world/bd.wav'
    ls '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/bass0/000_0.wav'
    rm '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/gabba/000_0.wav'
    rm '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/world/gabbakick.wav'
    ls '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/sn/ST0T0S3.wav'
    rm '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/world/sn.wav'
    ls '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/miniyeah/000_Sound0.wav'
    rm '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/yeah/000_Sound0.wav'
    ls '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/miniyeah/001_Sound11.wav'
    rm '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/yeah/002_Sound11.wav'
    ls '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/miniyeah/002_Sound23.wav'
    rm '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/yeah/012_Sound23.wav'
    ls '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/miniyeah/003_Sound36.wav'
    rm '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/yeah/024_Sound36.wav'
    ls '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/bassdm/012_BT3AADA.WAV'
    rm '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/bd/BT3AADA.wav'
    ls '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/bassdm/022_BTAA0DA.WAV'
    rm '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/bd/BTAA0DA.wav'
    ls '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/cr/RIDED4.wav'
    rm '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/cr/RIDED6.wav'
    ls '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/breaks125/015_sdstckbr.wav'
    rm '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/foo/015_sdstckbr.wav'
    ls '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/breaks125/016_bllstmp.wav'
    rm '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/foo/016_bllstmp.wav'
    ls '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/house/001_CB.wav'
    rm '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/jazz/001_CB.wav'
    ls '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/house/002_FX.wav'
    rm '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/jazz/002_FX.wav'
    ls '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/house/004_OH.wav'
    rm '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/jazz/004_OH.wav'
    ls '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/house/005_P1.wav'
    rm '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/jazz/005_P1.wav'
    ls '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/house/006_P2.wav'
    rm '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/jazz/006_P2.wav'
    ls '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/jungbass/008_junglesine2.wav'
    rm '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/jungbass/009_junglesine3.wav'
    ls '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/mash/0.wav'
    rm '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/mash/1.wav'
    ls '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/jungbass/005_jungasubdown.wav'
    rm '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/jungbass/010_mega_jungasubdown.wav'
    ls '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/invaders/003_12.wav'
    rm '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/space/003_12.wav'
    ls '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/invaders/009_18.wav'
    rm '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/space/009_18.wav'
    ls '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/invaders/010_2.wav'
    rm '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/space/010_2.wav'
    ls '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/invaders/017_9.wav'
    rm '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/space/017_9.wav'
    ls '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/stab/007_stab16.wav'
    rm '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/stab/008_stab17.wav'
    ls '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/sid/002_basd.wav'
    rm '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/techno/001_1.wav'
    ls '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/tacscan/003_1up.wav'
    rm '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/tacscan/004_credit.wav'
    ls '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/industrial/022_23.wav'
    rm '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/techno/006_7.wav'
    ls '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/invaders/016_8.wav'
    rm '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/space/016_8.wav'
    ls '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/techno/005_6.wav'
    rm '/home/alex/.local/share/SuperCollider/downloaded-quarks/Dirt-Samples/v/002_v_perc3.wav'

jkbd commented 5 years ago

My idea was to increase the quality of the library for new users. It's quite time-consuming to get an overview. 69 files is a fraction of the total, but small improvements add up. I think it would be good not to break tutorials. But aren't real live coders starting compositions from scratch anyway?

jkbd commented 5 years ago

It should be possible to provide a script that replaces e.g. mash:1 with mash:0 in Tidal files. If such a script existed, would you vote for deleting duplicates?
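A rough sketch of what such a script could look like, in Python. The mapping `REPLACEMENTS` and the function name `rewrite_pattern` are hypothetical, and the sketch only handles the `name:index` notation inside pattern strings; references written another way (for example with a separate `n` parameter) would need extra handling. It also assumes that removing a file does not shift the indices of the remaining samples in that folder, which would have to be checked.

```python
import re

# Hypothetical mapping from a removed duplicate to the sample that stays.
REPLACEMENTS = {"mash:1": "mash:0"}


def rewrite_pattern(text, replacements=REPLACEMENTS):
    """Replace whole 'name:index' tokens, leaving longer names and indices alone."""
    for old, new in replacements.items():
        # The lookbehind rejects a preceding word character or colon,
        # so 'xmash:1' is untouched; the lookahead rejects a following
        # digit, so 'mash:10' is untouched.
        token = re.compile(r"(?<![\w:])" + re.escape(old) + r"(?!\d)")
        text = token.sub(new, text)
    return text
```

For example, `rewrite_pattern('d1 $ sound "mash:1 mash:10 bd*2"')` changes only the first token and leaves `mash:10` alone.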

yaxu commented 5 years ago

Instead I'd propose leaving this repo as-is but deprecating it, replaced with a total re-organisation, where each folder has a metadata file, with general description, provenance and licensing info.

yaxu commented 5 years ago

We should rescue the idea of 'pattern packs', here's a post I made to the old mailing list some time ago:

It'd be great to have a standard way of sharing 'pattern packs'. These
could contain:

1/ Sample banks
2/ and/or SuperCollider synths
3/ Metadata about samples (and maybe synths) e.g. giving hints like
where the 'peak' of the sound lies so samples can have e.g. preverb in
them without sounding 'off'
4/ and/or Tidal patterns, which could be for various purposes, e.g.
distributing tracks as code+sounds (a bit like a .mod file), or for
documentation purposes (of the samples or of the patterns)
5/ Built in timed keypress data, so performances can be archived,
replayed and maybe edited as a collection of sounds and what was typed
(and when)

It would probably be best to have these as git repositories, which
could probably double up as SuperCollider 'quarks'. In the process of
moving to this format we could work towards a much cleaner set of
'default' sounds than we have now.

I think this should be as simple+easy as possible so there is not too
much friction involved with sharing sounds. Currently you just drop
samples into a folder to use them, the minimal case should not be much
more complicated than that..

For now I've just made a couple of repositories with samples, to start
sketching out what such a format would look like:
  https://github.com/yaxu/dirt-jv1080
  https://github.com/yaxu/dirt-impulse

What should this look like? What metadata would be useful? Is this a
good idea? + what could this be useful for? etc

yaxu commented 5 years ago

I remember at least one person went through all the sample folders describing each one, it would be nice to dig that up too!

yaxu commented 5 years ago

Let's not let perfect be the enemy of the good though. A good step forward would be making a new Clean-Samples repository, slowly moving the higher quality subset of sample banks there, with a README.md for each one. SuperDirt could load both folders for a time. @telephon is there an easy way to get SuperDirt to automatically load quarks full of samples that are present?

jkbd commented 5 years ago

For me this can be closed with the addition to the README. Reorganizing the samples sounds good but is also a bigger undertaking. Does anybody know of any hints from academia on how to organize samples in a way that is also usable on stage?

jwaldmann commented 4 years ago

I have assigned a student project to investigate similarity measures between samples, make a nice 2D map, etc.

telephon commented 4 years ago

Great – It would also be good to have breakpoint lists for wavesets.

yaxu commented 4 years ago

@telephon what would be a good metadata format for a set of samples: a SuperCollider dictionary, YAML, JSON, ...?

jkbd commented 4 years ago

Turtle serialized RDF is flexible, extendable and kind of readable. An ontology for audio features exists: http://motools.sourceforge.net/doc/audio_features.html. But I never looked at it in detail. Probably it would be easier to use a script to convert RDF to whatever works best with Tidal and SuperDirt than to implement graph parsing in there.

telephon commented 4 years ago

In sclang, there are methods for yaml and json, either would work.
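To make the JSON option concrete, here is a minimal Python sketch that writes one metadata file per sample folder. The field names (`description`, `provenance`, `license`, `samples`) and the file name `metadata.json` are assumptions for illustration only, loosely following the per-folder description, provenance and licensing info proposed earlier in this thread; no agreed format is implied.

```python
import json
from pathlib import Path


def build_metadata(folder, description="", provenance="", license_name=""):
    """Collect hypothetical metadata fields for one sample folder."""
    folder = Path(folder)
    # List .wav files case-insensitively, since the repo mixes .wav and .WAV.
    samples = sorted(
        p.name for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() == ".wav"
    )
    return {
        "name": folder.name,
        "description": description,
        "provenance": provenance,
        "license": license_name,
        "samples": samples,
    }


def write_metadata(folder, **fields):
    """Write the metadata as metadata.json inside the folder and return its path."""
    metadata = build_metadata(folder, **fields)
    target = Path(folder) / "metadata.json"
    target.write_text(json.dumps(metadata, indent=2) + "\n", encoding="utf-8")
    return target
```

A file like this should be straightforward to read from sclang or from any other tool, and it keeps the minimal case close to "drop samples into a folder".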

yaxu commented 3 years ago

Moved to https://github.com/tidalcycles/Clean-Samples/

Quodoso commented 3 years ago

> I have assigned a student project to investigate similarity measures between samples, make a nice 2D map, etc.

I am the student assigned with said project.

To keep things short, I will address the general idea of the project, the underlying problems, and its outcome.

Goal

How to automatically reorganize samples according to ambiguous similarities.

Underlying problems

Firstly, one major problem is how to categorize samples, more specifically whether to classify by instrument name, by timbre, etc. The problem with classifying the Dirt-Samples by instrument name is that some are synthesized, some are recorded, some are loops, and some are distorted or edited. The approach taken was to try to classify by timbre: everything that "sounds similar" should be in one group (e.g. a human whistle, a bird whistle, a teapot sound, and perhaps a wind sound should be in one group).

Secondly, there should be a standardized ontology for audio samples that is neither too specific nor too broad (e.g. categorizing into 128 groups or into 3 groups is not desirable). I haven't found a data set with a fitting audio ontology to refer to. My best finding is from the 2018 general-purpose audio tagging challenge. Its ontology is a subset of Google's AudioSet Ontology. That subset is a compromise between instrument names (sound-producing names) and timbre names (e.g. Bass drum, Snare drum, Meow, Bark, Tearing, Fart, ...) and has 41 groups. Another advantage: it provides multiple pre-trained neural networks (which I used to generate a "ground truth" for the Dirt-Samples).

Outcome

There is no nicely readable 2D map of the 2000+ Dirt-Samples. What you see is the minimum spanning tree for all Dirt-Samples. It has no real advantage other than that the sum of all similarity distances is minimal. This does not mean that every edge connects the two most similar Dirt-Samples.
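To illustrate that last point, here is a small Python sketch of a spanning tree built from a pairwise distance matrix. It uses Prim's algorithm as a stand-in; the project's actual implementation and distance measure are not shown in this thread, and the toy matrix below is invented.

```python
def minimum_spanning_tree(dist):
    """Prim's algorithm on a symmetric distance matrix (list of lists).

    Returns a list of (i, j, distance) edges that connect all nodes.
    """
    n = len(dist)
    if n == 0:
        return []
    # best[j] = (distance, i): cheapest known link from the tree to node j.
    best = {j: (dist[0][j], 0) for j in range(1, n)}
    edges = []
    while best:
        j = min(best, key=lambda k: best[k][0])
        d, i = best.pop(j)
        edges.append((i, j, d))
        # Node j is now in the tree; see whether it offers cheaper links.
        for k in best:
            if dist[j][k] < best[k][0]:
                best[k] = (dist[j][k], j)
    return edges


# Invented distances between four samples.
toy = [
    [0, 1, 3, 4],
    [1, 0, 2, 3],
    [3, 2, 0, 1],
    [4, 3, 1, 0],
]
tree = minimum_spanning_tree(toy)
```

In this toy case the tree contains the edge between samples 1 and 2 (distance 2), although sample 1 is closest to sample 0 and sample 2 is closest to sample 3. The edge is there only because the tree has to be connected, which is why an edge in the picture does not always indicate a most-similar pair.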

There are 3 ways of regrouping the Dirt-Samples:

  1. Regroup according to Cochlearai, the winner of the mentioned contest. Pros: Each group has a meaningful name. Cons: Some groups are not reliable; takes ages to compute without an NVIDIA graphics card.
  2. Regroup according to my own similarity subset (adjusted on groups of Cochlearai). Pros: Can adjust group size. Cons: No group names; some groups are not reliable; depends on the labels generated by Cochlearai.
  3. Get n most similar samples to a given sample. Pros: Group name is meaningful. Results are kind of useful. Cons: depends on the labels generated by Cochlearai.
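The third option can be sketched roughly as follows in Python. The feature vectors are hypothetical placeholders (for instance, per-class scores produced by a tagging model), and cosine similarity is just one plausible choice of measure, not necessarily the one used in the project.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(query, features, n=3):
    """Return the n sample names whose vectors are most similar to features[query]."""
    target = features[query]
    scored = [
        (cosine_similarity(target, vec), name)
        for name, vec in features.items()
        if name != query
    ]
    # Highest similarity first; ties are broken by name for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:n]]
```

Here `features` would be a dict mapping a sample name such as `"bd/0"` to its vector, so `most_similar("bd/0", features, n=5)` returns the five closest other samples.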

The project itself is, to put it nicely, worthy of proper software development improvements.

All in all: it is a way to reorganize the Dirt-Samples, but it is nowhere near perfect. I would label it a nice try, but not a success. I hope this is in some way helpful and not too wordy.