tidalcycles / Clean-Samples

Like Dirt-Samples, but cleaned up
GNU General Public License v3.0

How to organise sample sets #3

Closed: yaxu closed this issue 3 years ago

yaxu commented 3 years ago

Should we use submodules, so people can curate sample sets in their own users/orgs? https://git-scm.com/book/en/v2/Git-Tools-Submodules
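Concretely, the submodule route would mean something like this (repo URL and folder name made up):

git submodule add https://github.com/some-user/some-sample-pack.git some-sample-pack
git commit -m "Add some-sample-pack as a submodule"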

Or shall we make a monolithic set of samples here, with more centralised control to e.g. keep the metadata up-to-date?

Or something in between - separate submodules, but all forked under the same organisation for maintenance?

(longer term, the organisation shouldn't be tidalcycles, maybe toplap or some other umbrella)

telephon commented 3 years ago

I tend to have problems with submodules, but this is probably just me. It is good to have a central place where communication happens, because otherwise that may be hard to track. Since it is a quark, you could theoretically also just install several of them. Just my two cents.

yaxu commented 3 years ago

That makes me think about how quarks are currently an incomplete interface for git. If the quark contains metadata files that could be updated locally (e.g. by running some sound analysis script), that will cause problems for the quark system - it wouldn't be able to update that folder any more.

Anyway, I think it's best to keep things as simple as possible. SuperDirt could depend on Clean-Samples, which could have some utility scripts but, as you suggest, could then depend on a collection of other quarks with the actual samples in them.

telephon commented 3 years ago

That makes me think about how quarks are currently an incomplete interface for git.

What would have to change in order to complete it?

yaxu commented 3 years ago

The recent change to the OSC path from /play2 to /dirt/play meant that SuperDirt had to be upgraded along with Tidal for 1.7. Many people couldn't manage this, I think mostly because they had made local changes to the quark folder, although there seemed to be something else going on too. It could only be solved by getting them to paste git commands that they didn't understand into the shell.

telephon commented 3 years ago

Hm, hard to know what to do about it. One would need to test with a broken repository. It would be good to fix this!

One thing also: many people don't have git on their computer; they just copy the folder from the web.

yaxu commented 3 years ago

git is treated as a dependency in the Tidal install process, and development tools are currently needed for compiling Haskell libraries anyway, so this isn't a big problem. One day a standalone SuperDirt install would be nice, with Tidal as a relocatable binary too, although that makes it harder to transition to working on your own synths and so on.

yaxu commented 3 years ago

I think one thing that beginners value is being able to start again from scratch. An easy way of wiping the quarks for a reinstall would be a fairly easy fix. It could be a git stash.
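A rough sketch of what that could look like (the path is the usual Linux location for downloaded quarks, just as an assumption; macOS and Windows differ):

cd ~/.local/share/SuperCollider/downloaded-quarks/SuperDirt
git stash            # set local edits aside (they can be recovered later)
# or, to throw local changes away entirely:
# git checkout -- . && git clean -fd
git pull             # bring the quark back up to date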

telephon commented 3 years ago

It could be a git stash

Hm, this might then "destroy" changes that a user has made.

charlieroberts commented 3 years ago

I stopped using git submodules a long time ago because I also had problems with them (like Julian). It's been long enough that I've forgotten what the problems were though :) Maybe they're not that bad for the typical use case of a user pulling them in but not modifying them / submitting changes.

That said I like the middle-ground approach of having a "core" set of dirt samples here and then some strategy (possibly git submodules) for federating other collections of samples that use the same metadata format.

cleary commented 3 years ago

I'd like to use git-submodules, but here are the issues I anticipate having with them:

  1. You'll need to dictate a standard design for the submodule repos. E.g. with flbass, the samples live in the root dir; this is easy to use as a submodule, because you can choose how you want to name the superdirt/tidal shortcut directory that those samples will live in:
clean-dirt-samples
|_ flbass (submodule)
  |_ *.wav

With dirt-jv1080, the samples already live in named sub-folders, so this is probably not usable as a submodule:

clean-dirt-samples
|_ jv1080 (submodule)
  |_ jvgabba
    |_ *.wav

That will not work in superdirt I'm guessing, unless you reference all samples as jv1080? It will likely be problematic for the sound browsers in Atom/VSCode too.

As a side note, I chose to create my sample repos this way with the intent of using submodules to aggregate them all as a set later.

  2. Submodules mark a commit hash as their state within your repository, and require explicit updating to a new hash of the repository they follow. I would have assumed it would be trivial to automatically track all changes on a submodule, but in my experience this is not the case. It is allegedly possible, but I have not managed it in multiple attempts (a possible approach is sketched below, but it still needs explicit updates).
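For reference, a rough sketch of the closest thing git offers (repo URL and folder name made up); note it still requires an explicit update rather than truly automatic tracking:

# record which branch the submodule should follow
git submodule add -b main https://github.com/some-user/some-sample-pack.git some-sample-pack
# later, explicitly move each submodule to the tip of its recorded branch
git submodule update --remote --merge
git commit -am "Bump sample pack submodules"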

I use submodules pretty heavily, have done so the past year in the ansible tidal installer (check the roles dir, they are intended to be modular), and would be happy to manage/support their use here if required.

I won't lie though, they can be a bit hairy at times.

cleary commented 3 years ago

I've added an example pull request for a submodule: https://github.com/tidalcycles/Clean-Samples/pull/4

(no metadata attached to those samples yet though)

yaxu commented 3 years ago

1. You'll need to dictate a standard design for the submodule repos

Hmm yes. I think as long as the metadata file is in a standard place, or linked to from elsewhere, we can have some flexibility in organising sample packs by specifying locations in that file. There's already a pack_folder parameter to the python script, although now I realise that this info doesn't make it into the metadata. But that could just be . for the top-level folder.

There is then a question about how to handle repos with multiple 'packs' in them, like that jv1080 one: https://github.com/yaxu/dirt-jv1080 One top-level metadata file? Or one file per folder, in the top level? Or one inside each folder? Then is there a top-level list? Hm.

Maybe one metadata file could 'include' another.

yaxu commented 3 years ago

Thanks @cleary, I merged the PR, although I can't seem to get anything other than an empty folder for samples-flbass. Doubtless I'm doing something wrong. For SuperCollider use at least we might be better off using quark dependencies.

dktr0 commented 3 years ago

I've been thinking about this a lot because of similar emerging systems in Estuary. I think you definitely want metadata in each folder, referred to/included by other metadata files elsewhere/higher up. That makes it most straightforward for people to point to parts of sample libraries rather than whole ones.

cleary commented 3 years ago

Thanks @cleary, I merged the PR, although I can't seem to get anything other than an empty folder for samples-flbass. Doubtless I'm doing something wrong. For SuperCollider use at least we might be better off using quark dependencies.

@yaxu this is the first hurdle with submodules - I mentioned it in the PR, but quite a way down:

the clone command needs to change:

git clone --recurse-submodules https://github.com/tidalcycles/Clean-Samples.git

yaxu commented 3 years ago

@cleary I already had it cloned, the answer was git submodule update --init --recursive. It seems newer versions of git do this automatically.

@dktr0 Yes agreed. So sample packs containing sample banks. I think all in one format, but with a way to include sub-files, encouraging people to organise things as you say, as standard. Allowing the top level to have wildcards would be nice.

telephon commented 3 years ago

Please keep it simple. Formats can be extended later if there is a version header. Let's start with a simple, but not too simple, case and see.

yaxu commented 3 years ago

Ok, just one file per sample bank then. We can worry about structuring them into sample packs later.

capital-G commented 3 years ago

Leaving my 2 cents here: from my coding experience, organizing binary files via git is an anti-pattern. There is git-lfs, but this introduces a new dependency on the client side, and its use case is aimed more at versioning binary files; it does not work so well for distributing binary files, which is the aim here.

Taking a look at e.g. pytorch, which also needs to distribute binary files to users via a so-called model zoo, reveals that they simply manage the files via an S3 bucket and download them to a local cache directory when one wants to use the model; see e.g. the source code for this pre-trained neural network which gets distributed as a binary: https://pytorch.org/vision/stable/_modules/torchvision/models/alexnet.html#alexnet Going this path would make it possible to quickly share and install new sample packs by sharing a URL, and this repo would become more like a curated list.

The problem here is that SC does not yet support any web protocols (see https://github.com/supercollider/supercollider/issues/5310), so one would need to use system calls (like curl) to solve this, introducing a new dependency and probably making it somewhat buggy for Windows installs.
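As a sketch, this is the kind of thing such a system call would have to do (URL and cache path made up):

mkdir -p ~/.cache/clean-samples
curl -L -o ~/.cache/clean-samples/some-pack.zip https://example.com/packs/some-pack.zip
unzip -o ~/.cache/clean-samples/some-pack.zip -d ~/.cache/clean-samples/some-pack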

It also seems that FOSS projects can apply for credit at AWS. On the other hand, one could (ab)use GitHub's release management for distributing sample packs if one does not want to rely on S3.

BTW, if you need any help with the python script, I am glad to help.

yaxu commented 3 years ago

Hi @capital-G, thanks for the thoughts.

What are the bad things that could happen if we use git for this? In practice I haven't really seen any issues with this approach and the currently monolithic Dirt-Samples repo.

I don't know too much about AWS, but I feel like amazon shouldn't be a dependency for free/open source/open access.

One alternative could be Freesound. As an academic project I've always worried that it would disappear, so I just tried to look for info about its sustainability and was really happy to see annual sustainability reports - amazing: https://blog.freesound.org/?p=1316 So that looks like a good option.

Another alternative is uploading things to archive.org.

In what sense would using GitHub releases be abuse, and can SC access those?

I didn't know SC didn't support web protocols; that's good to know. Doesn't that limit things to git for now?

Once SC does support web protocols in the wild, or we get Tidal to manage that side of things, we could simply allow URLs to be specified for paths in the metadata file. Then the provider can decide how to host them.

Help with the python script would be very welcome once I've finished sketching it out a bit more, thanks. I'm not really a python programmer so it is probably not idiomatic.

charlieroberts commented 3 years ago

Great to see Freesound doing so well(!), thanks for that link @yaxu. Having everything available on Freesound would mean integration with gibber is already finished, so from that standpoint I'm all for it.

But I suppose I imagined that binary versioning would be part of what could make this repo useful, e.g. supporting editing/cleaning of samples, or replacing samples in the event of a user error during submission or licensing problems. For people comfortable with the command line git-lfs seems pretty simple, and for those who aren't, git-lfs is installed with the GitHub Desktop client app. OTOH I guess the desktop app wouldn't integrate into an all-in-one installation script easily...

yaxu commented 3 years ago

I do like the idea of having a simple folder with some samples and a metadata file. You can zip it and email it, share it on github, share it on a floppy disk or whatever; nice and easy. We can make an easy utility for uploading to/downloading from Freesound as well, maintaining Freesound links in the metadata in either case.

I can't see any reason not to store wav files in git/github. All files are binary files. The "anti-pattern" seems only to apply to files that are large and change often. Sound samples are generally pretty small and don't change very often. Plus you can always wipe history if you want.

dktr0 commented 3 years ago

Chiming in that with Estuary we have been using github to manage multiple repositories of sound files, maintained by different people and brought together in the platform, for a few years now, and it's worked well. As we move (very soon! It's what I'm working on right now...) to on-the-fly-ish sample banks, we expect to use github's gh-pages mechanism to provide the samples in a CORS-compliant way.

yaxu commented 3 years ago

Jolly good. I think coming up with a really simple, tech-agnostic, extensible way to share samples is the way forward. Like time sync, I think it might be more difficult than it seems though. All the dirts assume a flat folder-of-folders-of-samples structure, indexed by folder name plus number, where the number is the position sorted by filename. ixilang and I think foxdot map single characters to sounds, for even more brevity. Other systems probably prefer to refer to sounds by the name of the wav file.
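To illustrate that layout (folder and file names made up):

samples
|_ bd
  |_ kick-01.wav    # bd:0 (first file, sorted by filename)
  |_ kick-02.wav    # bd:1
|_ sn
  |_ snare-01.wav   # sn:0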

I'm thinking it would be nice to tag each sound with a 'shortname', so that sounds can be triggered like drum:snare as well as drum:1. Probably this work would be done on the Tidal side so that n ("snare" - 1) would still work to get a kick.

yaxu commented 3 years ago

After consideration I think how to organise sample sets is out of scope for this. Let's go for a really simple way to add metadata to a folder with samples in it. Which storage mechanism is used (e.g. github submodules) can be negotiated later. For SuperDirt, simple quarks make the most sense.