janclemenslab / das

Deep Audio Segmenter
http://janclemenslab.org/das/
Apache License 2.0

Input data formats and data across multiple files when using pre-existing annotations #33

Closed dstanner closed 2 years ago

dstanner commented 2 years ago

Two questions:

  1. The information on the data formats page seems to assume that the data is contained within a single file. What if the data is spread across multiple input wav files, each with its own set of annotations? One possibility might be to concatenate the wav files into a single numpy array, but this seems unsatisfying, since it could introduce edge artifacts where the signals are joined. Are there other ways to handle inputs that are distributed across multiple recordings?

  2. I'm a bit confused by the relationship between the audio/annotations/song definitions (at the top of the page) and the data structure used for training (at the bottom). How are the data and annotation files related to the data structures for training? It appears that the data structures for training assume the data is already in memory (e.g., the wav files have been converted to numpy arrays and the labels to a series of labels for each time point). Is that correct? It's not clear how one is supposed to go from the files described at the top to the data structures described at the bottom.

    If the data structure assumes all of the data is in a dictionary in memory, it seems the section about the annotation files can simply be skipped if we convert our labels into time series ourselves, right? We could just write our own pre-processing and load the data into memory?

    If so, we'd still need a way to handle multiple input files, since I presume that data['train']['x'] is a single array holding the entire series of audio values for all of the training data, right?

postpop commented 2 years ago

Hi Darren,

In brief: The recommended way to go about this is to put a bunch of wav files with associated annotation files (filename1.wav + filename1_annotations.csv, ...) into a folder and then use the GUI via the "DAS/Make dataset for training" menu to prepare the data for training. This will split the data into train/val/test sets, concatenate files, create the one-hot-encoded training targets, and produce the data structure described here.
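
For concreteness, a folder prepared this way would look something like the following (the filenames are just placeholders):

```
recordings/
├── filename1.wav
├── filename1_annotations.csv
├── filename2.wav
├── filename2_annotations.csv
└── ...
```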

Details:

  1. The above procedure will merge data across files. Yes, that can lead to potential weirdness around the edges, but we have never found this to be an issue in practice, since the resulting artifacts have very different statistics and the network easily learns to ignore them. If this becomes an issue for your data, I'm happy to chat more and find a solution.
  2. The GUI will create that structure for you and also takes care of splitting the data etc. But you are correct: in essence, the data structure used for training is just npy files for the audio and the labels, plus some metadata. The annotation files described at the top of the page exist to make it easier for users to prepare their own data as just a bunch of wav and csv files and then use the GUI to make the actual dataset. This is the recommended way for all users, since it ensures that everything is in the right format, but you absolutely can make the dataset yourself (see the sketch after this list):
    • The parent directory's name should end in .npy for DAS to properly recognize the format.
    • There should be subdirectories train/test/val for the different data splits.
    • Each of the subdirectories should contain an x.npy file with the audio samples (concatenated from multiple individual files) and a y.npy file with the one-hot-encoded training targets for each sample. The targets have format [sample, number of classes] - you need to include a noise/no-signal class at index 0.
    • The parent directory should also contain an attrs.npy file, which is simply a Python dictionary saved using np.save with the following keys:
        ◦ samplerate: of x and y in Hz as a float.
        ◦ class_names: name for each class as a list of strings; the first class name should be noise.
        ◦ class_types: type of each class as a list of strings, same length as class_names. Allowed string values are event and segment (in most cases, segment is what you want).
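
In case it helps, here is a minimal sketch of what building the dataset by hand could look like, following the layout above. Everything concrete in it (the directory name mydataset.npy, the sample rate, the class names, and the random stand-in audio/labels) is a placeholder for your own data, so double-check shapes and values against the data formats page:

```python
import os
import numpy as np

# Placeholder metadata - replace with values matching your recordings.
samplerate = 10_000.0                 # Hz, of x and y
class_names = ['noise', 'song']       # first class must be the noise class
class_types = ['segment', 'segment']  # 'event' or 'segment' per class

# Random stand-in data: one (audio, integer-label) pair per split.
rng = np.random.default_rng(0)
splits = {name: (rng.standard_normal(n).astype(np.float32),   # audio samples
                 rng.integers(0, len(class_names), n))        # label per sample
          for name, n in [('train', 100_000), ('val', 20_000), ('test', 20_000)]}

def one_hot(labels, n_classes):
    """Turn [samples] integer labels into [samples, n_classes] targets."""
    y = np.zeros((len(labels), n_classes), dtype=np.float32)
    y[np.arange(len(labels)), labels] = 1.0
    return y

root = 'mydataset.npy'  # parent directory name must end in .npy
for name, (x, labels) in splits.items():
    os.makedirs(os.path.join(root, name), exist_ok=True)
    # x is saved 1-D here; check the data formats page for the exact
    # shape DAS expects (e.g. whether a channel axis is needed).
    np.save(os.path.join(root, name, 'x.npy'), x)
    np.save(os.path.join(root, name, 'y.npy'), one_hot(labels, len(class_names)))

# attrs.npy: a plain Python dict saved with np.save.
attrs = {'samplerate': samplerate,
         'class_names': class_names,
         'class_types': class_types}
np.save(os.path.join(root, 'attrs.npy'), attrs)
```

One thing to be aware of: np.save pickles the attrs dict, so reading it back requires np.load('mydataset.npy/attrs.npy', allow_pickle=True).item().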

Let me know if you have any more questions or if you run into issues.