margoliashlab / bark

A time-series data format
GNU General Public License v2.0

Bark

Bark is:

  1. A standard for time-series data, and a Python implementation for reading and writing Bark-formatted data.
  2. A Python module for signal processing on larger-than-memory data sets.
  3. A set of command-line tools for building data processing pipelines.


Version: 0.2

The Bark philosophy

  1. minimal specification and implementation
  2. simple file formats
  3. small, chainable utilities

By emphasizing filesystem directories, plain text files and a common binary array format, Bark makes it easy to use both large external projects and simple command-line utilities.
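Because sampled data is stored as a plain binary array, it can be read without any special tooling. As a minimal sketch (the file name, dtype, and channel count here are hypothetical, and the real library reads these parameters from a dataset's metadata), this round-trips a small raw array the way a simple custom utility could:

```python
import numpy as np

# Hypothetical example: write a small raw int16 array to disk,
# then read it back. Real Bark datasets store dtype and channel
# count in plain-text metadata; here we hard-code them.
n_channels = 3
samples = np.arange(12, dtype="int16").reshape(-1, n_channels)  # 4 samples x 3 channels
samples.tofile("mydata.dat")  # raw binary, no header

data = np.fromfile("mydata.dat", dtype="int16").reshape(-1, n_channels)
print(data.shape)  # (4, 3)
```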

Bark's small specification and Python implementation are easy to use in custom tools.

These tools can be chained together using GNU Make to build data pipelines.

Why use Bark?

Inspired by ARF, Bark uses a hierarchy of common data storage formats. The advantages of this approach are:

The elements of Bark

Bark trees are made from the following elements:

This repository contains:

Installation

The Python interface runs under Python 3.5 through 3.8. Installation with Conda is recommended.

git clone https://github.com/margoliashlab/bark
cd bark

pip install -r requirements.txt
pip install .

# optional tests
pytest -v

These installation instructions cover the main bark library and almost all of the conversion scripts and command-line data manipulation tools. Exceptions are noted below.

The requirements file omits the dependencies of a few optional graphical tools included in this repository. Each tool's additional requirements are listed below; they are not shared across tools, so if you don't intend to use a tool, you can ignore its requirements.

Finally, SoX is extremely useful for working with audio data. One conversion routine, dat-to-audio, is a wrapper around SoX and therefore requires it to be installed.

Shell Commands

Every command has help accessible with the flag -h (e.g. bark-entry -h).

Transformations

There are many external tools for processing CSV files, including pandas and csvkit.
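Since event data is plain CSV, loading it into pandas is a one-liner. A hedged sketch (the column names "start" and "name" are illustrative, not mandated by this document):

```python
import io

import pandas as pd

# Illustrative Bark-style event CSV: one row per labeled event.
csv_text = """start,name
1.25,a
2.50,b
3.75,a
"""
events = pd.read_csv(io.StringIO(csv_text))

# Select all events with a given label.
a_events = events[events["name"] == "a"]
print(len(a_events))  # 2
```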

Visualizations

Conversion

Control Flow

bark-extra

More tools with less generality can be found in the bark-extra repository.

Python interface

import bark
root = bark.read_root("black5")
root.entries.keys()
# dict_keys(['2016-01-18', '2016-01-19', '2016-01-17', '2016-01-20', '2016-01-21'])
entry = root['2016-01-18']
entry.attrs
# {'bird': 'black5',
# 'experiment': 'hvc_syrinx_screwdrive',
# 'experimenter': 'kjbrown',
# 'timestamp': '2017-02-27T11:03:21.095541-06:00',
# 'uuid': 'a53d24af-ac13-4eb3-b5f4-0600a14bb7b0'}
entry.datasets.keys()
# dict_keys(['enr_emg.dat', 'enr_mic.dat', 'enr_emg_times.csv', 'enr_hvc.dat', 'raw.label', 'enr_hvc_times.csv', 'enr.label'])
hvc = entry['enr_hvc.dat']
hvc.data.shape
# (7604129, 3)

The Stream object in the bark.stream module exposes a powerful data pipeline design system for sampled data.
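The Stream API itself is documented in bark.stream; as a sketch of the underlying idea (process sampled data chunk by chunk so the whole recording never has to fit in memory), here is a generator-based pipeline over a small in-memory array standing in for a .dat file. This illustrates the concept only and is not the bark.stream API:

```python
import numpy as np

def chunks(data, chunk_size):
    # Yield successive fixed-size slices of the signal.
    for start in range(0, len(data), chunk_size):
        yield data[start:start + chunk_size]

def rectify(chunk_iter):
    # A stage in the pipeline: full-wave rectify each chunk as it arrives.
    for chunk in chunk_iter:
        yield np.abs(chunk)

signal = np.array([-1.0, 2.0, -3.0, 4.0])
# Stages compose lazily; only one chunk is in memory at a time.
result = np.concatenate(list(rectify(chunks(signal, 2))))
print(result)  # [1. 2. 3. 4.]
```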

Example usage:

Pipelines with GNU Make

Some links to get started with Make:
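The pattern is that each Make rule turns one intermediate file into the next, and Make re-runs only the stale steps when an input changes. A hypothetical Makefile fragment (the filter tool name and all flags and file names below are placeholders, not real Bark commands; dat-to-audio is described above):

```make
# Placeholder pipeline: raw recording -> filtered data -> audio file.
filtered.dat: raw.dat
	some-filter-tool raw.dat filtered.dat

song.wav: filtered.dat
	dat-to-audio filtered.dat song.wav
```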

Related projects

Authors

Dan Meliza created ARF. Bark was written by Kyler Brown so he could finish his damn thesis in 2017. Graham Fetterman also made significant contributions.