Pescador is a library for streaming (numerical) data, primarily for use in machine learning applications.
Pescador addresses use cases that arise in the following common scenarios:
Say you have three data sources (A, B, C) that you want to sample.
For example, each data source could contain all the examples of a particular category.
Pescador can dynamically interleave these sources to provide a randomized stream D <- (A, B, C).
The distribution over (A, B, C) need not be uniform: you can specify any distribution you like!
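To make the idea of a weighted stream D <- (A, B, C) concrete, here is a minimal plain-Python sketch. It uses only the standard library and is not Pescador's API; the names source and interleave, and the weights chosen, are illustrative assumptions.

```python
import random
from itertools import count


def source(label):
    """Endless stream of labelled samples: (label, 0), (label, 1), ..."""
    for i in count():
        yield (label, i)


def interleave(streams, weights, n_samples, seed=None):
    """Yield n_samples items, choosing a stream for each draw according to weights."""
    rng = random.Random(seed)
    for _ in range(n_samples):
        stream = rng.choices(streams, weights=weights, k=1)[0]
        yield next(stream)


# D <- (A, B, C), with A drawn about half of the time (non-uniform on purpose).
streams = [source("A"), source("B"), source("C")]
samples = list(interleave(streams, weights=[0.5, 0.25, 0.25], n_samples=1000, seed=0))
```

Each draw first picks a source and then takes that source's next sample, so the order within any single source is preserved while the combined stream is randomized.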
Now, say you have 3000 data sources, each of which may contain a large number of samples. Maybe that's too much data to fit in RAM at once.
Pescador makes it easy to interleave these sources while maintaining a small working set.
Not all sources are simultaneously active, but Pescador manages the working set so you don't have to.
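The working-set idea can be sketched in plain Python as follows. This is an illustration under stated assumptions, not Pescador's implementation: make_source and working_set_stream are hypothetical names, each source is assumed to yield a small fixed number of samples, and an exhausted source is simply replaced by another chosen uniformly at random.

```python
import random


def make_source(source_id, n_items=5):
    """Hypothetical loader factory: a real one would lazily read one file or shard."""
    def generate():
        for i in range(n_items):
            yield (source_id, i)
    return generate


def working_set_stream(factories, n_active, n_samples, seed=None):
    """Sample from at most n_active open sources, replacing each one as it runs out."""
    rng = random.Random(seed)
    # Only n_active sources are opened at any time, however many factories exist.
    active = [rng.choice(factories)() for _ in range(n_active)]
    produced = 0
    while produced < n_samples:
        slot = rng.randrange(n_active)
        try:
            item = next(active[slot])
        except StopIteration:
            # Source exhausted: open a fresh one in the same slot and draw again.
            active[slot] = rng.choice(factories)()
            continue
        yield item
        produced += 1


# 3000 sources are available, but only 10 are open at once.
factories = [make_source(k) for k in range(3000)]
samples = list(working_set_stream(factories, n_active=10, n_samples=200, seed=0))
```

Memory use is bounded by the number of active sources rather than the total number of sources, which is the point of keeping a small working set.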
If loading data incurs substantial latency (e.g., due to accessing on-disk storage or pre-processing), this can be a problem.
Pescador can seamlessly move data generation into a background process, so that your main thread can continue working.
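The sketch below illustrates the general pattern of producing data in the background while the consumer keeps working. For simplicity it uses a worker thread and a bounded queue from the standard library rather than a separate process, and it is not Pescador's API; slow_source and background are hypothetical names, and the delay stands in for disk access or pre-processing.

```python
import queue
import threading
import time

_DONE = object()  # sentinel marking the end of the stream


def slow_source(n_items, delay=0.01):
    """Hypothetical source whose items are slow to produce."""
    for i in range(n_items):
        time.sleep(delay)
        yield i


def background(iterable, max_buffered=8):
    """Produce items in a worker thread and hand them over through a bounded queue."""
    buffer = queue.Queue(maxsize=max_buffered)

    def worker():
        for item in iterable:
            buffer.put(item)  # blocks while the buffer is full
        buffer.put(_DONE)

    threading.Thread(target=worker, daemon=True).start()
    while True:
        item = buffer.get()
        if item is _DONE:
            return
        yield item


# The consumer squares each item while the worker prepares the next ones.
results = [x * x for x in background(slow_source(20))]
```

The bounded queue keeps the producer from running arbitrarily far ahead of the consumer. A production version would also need to forward exceptions raised in the worker, which this sketch omits.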
Want to learn more? Read the docs!
Pescador can be installed from PyPI through pip:
pip install pescador
or with conda using the conda-forge channel:
conda install -c conda-forge pescador