Paidiverpy is a Python package designed to create pipelines for preprocessing image data for biodiversity analysis.
Note: This package is still in active development, and frequent updates and changes are expected. The API and features may evolve as we continue improving it.
Comprehensive documentation is forthcoming.
You can install paidiverpy locally or on a notebook server such as JASMIN or the NOC Data Science Platform (DSP). The following steps are applicable to both environments, but steps 2 and 3 are required if you are using a notebook server.
Clone the repository:
# ssh
git clone git@github.com:paidiver/paidiverpy.git
# https
# git clone https://github.com/paidiver/paidiverpy.git
cd paidiverpy
(Optional) Create a Python virtual environment to manage dependencies separately from other projects. For example, using mamba:
mamba init
# Restart the shell so the changes made by mamba init take effect. This may not be necessary if mamba init has already been run successfully
exec bash
mamba env create -f environment.yml
mamba activate Paidiverpy
(Optional) For JASMIN or DSP users, you also need to install the environment in the Jupyter IPython kernel. Execute the following command:
python -m ipykernel install --user --name Paidiverpy
Finally, install the paidiverpy package:
pip install -e .
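Once the steps above have finished, a quick check from Python can confirm that the package is visible to the active environment. This is a minimal sketch using only the standard library; the helper name `is_installed` is illustrative and not part of paidiverpy.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found in the current environment."""
    return importlib.util.find_spec(module_name) is not None


# Report whether the package installed above is visible to this interpreter.
print("paidiverpy importable:", is_installed("paidiverpy"))
```

If this prints `False` on a notebook server, the notebook is probably not using the kernel registered in the earlier step.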
First, create a configuration file. Example configuration files for processing the sample datasets are available in the example/config directory. You can use these files to test the example notebooks described in the Usage section. Note that running the examples will automatically download the sample data.
The configuration file should follow the JSON schema described in the configuration file schema. An online tool to validate configuration files is available here.
To use this package, you may need a metadata file, which can be an IFDO.json file (following the IFDO standard) or a CSV file. For CSV files, ensure the filename column uses one of the following headers: ['image-filename', 'filename', 'file_name', 'FileName', 'File Name'].
Other columns, such as datetime, latitude, and longitude, should use one of the following headers, listed in that order:
['image-datetime', 'datetime', 'date_time', 'DateTime', 'Datetime']
['image-latitude', 'lat', 'latitude_deg', 'latitude', 'Latitude', 'Latitude_deg', 'Lat']
['image-longitude', 'lon', 'longitude_deg', 'longitude', 'Longitude', 'Longitude_deg', 'Lon']
Examples of CSV and IFDO metadata files are in the example/metadata directory.
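Before running a pipeline, it can help to check that a CSV metadata file uses headers from the lists above. The following is a minimal sketch of such a pre-check, not paidiverpy's own metadata parser. The helper names `find_column` and `check_headers` are illustrative, the sample row is made up, and exact, case-sensitive matching is an assumption.

```python
import csv
import io

# Accepted header names, copied from the lists above.
FILENAME_HEADERS = ["image-filename", "filename", "file_name", "FileName", "File Name"]
DATETIME_HEADERS = ["image-datetime", "datetime", "date_time", "DateTime", "Datetime"]
LATITUDE_HEADERS = ["image-latitude", "lat", "latitude_deg", "latitude",
                    "Latitude", "Latitude_deg", "Lat"]
LONGITUDE_HEADERS = ["image-longitude", "lon", "longitude_deg", "longitude",
                     "Longitude", "Longitude_deg", "Lon"]


def find_column(headers, accepted):
    """Return the first header that matches an accepted name, or None."""
    for name in headers:
        if name in accepted:
            return name
    return None


def check_headers(csv_text):
    """Map each expected field to the matching CSV header (None if missing)."""
    headers = next(csv.reader(io.StringIO(csv_text)))
    return {
        "filename": find_column(headers, FILENAME_HEADERS),
        "datetime": find_column(headers, DATETIME_HEADERS),
        "latitude": find_column(headers, LATITUDE_HEADERS),
        "longitude": find_column(headers, LONGITUDE_HEADERS),
    }


# Made-up sample metadata, for illustration only.
sample = "FileName,date_time,lat,lon\nimg_001.png,2020-01-01 00:00:00,50.1,-4.2\n"
result = check_headers(sample)
print(result)
```

Any field mapped to `None` indicates a column whose header does not match one of the accepted names.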
The package is organized into multiple layers:
The Paidiverpy class serves as the main container for image processing functions. It manages several subclasses for specific processing tasks: OpenLayer, ConvertLayer, PositionLayer, ResampleLayer, and ColorLayer.
Supporting classes include:
Configuration: Parses and manages configuration files.
Metadata: Handles metadata.
ImagesLayer: Stores outputs from each image processing step.
The Pipeline class integrates all processing steps defined in the configuration file.
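To make the layering concrete, here is a toy sketch of the general pattern described above: a pipeline reads an ordered list of steps from a configuration, applies each one in turn, and keeps every intermediate output. It is not paidiverpy's implementation or API. `ToyPipeline`, `ToyImagesLayer`, the step names, and the configuration keys are all hypothetical, and plain lists of numbers stand in for images.

```python
from dataclasses import dataclass, field


@dataclass
class ToyImagesLayer:
    """Hypothetical stand-in that stores the output of every processing step."""
    steps: list = field(default_factory=list)

    def add(self, name, images):
        self.steps.append((name, images))

    def latest(self):
        return self.steps[-1][1]


# Hypothetical step functions; real layers would operate on image arrays.
STEP_FUNCTIONS = {
    "scale": lambda images, factor: [[p * factor for p in img] for img in images],
    "clip": lambda images, maximum: [[min(p, maximum) for p in img] for img in images],
}


class ToyPipeline:
    """Hypothetical pipeline that applies the steps listed in a config dict."""

    def __init__(self, config):
        self.config = config
        self.layer = ToyImagesLayer()

    def run(self, images):
        self.layer.add("raw", images)
        for step in self.config["steps"]:
            func = STEP_FUNCTIONS[step["name"]]
            # Each step consumes the previous step's output.
            output = func(self.layer.latest(), **step["params"])
            self.layer.add(step["name"], output)
        return self.layer.latest()


# Hypothetical configuration: two steps applied in order.
config = {"steps": [{"name": "scale", "params": {"factor": 2}},
                    {"name": "clip", "params": {"maximum": 5}}]}
pipeline = ToyPipeline(config)
final_images = pipeline.run([[1, 2, 3]])
print(final_images)
```

Keeping every intermediate result, as the toy layer does, makes it possible to inspect or compare the output of any step after the run.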
While comprehensive documentation is forthcoming, you can explore various use cases through sample notebooks in the examples/example_notebooks directory.
If you'd like to download example data manually for testing, you can use the following code:
from paidiverpy import data
data.load(DATASET_NAME)
Available datasets:
Example data will be automatically downloaded when running the example notebooks.
Pipelines can be executed via command-line arguments. For example:
paidiverpy -c examples/config_files/config_simple.yaml
This runs the pipeline according to the configuration file, saving output images to the directory defined in output_path.
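The same command-line call can be launched from a Python script, for example to run several configuration files in sequence. This is a minimal sketch that wraps the `paidiverpy -c` command shown above with the standard `subprocess` module; the helper names `build_command` and `run_pipeline` are illustrative and not part of the package.

```python
import subprocess


def build_command(config_path: str) -> list:
    """Build the argument list for `paidiverpy -c <config_path>`."""
    return ["paidiverpy", "-c", config_path]


def run_pipeline(config_path: str) -> int:
    """Run the CLI and return its exit code; raises CalledProcessError on failure."""
    completed = subprocess.run(build_command(config_path), check=True)
    return completed.returncode


# Show the command that would be executed for the example configuration.
print(build_command("examples/config_files/config_simple.yaml"))
```

Calling `run_pipeline("examples/config_files/config_simple.yaml")` requires that paidiverpy is installed and that the `paidiverpy` executable is on the `PATH` of the active environment.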
You can also run Paidiverpy using Docker. You can either build the container locally or pull it from Docker Hub.
Build the container locally:
git clone git@github.com:paidiver/paidiverpy.git
cd paidiverpy
docker build -t paidiverpy .
Pull the image from Docker Hub:
docker pull soutobias/paidiverpy:latest
Run the container with:
docker run --rm \
-v <OUTPUT_PATH>:/app/output/ \
-v <FULL_PATH_OF_CONFIGURATION_FILE_WITHOUT_FILENAME>:/app/config_files \
soutobias/paidiverpy:latest \
paidiverpy -c /app/config_files/<CONFIGURATION_FILE_FILENAME>
In this command:
<OUTPUT_PATH>: The output path defined in your configuration file.
<FULL_PATH_OF_CONFIGURATION_FILE_WITHOUT_FILENAME>: The local directory of your configuration file.
<CONFIGURATION_FILE_FILENAME>: The name of the configuration file.
The output images will be saved to the specified output_path.
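The two configuration placeholders are the directory part and the filename part of a single path. As a small convenience, the sketch below derives both from one configuration file path using `pathlib`. The helper name `docker_placeholders` and the example path are hypothetical.

```python
from pathlib import Path


def docker_placeholders(config_file: str) -> dict:
    """Split a configuration file path into the values used in the docker command."""
    path = Path(config_file).expanduser().absolute()
    return {
        # Local directory to mount into the container.
        "FULL_PATH_OF_CONFIGURATION_FILE_WITHOUT_FILENAME": str(path.parent),
        # Filename passed to the -c option.
        "CONFIGURATION_FILE_FILENAME": path.name,
    }


# Hypothetical location of a configuration file on the host machine.
values = docker_placeholders("/data/project/configs/config_simple.yaml")
for key, value in values.items():
    print(f"<{key}> = {value}")
```

Docker bind mounts generally expect absolute host paths, which is why the helper converts the input to an absolute path before splitting it.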