dispel4py is a free and open-source Python library for describing abstract stream-based workflows for distributed data-intensive applications. It enables users to focus on their scientific methods, avoiding distracting details and retaining flexibility over the computing infrastructure they use. It delivers mappings to diverse computing infrastructures, including cloud technologies, HPC architectures and specialised data-intensive machines, to move seamlessly into production with large-scale data loads. The dispel4py system maps workflows dynamically onto multiple enactment systems, and supports parallel processing on distributed memory systems with MPI and shared memory systems with multiprocessing, without users having to modify their workflows.
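To make the idea of one workflow running unchanged on different enactment systems concrete, here is a minimal sketch in plain Python. It does not use the dispel4py API: `split_words`, `count_words`, `run_sequential` and `run_threaded` are hypothetical names, and a thread pool stands in for a parallel mapping.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins, not dispel4py classes: the workflow is just an
# ordered list of functions applied to each item of a stream.
def split_words(line):
    return line.split()

def count_words(words):
    return len(words)

WORKFLOW = [split_words, count_words]

def apply_workflow(item, workflow=WORKFLOW):
    # Push one stream item through every step in order.
    for step in workflow:
        item = step(item)
    return item

def run_sequential(stream, workflow=WORKFLOW):
    # "Simple" enactment: process items one after another.
    return [apply_workflow(item, workflow) for item in stream]

def run_threaded(stream, workflow=WORKFLOW, workers=4):
    # "Parallel" enactment: same workflow, items spread over worker threads.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda item: apply_workflow(item, workflow), stream))

lines = ["a stream of text", "more data"]
# The workflow definition is untouched; only the runner changes.
assert run_sequential(lines) == run_threaded(lines)
```

The workflow description stays separate from the code that executes it, which is what allows the executor to be swapped without editing the workflow.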
dispel4py has been tested with Python 2.7.6, 2.7.5, 2.7.2, 2.6.6 and Python 3.4.3, 3.6, 3.7, 3.10.
The dependencies required for running dispel4py are listed in the requirements.txt file. To install them, please run:
pip install -r requirements.txt
You will also need redis and the mpi4py Python package installed on your system.
To install dispel4py on your system, run:
pip install -r requirements.txt
python setup.py install
To run a workflow, use either of the following forms:
dispel4py <mapping name> <workflow file> <args>
or
python -m dispel4py.new.processor <mapping module> <workflow module> <args>
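When launching runs from a script, the module-based invocation can be assembled as an argument list. The sketch below is only an illustration: `build_command` is a hypothetical helper, not part of dispel4py, and the optional `mpiexec -n` prefix follows the MPI examples in this README.

```python
def build_command(mapping_module, workflow_module, args=(), mpi_processes=None):
    """Assemble the argv list for a dispel4py run (hypothetical helper)."""
    cmd = [
        "python", "-m", "dispel4py.new.processor",
        mapping_module, workflow_module, *args,
    ]
    if mpi_processes is not None:
        # MPI runs are started through mpiexec, as in the README examples.
        cmd = ["mpiexec", "-n", str(mpi_processes)] + cmd
    return cmd

cmd = build_command(
    "dispel4py.new.simple_process",
    "dispel4py.examples.graph_testing.word_count",
    ["-i", "10"],
)
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once dispel4py is installed.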
To install dispel4py for development in a conda environment, run the following commands in your terminal:
conda create --name py310 python=3.10
conda activate py310
pip uninstall py310
git clone https://github.com/dispel4py2-0/dispel4py.git
cd dispel4py
pip install -r requirements.txt
python setup.py install
The Dockerfile in the dispel4py root directory builds a Debian Linux distribution and installs dispel4py and OpenMPI.
docker build . -t dare-dispel4py
Start a Docker container with the dispel4py image in interactive mode with a bash shell:
docker run -it dare-dispel4py /bin/bash
For the EPOS use cases, obspy is included in a separate Dockerfile, Dockerfile.seismo:
docker build . -f Dockerfile.seismo -t dare-dispel4py-seismo
Some simple examples, intended for testing, are included in this repository. For more complex "real-world" examples for specific scientific domains, such as seismology, please see: https://github.com/rosafilgueira/dispel4py_workflows
python -m dispel4py.new.processor dispel4py.new.simple_process dispel4py.examples.graph_testing.word_count -i 10
python -m dispel4py.new.processor dispel4py.new.multi_process dispel4py.examples.graph_testing.word_count -n 5 -i 10
mpiexec -n 10 python -m dispel4py.new.processor dispel4py.new.mpi_process dispel4py.examples.graph_testing.word_count -i 20 -n 10
RDD:
python -m dispel4py.new.processor dispel4py.new.dynamic_redis dispel4py.examples.graph_testing.word_count -ri localhost -n 4 -i 10
Note: a Redis server must be running in the background. Start it in another terminal tab:
redis-server
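Before starting a Redis-based run, it can help to confirm that the server is accepting connections. This is a minimal sketch using only the standard library. It assumes Redis listens on its default port 6379 on localhost, and `redis_is_reachable` is a hypothetical helper, not part of dispel4py.

```python
import socket

def redis_is_reachable(host="localhost", port=6379, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timeout, or unresolved host.
        return False

if __name__ == "__main__":
    if redis_is_reachable():
        print("Redis appears to be running")
    else:
        print("No server found on localhost:6379; start redis-server first")
```

This only checks that something is listening on the port. It does not verify that the listener is actually Redis.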
python -m dispel4py.new.processor dispel4py.new.simple_process dispel4py.examples.graph_testing.pipeline_test -i 10
python -m dispel4py.new.processor dispel4py.new.multi_process dispel4py.examples.graph_testing.pipeline_test -n 5 -i 10
mpiexec -n 10 python -m dispel4py.new.processor dispel4py.new.mpi_process dispel4py.examples.graph_testing.pipeline_test -i 20 -n 10
For the twitter_sentiment example, run:
dispel4py simple analysis_sentiment.py -d '{"read":[{"input":"Articles_cleaned.csv"}]}'
dispel4py multi analysis_sentiment.py -n 15 -d '{"read":[{"input":"Articles_cleaned.csv"}]}'
mpiexec -np 15 dispel4py mpi analysis_sentiment.py -d '{"read":[{"input":"Articles_cleaned.csv"}]}'
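The value passed to `-d` in the commands above is a JSON document, which is easy to mistype by hand. The sketch below builds it with the standard library and quotes it for the shell. `input_data_argument` is a hypothetical helper. Reading the top-level key (`read` here) as the name of the element that receives the input is an assumption.

```python
import json
import shlex

def input_data_argument(key, input_file):
    """Build the JSON string passed to -d (hypothetical helper)."""
    payload = {key: [{"input": input_file}]}
    # Compact separators reproduce the spacing used in the README commands.
    return json.dumps(payload, separators=(",", ":"))

arg = input_data_argument("read", "Articles_cleaned.csv")
# shlex.quote wraps the JSON in single quotes so the shell passes it intact.
print("dispel4py simple analysis_sentiment.py -d " + shlex.quote(arg))
# prints: dispel4py simple analysis_sentiment.py -d '{"read":[{"input":"Articles_cleaned.csv"}]}'
```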
This project uses the black package for automatic formatting of Python code. However, a lot of older code may still need to be reformatted manually.
For more info, see: https://github.com/psf/black
This project uses ruff for code linting. See: https://docs.astral.sh/ruff/
Ruff rules are configured and documented in the pyproject.toml file.
Future contributors are encouraged to lint their code using ruff check . before contributing and to help fix existing lint errors!
Some regression testing has been set up to compare the output of the current version of dispel4py with an older version. These tests currently fail, probably due to slightly different formatting and line order.
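If the failures really do come from formatting and line order, one way to check is to compare outputs after normalising whitespace and sorting lines. This is a sketch under that assumption, not the project's test code, and `normalise_output` and `outputs_match` are hypothetical names.

```python
def normalise_output(text):
    """Collapse whitespace within lines, drop blank lines, sort the rest."""
    lines = (" ".join(line.split()) for line in text.splitlines())
    return sorted(line for line in lines if line)

def outputs_match(current, reference):
    # Equal when both outputs hold the same lines, ignoring order and spacing.
    return normalise_output(current) == normalise_output(reference)

current = "b   1\na 2\n"
reference = "a 2\nb 1"
assert outputs_match(current, reference)
```

Sorting discards ordering information, so this comparison is only suitable where the order of output lines is not itself part of the expected behaviour.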