The open-source Continuous Machine Learning Platform
Build ML pipelines with only Python and run them on your laptop or in the cloud.
Sematic is an open-source ML development platform. It
lets ML engineers and data scientists write arbitrarily complex end-to-end
pipelines in plain Python and execute them on their local machine, in a cloud
VM, or on a Kubernetes cluster to leverage cloud resources.
Sematic draws on lessons learned at top self-driving car companies. It
lets you chain data processing jobs (e.g. Apache Spark), model training
(e.g. PyTorch, TensorFlow), and any other Python business logic into
type-safe, traceable, reproducible end-to-end pipelines that can be monitored
and visualized in a modern web dashboard.
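The idea of chaining steps into a type-safe pipeline with plain Python can be illustrated without Sematic itself. The following is a hypothetical, stdlib-only sketch, not Sematic's API: `typed_step`, `load_data`, `train`, and `pipeline` are made-up names, and only plain classes such as `int`, `float`, or `list` are supported as annotations. Each step checks its arguments and return value against its annotations at call time, so a type error surfaces before any downstream step runs, and the `for` loop shows how ordinary Python control flow can decide the shape of the graph at runtime.

```python
import inspect
import typing
from functools import wraps


def typed_step(func):
    """Hypothetical decorator (not Sematic's API): check types at call time.

    Only plain classes (int, float, list, ...) are supported as annotations;
    parameterized generics such as list[float] would make isinstance() fail.
    """
    hints = typing.get_type_hints(func)
    signature = inspect.signature(func)

    @wraps(func)
    def wrapper(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        # Fail early: validate every argument before the step body runs.
        for name, value in bound.arguments.items():
            expected = hints.get(name)
            if expected is not None and not isinstance(value, expected):
                raise TypeError(
                    f"{func.__name__}: argument {name!r} expected "
                    f"{expected.__name__}, got {type(value).__name__}"
                )
        result = func(*args, **kwargs)
        # Validate the output before handing it to the next step.
        expected = hints.get("return")
        if expected is not None and not isinstance(result, expected):
            raise TypeError(
                f"{func.__name__}: returned {type(result).__name__}, "
                f"expected {expected.__name__}"
            )
        return result

    return wrapper


@typed_step
def load_data(n: int) -> list:
    # Stand-in for a data processing job (e.g. a Spark query).
    return [float(i) for i in range(n)]


@typed_step
def train(data: list, epochs: int) -> float:
    # Stand-in for model training: returns a fake "model score".
    score = 0.0
    for _ in range(epochs):
        score += sum(data) / len(data)
    return score


@typed_step
def pipeline(n: int, max_epochs: int) -> float:
    # Steps are chained with plain Python calls; the loop decides at
    # runtime how many training steps are executed.
    data = load_data(n)
    best = 0.0
    for epochs in range(1, max_epochs + 1):
        best = max(best, train(data, epochs))
    return best


if __name__ == "__main__":
    print(pipeline(5, 2))  # prints 4.0
```

Calling `pipeline(5, "2")` raises a `TypeError` for `max_epochs` before `load_data` ever runs.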
Read our documentation and join our Discord channel.
Why Sematic
- Easy onboarding – no deployment or infrastructure needed to get started,
simply install Sematic locally and start exploring.
- Local-to-cloud parity – run the same code on your laptop and on your
Kubernetes cluster.
- End-to-end traceability – all pipeline artifacts are persisted, tracked,
and visualizable in a web dashboard.
- Access heterogeneous compute – customize the resources each pipeline step
requires (CPUs, memory, GPUs, Spark cluster, etc.) to optimize performance
and cloud footprint.
- Reproducibility – rerun your pipelines from the UI with guaranteed
reproducibility of results.
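Traceability and reproducibility both rest on recording, for every step execution, what went in and what came out. The sketch below is a hypothetical, stdlib-only illustration that assumes all values are JSON-serializable: `traced_step`, `LINEAGE`, `normalize`, and `top_index` are made-up names, and storing short SHA-256 fingerprints is a simplification (Sematic persists the artifacts themselves). Matching one step's output fingerprint to the next step's input fingerprint recovers the graph, and comparing the records of two runs checks whether a rerun reproduced the same results.

```python
import hashlib
import json
from functools import wraps

# Hypothetical in-memory lineage store (not Sematic's API): one record per
# step execution. Simplification: only fingerprints are kept, not artifacts.
LINEAGE: list = []


def fingerprint(value) -> str:
    """Short content hash of a JSON-serializable value."""
    payload = json.dumps(value, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


def traced_step(func):
    """Record the step name plus input and output fingerprints."""
    @wraps(func)
    def wrapper(*args):
        result = func(*args)
        LINEAGE.append({
            "step": func.__name__,
            "inputs": [fingerprint(arg) for arg in args],
            "output": fingerprint(result),
        })
        return result
    return wrapper


@traced_step
def normalize(values: list) -> list:
    total = sum(values)
    return [v / total for v in values]


@traced_step
def top_index(weights: list) -> int:
    # Index of the largest weight.
    return max(range(len(weights)), key=weights.__getitem__)


def run(values: list) -> int:
    return top_index(normalize(values))


if __name__ == "__main__":
    print(run([1, 3, 2]))  # prints 1
    for record in LINEAGE:
        print(record)
```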
Getting Started
To get started locally, simply install Sematic in your Python environment:
$ pip install sematic
Start the local web dashboard:
$ sematic start
Run an example pipeline:
$ sematic run examples/mnist/pytorch
Create a new boilerplate project:
$ sematic new my_new_project
Or from an existing example:
$ sematic new my_new_project --from examples/mnist/pytorch
Then run it with:
$ python3 -m my_new_project
To deploy Sematic to Kubernetes and leverage cloud resources, see our
documentation.
Features
- Lightweight Python SDK – define arbitrarily complex end-to-end pipelines
- Pipeline nesting – arbitrarily nest pipelines into larger pipelines
- Dynamic graphs – Python-defined graphs allow for iterations, conditional
branching, etc.
- Lineage tracking – all inputs and outputs of all steps are persisted and
tracked
- Runtime type-checking – fail early when step inputs or outputs don't match
their type annotations
- Web dashboard – monitor, track, and visualize pipelines in a modern web UI
- Artifact visualization – visualize all inputs and outputs of all steps in
the web dashboard
- Local execution – run pipelines on your local machine without any
deployment necessary
- Cloud orchestration – run pipelines on Kubernetes to access GPUs and other
cloud resources
- Heterogeneous compute resources – run different steps on machines with
different resources (CPUs, memory, GPUs, Spark, etc.)
- Helm chart deployment – install Sematic on your Kubernetes cluster
- Pipeline reruns – rerun pipelines from the UI from an arbitrary point in
the graph
- Step caching – cache expensive pipeline steps for faster iteration
- Step retry – recover from transient failures with step retries
- Metadata and collaboration – tags, source code visualization, docstrings,
notes, etc.
- Numerous integrations – see Integrations below
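Step caching and step retries both wrap the execution of a single step, so they compose naturally. The stdlib-only sketch below is a hypothetical illustration, not Sematic's API: `step`, `TransientError`, and `flaky_feature_count` are made-up names, the cache is an in-memory dictionary keyed on hashable positional arguments, and only `TransientError` triggers a retry, with optional exponential backoff.

```python
import time
from functools import wraps


class TransientError(Exception):
    """Hypothetical error type for failures worth retrying (e.g. a timeout)."""


def step(cache: bool = False, retries: int = 0, backoff_s: float = 0.0):
    """Hypothetical decorator (not Sematic's API): caching plus retries."""
    def decorator(func):
        results = {}  # in-memory cache keyed on (hashable) positional args

        @wraps(func)
        def wrapper(*args):
            if cache and args in results:
                return results[args]  # cache hit: skip the expensive work
            for attempt in range(retries + 1):
                try:
                    result = func(*args)
                    break
                except TransientError:
                    if attempt == retries:
                        raise  # out of retries: surface the failure
                    # Exponential backoff between attempts.
                    time.sleep(backoff_s * (2 ** attempt))
            if cache:
                results[args] = result  # only successful results are cached
            return result

        return wrapper
    return decorator


calls = {"count": 0}


@step(cache=True, retries=2)
def flaky_feature_count(dataset: str) -> int:
    # Fails on its first call to simulate a transient outage.
    calls["count"] += 1
    if calls["count"] == 1:
        raise TransientError("simulated timeout")
    return len(dataset)


if __name__ == "__main__":
    print(flaky_feature_count("mnist"))  # succeeds on the retry
    print(flaky_feature_count("mnist"))  # served from the cache
    print(calls["count"])  # prints 2
```

A real step cache would need to outlive the process to speed up iteration across runs, and would typically key on a content hash of the inputs rather than on the Python objects themselves.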
Integrations
- Apache Spark – on-demand, in-cluster Spark clusters
- Ray – on-demand, in-cluster Ray resources
- Snowflake – easily query your data warehouse (other warehouses supported
too)
- Plotly, Matplotlib – visualize plot artifacts in the web dashboard
- Pandas – visualize dataframe artifacts in the dashboard
- Grafana – embed Grafana panels in the web dashboard
- Bazel – integrate with your Bazel build system
- Helm chart – deploy to Kubernetes with our Helm chart
- Git – track Git information in the web dashboard
Community and resources
Learn more about Sematic and get in touch with the following resources:
Contribute!
To contribute to Sematic, check out open issues tagged "good first issue"
and get in touch with us on Discord.
You can find instructions on how to set up your development environment
in our developer docs. If you'd like to add an example, you may also find
this guide helpful.