This repository contains a curated list of awesome data catalogs and observability platforms that help you discover, manage, and observe data in your organization.
Tool | Specification-based | Search-based | Network-based | Lineage-based | Federation | ML 1st Citizen | Data Quality | End-to-end Lineage | Observability | Column-level lineage | Data collaboration |
---|---|---|---|---|---|---|---|---|---|---|---|
Alation | ❌ | ✔️ | ❌ | ✔️ | ❌ | ❌ | ✔️ | ❌ | ❌ | ❌ | ❌ |
Amundsen | ❌ | ✔️ | ✔️ | ✔️ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
Ataccama | ❌ | ✔️ | ❌ | ✔️ | ❌ | ❌ | ✔️ | ❌ | ❌ | ❌ | ❌ |
Atlan | ❌ | ✔️ | ❌ | ✔️ | ❌ | ❌ | ✔️ | ❌ | ❌ | ✔️ | ✔️ |
Atlas | ❌ | ✔️ | ❌ | ✔️ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
Azure DC | ❌ | ✔️ | ? | ✔️ | ❌ | ❌ | ? | ❌ | ❌ | ❌ | ❌ |
CKAN | ❌ | ✔️ | ❌ | ❌ | ✔️ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
Collibra | ❌ | ✔️ | ? | ✔️ | ❌ | ❌ | ? | ❌ | ❌ | ❌ | ❌ |
DataGalaxy | ❌ | ✔️ | ✔️ | ✔️ | ❌ | ❌ | ❌ | ✔️ | ✔️ | ? | ? |
Databand | ❌ | ? | ? | ? | ❌ | ? | ? | ? | ✔️ | ❌ | ❌ |
Datafold | ❌ | ✔️ | ✔️ | ✔️ | ❌ | ❌ | ✔️ | ❌ | ✔️ | ❌ | ❌ |
DataHub | ✔️ details | ✔️ | ✔️ | ✔️ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
Google DC | ❌ | ✔️ | ❌ | ✔️ | ❌ | ❌ | ? | ❌ | ❌ | ❌ | ❌ |
Informatica | ❌ | ✔️ | ✔️ | ✔️ | ❌ | ❌ | ✔️ | ❌ | ❌ | ? | ❌ |
Magda | ❌ | ✔️ | ❌ | ❌ | ✔️ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
Marquez | OpenLineage | ✔️ | ❌ | ✔️ | ? | ❌ | ❌ | ❌ | ❌ | ✔️ | ❌ |
Monte Carlo | ❌ | ✔️ | ❌ | ✔️ | ❌ | ❌ | ✔️ | ❌ | ✔️ | ❌ | ❌ |
Select Star | ❌ | ✔️ | ✔️ | ✔️ | ✔️ | ❌ | ❌ | ✔️ | ❌ | ✔️ | ✔️ |
OpenDataDiscovery | ODD Specification | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ❌ | ✔️ |
OpenMetadata | JSON Schema | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ❌ | ✔️ | ✔️ |
Stemma | ❌ | ✔️ | ✔️ | ✔️ | ❌ | ❌ | ? | ✔️ | ❌ | ❌ | ❌ |
Talend | ❌ | ✔️ | ? | ✔️ | ❌ | ❌ | ✔️ | ❌ | ❌ | ❌ | ❌ |
Meta#Grid | ❌ | ✔️ | ❌ | ✔️ | ❌ | ❌ | not yet | ❌ | ❌ | ❌ | ✔️ |
Grai | Grai Schemas | ✔️ | ❌ | ✔️ | ❌ | ✔️ | ✔️ | ❌ | ❌ | ✔️ | ✔️ |
Definitions:
Amundsen is a popular open-source data catalog for metadata management and data discovery that originated at Lyft. Stemma, created by Amundsen maintainers, offers a managed enterprise data catalog inspired by Amundsen.
Based on Open Standard | Search-based | Network-based | Lineage-based | Federation | ML 1st Citizen | Data Quality | End-to-end Lineage | Observability | Column-level lineage | Data collaboration |
---|---|---|---|---|---|---|---|---|---|---|
❌ | ✔️ | ✔️ | ✔️ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
DataHub is an open-source data catalog for data discovery, data observability, and federated governance. It originated at LinkedIn and is offered commercially by Acryl Data as a cloud-hosted SaaS.
Based on Open Standard | Search-based | Network-based | Lineage-based | Federation | ML 1st Citizen | Data Quality | End-to-end Lineage | Observability | Column-level lineage | Data collaboration |
---|---|---|---|---|---|---|---|---|---|---|
✔️ details | ✔️ | ✔️ | ✔️ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
Marquez is an open-source data catalog for the collection, aggregation, and visualization of a data ecosystem's metadata. It originated at WeWork.
Based on Open Standard | Search-based | Network-based | Lineage-based | Federation | ML 1st Citizen | Data Quality | End-to-end Lineage | Observability | Column-level lineage | Data collaboration |
---|---|---|---|---|---|---|---|---|---|---|
OpenLineage | ✔️ | ❌ | ✔️ | ? | ❌ | ❌ | ❌ | ❌ | ✔️ | ❌ |
Apache Atlas is an open-source data catalog for metadata collection, governance, and data democratization.
Based on Open Standard | Search-based | Network-based | Lineage-based | Federation | ML 1st Citizen | Data Quality | End-to-end Lineage | Observability | Column-level lineage | Data collaboration |
---|---|---|---|---|---|---|---|---|---|---|
❌ | ✔️ | ❌ | ✔️ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
CKAN is an open-source data catalog for data management, powering data portals for governments and enterprises.
Based on Open Standard | Search-based | Network-based | Lineage-based | Federation | ML 1st Citizen | Data Quality | End-to-end Lineage | Observability | Column-level lineage | Data collaboration |
---|---|---|---|---|---|---|---|---|---|---|
❌ | ✔️ | ❌ | ❌ | ✔️ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
Magda is an open-source data catalog that features data discovery, metadata enrichment, and federation, focused on geodata.
Based on Open Standard | Search-based | Network-based | Lineage-based | Federation | ML 1st Citizen | Data Quality | End-to-end Lineage | Observability | Column-level lineage | Data collaboration |
---|---|---|---|---|---|---|---|---|---|---|
❌ | ✔️ | ❌ | ❌ | ✔️ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
The first open-source data discovery and observability platform. ODD Platform is based on the ODD Specification.
Based on Open Standard | Search-based | Network-based | Lineage-based | Federation | ML 1st Citizen | Data Quality | End-to-end Lineage | Observability | Column-level lineage | Data collaboration |
---|---|---|---|---|---|---|---|---|---|---|
✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ❌ | ✔️ |
OpenMetadata is the all-in-one platform for data collaboration, discovery, governance, lineage, and quality that lets you focus on building and analyzing.
Based on Open Standard | Search-based | Network-based | Lineage-based | Federation | ML 1st Citizen | Data Quality | End-to-end Lineage | Observability | Column-level lineage | Data collaboration |
---|---|---|---|---|---|---|---|---|---|---|
✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ❌ | ✔️ | ✔️ |
Meta#Grid is an open-source data catalog for metadata management. It is designed to help small and large organizations create an inventory of their data silos and connect different technologies. With its multi-client system and granular permissions, Meta#Grid can be used in consulting companies (with diverse clients and projects) as well as in data mesh organizations. It scales as requirements grow.
Based on Open Standard | Search-based | Network-based | Lineage-based | Federation | ML 1st Citizen | Data Quality | End-to-end Lineage | Observability | Column-level lineage | Data collaboration |
---|---|---|---|---|---|---|---|---|---|---|
❌ | ✔️ | ❌ | ✔️ | ❌ | ❌ | not yet | ❌ | ❌ | ❌ | ✔️ |
Based on Open Standard | Search-based | Network-based | Lineage-based | Federation | ML 1st Citizen | Data Quality | End-to-end Lineage | Observability | Column-level lineage | Data collaboration |
---|---|---|---|---|---|---|---|---|---|---|
Grai Schemas | ✔️ | ❌ | ✔️ | ❌ | ✔️ | ✔️ | ❌ | ❌ | ✔️ | ✔️ |
Collibra is an enterprise data catalog that helps users discover and understand the data that matters and derive impactful insights from it.
Based on Open Standard | Search-based | Network-based | Lineage-based | Federation | ML 1st Citizen | Data Quality | End-to-end Lineage | Observability | Column-level lineage | Data collaboration |
---|---|---|---|---|---|---|---|---|---|---|
❌ | ✔️ | ? | ✔️ | ❌ | ❌ | ? | ❌ | ❌ | ❌ | ❌ |
Informatica is an enterprise data catalog that provides an AI-powered data discovery engine to scan and catalog data assets.
Based on Open Standard | Search-based | Network-based | Lineage-based | Federation | ML 1st Citizen | Data Quality | End-to-end Lineage | Observability | Column-level lineage | Data collaboration |
---|---|---|---|---|---|---|---|---|---|---|
❌ | ✔️ | ✔️ | ✔️ | ❌ | ❌ | ✔️ | ❌ | ❌ | ? | ❌ |
Alation is a collaborative data catalog that helps companies to drive value and business impact from their data.
Based on Open Standard | Search-based | Network-based | Lineage-based | Federation | ML 1st Citizen | Data Quality | End-to-end Lineage | Observability | Column-level lineage | Data collaboration |
---|---|---|---|---|---|---|---|---|---|---|
❌ | ✔️ | ❌ | ✔️ | ❌ | ❌ | ✔️ | ❌ | ❌ | ❌ | ❌ |
Atlan is a modern data catalog offering data discovery, data profiling, data quality, data lineage and data governance.
Based on Open Standard | Search-based | Network-based | Lineage-based | Federation | ML 1st Citizen | Data Quality | End-to-end Lineage | Observability | Column-level lineage | Data collaboration |
---|---|---|---|---|---|---|---|---|---|---|
❌ | ✔️ | ❌ | ✔️ | ❌ | ❌ | ✔️ | ❌ | ❌ | ✔️ | ✔️ |
DataGalaxy is a modern data catalog offering data discovery, data profiling, data quality, data lineage and data governance.
Based on Open Standard | Search-based | Network-based | Lineage-based | Federation | ML 1st Citizen | Data Quality | End-to-end Lineage | Observability | Column-level lineage | Data collaboration |
---|---|---|---|---|---|---|---|---|---|---|
❌ | ✔️ | ✔️ | ✔️ | ❌ | ❌ | ❌ | ✔️ | ✔️ | ? | ? |
Stemma is a fully managed data catalog powered by the open-source data catalog Amundsen that helps data teams have total trust in their data.
Based on Open Standard | Search-based | Network-based | Lineage-based | Federation | ML 1st Citizen | Data Quality | End-to-end Lineage | Observability | Column-level lineage | Data collaboration |
---|---|---|---|---|---|---|---|---|---|---|
❌ | ✔️ | ✔️ | ✔️ | ❌ | ❌ | ? | ✔️ | ❌ | ❌ | ❌ |
Talend is a data catalog that helps enterprises power critical business decisions with trusted data.
Based on Open Standard | Search-based | Network-based | Lineage-based | Federation | ML 1st Citizen | Data Quality | End-to-end Lineage | Observability | Column-level lineage | Data collaboration |
---|---|---|---|---|---|---|---|---|---|---|
❌ | ✔️ | ? | ✔️ | ❌ | ❌ | ✔️ | ❌ | ❌ | ❌ | ❌ |
Select Star is an intelligent data discovery platform that automatically analyzes and documents your data. It provides an easy-to-use data portal where anyone can find and understand data.
Based on Open Standard | Search-based | Network-based | Lineage-based | Federation | ML 1st Citizen | Data Quality | End-to-end Lineage | Observability | Column-level lineage | Data collaboration |
---|---|---|---|---|---|---|---|---|---|---|
❌ | ✔️ | ✔️ | ✔️ | ✔️ | ❌ | ❌ | ✔️ | ❌ | ✔️ | ✔️ |
Google Cloud Data Catalog is a fully managed, scalable metadata management service in Google Cloud's Data Analytics family of products.
Based on Open Standard | Search-based | Network-based | Lineage-based | Federation | ML 1st Citizen | Data Quality | End-to-end Lineage | Observability | Column-level lineage | Data collaboration |
---|---|---|---|---|---|---|---|---|---|---|
❌ | ✔️ | ❌ | ✔️ | ❌ | ❌ | ? | ❌ | ❌ | ❌ | ❌ |
Azure Data Catalog is a fully managed, enterprise-wide metadata catalog that makes data asset discovery straightforward.
Based on Open Standard | Search-based | Network-based | Lineage-based | Federation | ML 1st Citizen | Data Quality | End-to-end Lineage | Observability | Column-level lineage | Data collaboration |
---|---|---|---|---|---|---|---|---|---|---|
❌ | ✔️ | ? | ✔️ | ❌ | ❌ | ? | ❌ | ❌ | ❌ | ❌ |
DataKitchen's open-source data observability products are full-featured and released under the Apache 2.0 license. Data breaks. Servers break. Your toolchain breaks. Ensure your team is the first to know and the first to solve, with visibility across and down your data estate. Save time with simple, fast data quality test generation and execution. Trust your data, tools, and systems end to end.
Based on Open Standard | Search-based | Network-based | Lineage-based | Federation | ML 1st Citizen | Data Quality | End-to-end Lineage | Observability | Column-level lineage | Data collaboration |
---|---|---|---|---|---|---|---|---|---|---|
✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ❌ | ✔️ |
Monte Carlo is a data observability tool that helps to increase trust in data by eliminating or preventing data downtime.
Based on Open Standard | Search-based | Network-based | Lineage-based | Federation | ML 1st Citizen | Data Quality | End-to-end Lineage | Observability | Column-level lineage | Data collaboration |
---|---|---|---|---|---|---|---|---|---|---|
❌ | ✔️ | ❌ | ✔️ | ❌ | ❌ | ✔️ | ❌ | ✔️ | ❌ | ❌ |
Databand is an observability platform that helps data engineers identify and troubleshoot pipeline issues and data quality problems.
Based on Open Standard | Search-based | Network-based | Lineage-based | Federation | ML 1st Citizen | Data Quality | End-to-end Lineage | Observability | Column-level lineage | Data collaboration |
---|---|---|---|---|---|---|---|---|---|---|
❌ | ? | ? | ? | ❌ | ? | ? | ? | ✔️ | ? | ? |
Datafold is a data monitoring and observability platform that gives you confidence in your data quality through diffs, profiling, and anomaly detection.
Based on Open Standard | Search-based | Network-based | Lineage-based | Federation | ML 1st Citizen | Data Quality | End-to-end Lineage | Observability | Column-level lineage | Data collaboration |
---|---|---|---|---|---|---|---|---|---|---|
❌ | ✔️ | ✔️ | ✔️ | ❌ | ❌ | ✔️ | ❌ | ✔️ | ? | ? |
Ataccama is an enterprise data catalog and observability tool featuring data profiling and data quality management, designed for data professionals.
Based on Open Standard | Search-based | Network-based | Lineage-based | Federation | ML 1st Citizen | Data Quality | End-to-end Lineage | Observability | Column-level lineage | Data collaboration |
---|---|---|---|---|---|---|---|---|---|---|
❌ | ✔️ | ❌ | ✔️ | ❌ | ❌ | ✔️ | ❌ | ❌ | ❌ | ❌ |