nucleuscloud / neosync

Open source data anonymization and synthetic data orchestration for developers. Create high fidelity synthetic data and sync it across your environments.
https://www.neosync.dev
MIT License

Open Source Data Anonymization and Synthetic Data Orchestration


## Introduction

[Neosync](https://www.neosync.dev) is an open-source, developer-first way to anonymize PII, generate synthetic data and sync environments for better testing, debugging and developer experience.

Companies use Neosync to:

1. **Safely test code against production data** - Anonymize sensitive production data so you can safely use it locally for a better testing and developer experience
2. **Easily reproduce production bugs locally** - Anonymize and subset production data to get a safe, representative data set that you can use to locally reproduce production bugs quickly and efficiently
3. **High quality data for lower-level environments** - Catch bugs before they hit production by hydrating your staging and QA environments with production-like data
4. **Solve GDPR, DPDP, FERPA, HIPAA and more** - Use anonymized and synthetic data to reduce your compliance scope and more easily comply with laws like HIPAA, GDPR, and DPDP
5. **Seed development databases** - Easily seed development databases with synthetic data for unit testing, demos and more

## Features

- **Generate synthetic data** based on your schema
- **Anonymize existing production data** for a better developer experience
- **Subset your production database** for local and CI testing using any SQL query
- **Complete async pipeline** that automatically handles job retries, failures and playback using an event-sourcing model
- **Referential integrity** for your data automatically
- **Declarative, GitOps-based configs** as a step in your CI pipeline to hydrate your CI DB
- **Pre-built data transformers** for all major data types
- **Custom data transformers** using JavaScript or LLMs
- **Pre-built integrations** with Postgres, MySQL, S3

## Getting started

Neosync is a fully dockerized setup which makes it easy to get up and running.
A [compose.yml](./compose.yml) file at the root contains production image refs, so you can get up and running with just a few commands without building anything on your system.

Neosync uses the newer `docker compose` command, so be sure to have that installed on your machine.

To start Neosync, clone the repo into a local directory, make sure Docker is installed and running, and then run:

```sh
make compose/up
```

Neosync will now be available on [http://localhost:3000](http://localhost:3000).

To stop, run:

```sh
make compose/down
```

The production compose setup pre-seeds connections and jobs to get you started! Simply run the generate and sync job to watch Neosync in action!

## Kubernetes, Auth Mode and more

For more in-depth details on environment variables, Kubernetes deployments, and a production-ready guide, check out the [Deploy Neosync](https://docs.neosync.dev/deploy/introduction) section of our Docs.

## Resources

Some resources to help you along the way:

- [Docs](https://docs.neosync.dev) for comprehensive documentation and guides
- [Discord](https://discord.com/invite/MFAMgnp4HF) for discussion with the community and Neosync team
- [X](https://x.com/neosynccloud) for the latest updates

## Contributing

We love contributions big and small. Here are just a few ways that you can contribute to Neosync.

- Join our [Discord](https://discord.com/invite/MFAMgnp4HF) channel and ask us any questions there
- Open a PR (see our instructions on [developing with Neosync locally](https://docs.neosync.dev/guides/neosync-local-dev))
- Submit a [feature request](https://github.com/nucleuscloud/neosync/issues/new?assignees=&labels=enhancement%2C+feature&template=feature_request.md) or [bug report](https://github.com/nucleuscloud/neosync/issues/new?assignees=&labels=bug&template=bug_report.md)

## Licensing

We strongly believe in free and open source software and make this repo available under the [MIT expat license](./LICENSE.md).
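
A note on the Getting started steps: after `make compose/up`, the containers may take a little while before the app responds on `http://localhost:3000`. The sketch below is one way to wait for it from a script. It is not part of the Neosync repo: `wait_for_url` is a hypothetical helper name, and the attempt count and delay are arbitrary defaults. It assumes `curl` is installed.

```sh
#!/bin/sh
# Hypothetical helper (not shipped with Neosync): poll a URL until it
# answers with a non-error HTTP status, or give up after N attempts.
# Usage: wait_for_url URL [ATTEMPTS] [DELAY_SECONDS]
wait_for_url() {
  url=$1
  attempts=${2:-30}   # assumed default: 30 tries
  delay=${3:-2}       # assumed default: 2 seconds between tries
  i=1
  while [ "$i" -le "$attempts" ]; do
    # --fail makes curl exit non-zero on HTTP status >= 400;
    # connection errors also produce a non-zero exit.
    if curl --silent --fail --output /dev/null "$url"; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Example (commented out so sourcing this file has no side effects):
# make compose/up && wait_for_url http://localhost:3000 && echo "Neosync is up"
```

The function returns 0 as soon as the URL responds and 1 if every attempt fails, so it can be chained with `&&` in a CI step or a local setup script.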