sayantikabanik / DataJourney

Open-source Data Management Framework
https://sayantikabanik.github.io/DataJourney/
Creative Commons Zero v1.0 Universal
allthingsopen dagster data-engineering flask gha holoviews intake mito open-source panel pytest


🚌 DataJourney

DataJourney demonstrates how organizations can manage and use data effectively with open-source technologies. It is designed to help navigate the complex landscape of data tools, offering a structured approach to building scalable and reproducible data workflows.

Built on open-source principles, the framework guides users through essential steps, from identifying goals and selecting tools to testing and customising workflows. With its flexible, modular design, DataJourney can be tailored to individual needs, making it an invaluable toolkit for data professionals.

🛠 Current workflows covered

{✨ = Experimental, ✅ = Implemented}

✅ Packaging framework added\
✅ Conda environment added\
✅ GitHub Actions configured\
✅ Pre-commit hooks configured for code linting/formatting\
✅ Reading data from online sources using intake\
✅ Sample pipeline built using Dagster\
✅ Building dashboards using holoviews + panel\
✅ Exploratory data analysis (EDA) using mito\
✅ Analysing source code complexity using Wily\
✅ Web UI built on Flask\
✅ Web UI re-done and expanded with FastHTML\
✅ Leverage AI models to analyse data (GitHub AI models, Beta)

📊 Repository stats

⚙️ Managed by GitHub Action: https://github.com/jgehrcke/github-repo-stats \
⏳ Configured to run daily at 23:55:00 IST\
📬 Check out the daily reports: DataJourney Stats on Web
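GitHub Actions evaluates `schedule` cron expressions in UTC, so a daily 23:55 IST run has to be written as a UTC time. The sketch below shows the conversion only; the function name is hypothetical, and the cron line actually used in this repository's workflow file is not shown here and may differ. Cron has minute resolution, so the seconds are dropped.

```python
from datetime import datetime, timedelta, timezone

# IST is a fixed UTC+05:30 offset with no daylight saving time.
IST = timezone(timedelta(hours=5, minutes=30), name="IST")


def ist_to_utc_cron(hour: int, minute: int) -> str:
    """Return a daily five-field cron expression in UTC for a wall-clock time in IST."""
    # The date is arbitrary; only the time of day matters for a daily schedule.
    local = datetime(2024, 1, 1, hour, minute, tzinfo=IST)
    utc = local.astimezone(timezone.utc)
    return f"{utc.minute} {utc.hour} * * *"


print(ist_to_utc_cron(23, 55))  # 25 18 * * *
```

Under these assumptions, 23:55 IST corresponds to 18:25 UTC on the same day.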

Dataset metadata/citations

Codespaces configured

New prebuild images are currently disabled due to limited storage.


Environment setup using conda:

Installing miniconda

Create a conda environment

conda env create -f environment.yml
conda activate journey

Install the package locally

pip install -e .
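After activating the environment, a quick import check can confirm that the main tools are available. This is a minimal sketch: the package list is an assumption drawn from the tools this README mentions, the helper name is hypothetical, and `environment.yml` remains the authoritative list.

```python
from importlib.util import find_spec

# Assumed package list, taken from the tools this README mentions;
# the authoritative list is environment.yml.
EXPECTED = ["dagster", "panel", "holoviews", "intake"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages(EXPECTED)
print("Missing packages:", ", ".join(missing) if missing else "none")
```

If anything is reported missing, recreating the conda environment from `environment.yml` is the likely fix.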

🔌 About pre-commit hooks and activating them

As the name suggests, pre-commit hooks run before each commit; here they lint and format the code against PEP standards. More details 🗒

pre-commit install

How to run the applications?

Dagster UI

cd analytics_framework/pipeline
dagit -f process.py


Panel app

cd analytics_framework/dashboard
python simple_app.py

NOTE: The dashboard generated is exported into HTML format and saved as stock_price_dashboard.html
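To view the exported file, it can be opened in a browser through a `file://` URI. The helper below is a hypothetical sketch that assumes the HTML file is written to the directory the script was run from; adjust `directory` if it is saved elsewhere.

```python
from pathlib import Path


def dashboard_uri(directory: str = ".", name: str = "stock_price_dashboard.html") -> str:
    """Build a file:// URI for the exported dashboard so a browser can open it."""
    # resolve() makes the path absolute, which as_uri() requires.
    return (Path(directory) / name).resolve().as_uri()
```

For example, `import webbrowser; webbrowser.open(dashboard_uri())` opens the dashboard in the default browser.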


Mito

Before running the Jupyter notebook doc/mito_exp.ipynb, run the command below in your terminal to enable the installer. It might take some time to run.

python -m mitoinstaller install

To explore further, visit trymito.io.


Display all data sources present via web UI

# Instructions specific to the FastHTML app
cd analytics_framework/intake/web_ui_fasthtml
python app.py
# App link: http://localhost:5001
INFO:     Will watch for changes in these directories: ['../DataJourney/analytics_framework/intake/web_ui_fasthtml']
INFO:     Uvicorn running on http://0.0.0.0:5001 (Press CTRL+C to quit)
INFO:     Started reloader process [20071] using WatchFiles
INFO:     Started server process [20075]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
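Once the startup log appears, a small check can confirm that something is listening on port 5001. This is a sketch using only the standard library; the function name is hypothetical, and it only tests that the port accepts TCP connections, not that the page renders correctly.

```python
import socket


def port_is_open(host: str = "localhost", port: int = 5001, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host not resolvable.
        return False
```

Calling `port_is_open()` with the defaults checks the address shown in the log above.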
