waidyanatha / rezaware

rezaware utils ml, lib, and etl workload packages

rezAWARE

This README is for projects that use the rezaware platform for AI/ML and augmented BI pipelines. The platform is designed to integrate data wrangling, mining, and visualization into a single coherent project. Here we introduce ways of getting started with the platform framework. See the WIKI for comprehensive documentation on the rezaware methodology, functional components, and behaviour.

NOTE: the instructions and content are specific to Debian distros and were tested on Ubuntu 20.04.

Starting a New Project

  1. Create an empty git repository with the desired project name; e.g., MyNewProj.
  2. Clone your MyNewProj into a desired directory location; for example
    • cd ~/all_rez_projects/
    • git clone https://github.com/<my_git_user_name>/MyNewProj.git
  3. Move into the newly created project folder
    • cd ~/all_rez_projects/MyNewProj
  4. Now clone and initialize rezaware platform as a submodule
    • git submodule add -b main https://github.com/waidyanatha/rezaware.git rezaware
    • git submodule init; this copies the submodule mapping from the .gitmodules file into the local ./.git/config file
  5. Navigate into the rezaware folder and run setup to initialize the project with AI/ML functional app classes
    • cd rezaware
    • Next, run the setup twice: once for rezaware and once for the apps
      • python3 -m 000_setup --app=rezaware --with_ini_files; it is important to use the --with_ini_files flag because it instructs 000_setup.py to build the rezaware app and the python __init__.py and app.ini files necessary for seamless package integration
      • python3 -m 000_setup; at the outset you will not have any wrangler, mining, or visuals code in the respective modules folders; hence, you cannot yet build the python __init__.py and app.ini files. Without the --with_ini_files flag the process simply generates the app folder structure and a default app.cfg file.
    • You have now created your MyNewProj with the rezaware platform framework and can begin coding.
    • Note that you need to configure the app.cfg in the mining, wrangler, and visuals apps
      • each time you add or remove a module package, app.cfg must be updated to match
      • any other project-specific parameters must also be changed.
  6. Change back to the project directory
    • cd .. or cd ~/all_rez_projects/MyNewProj
  7. Add the submodule and initialize
    • git add .gitmodules rezaware/
    • git init
  8. Install dependencies with python poetry.
    • The pyproject.toml file is created by the previous 000_setup.py step
    • poetry --version will confirm whether the poetry dependency manager is installed
    • If required, follow the poetry installation docs
    • Generate the lock file with poetry lock
    • Install dependencies with poetry install
    • Confirm the installation and environment with poetry shell; it opens a shell in the project's virtual environment, with a prompt such as (rezaware-py3.10)
  9. (Optional) Include a README.md file, if one does not already exist
    • echo "# Welcome to MyNewProj" >> README.md
  10. Add and commit all newly created files and folders in MyNewProj
    • git add .
    • git commit -m "added rezaware submodule and setup project"
  11. Push the submodule and new commits to the repo
    • git push origin main
    • Check your GitHub project in the browser; you will see a folder rezaware @ xxxxxxx, where xxxxxxx is the first 7 characters of the rezaware.git commit hash that the submodule points to
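
Step 4 mentions the mapping that git stores in the .gitmodules file. Before pushing, it can be useful to confirm that mapping looks as expected. The following is a minimal Python sketch, not part of rezaware: the helper name read_submodules is hypothetical, and the sample text assumes the layout git typically writes for the submodule add command shown in step 4.

```python
import configparser

# Sample .gitmodules content, assumed to match what git typically writes
# after `git submodule add -b main <url> rezaware` (illustrative only).
SAMPLE_GITMODULES = """\
[submodule "rezaware"]
\tpath = rezaware
\turl = https://github.com/waidyanatha/rezaware.git
\tbranch = main
"""


def read_submodules(text: str) -> dict:
    """Return {submodule name: {key: value}} parsed from .gitmodules text."""
    # .gitmodules is INI-like, so the standard library parser can read it.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    submodules = {}
    for section in parser.sections():
        # Section headers look like: submodule "rezaware"
        if section.startswith("submodule "):
            name = section[len("submodule "):].strip().strip('"')
            submodules[name] = dict(parser[section])
    return submodules


mods = read_submodules(SAMPLE_GITMODULES)
print(mods["rezaware"]["path"], mods["rezaware"]["branch"])  # → rezaware main
```

To check a real project, pass the file's content instead of the sample, for example read_submodules(open(".gitmodules").read()) from inside MyNewProj.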

Test the new Project

Run the tests by executing pytest at your terminal prompt
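
Alongside the packaged tests, a quick smoke check of the generated scaffold can catch a setup step that was skipped. The sketch below is an assumption-laden example, not a test shipped with rezaware: the helper missing_folders is hypothetical, and the folder names are taken from the list under "About the Post Setup Artifacts". The demo builds a throwaway directory so it runs anywhere.

```python
import tempfile
from pathlib import Path

# App and subfolder names as listed in this README.
APPS = ("mining", "wrangler", "visuals")
SUBFOLDERS = ("dags", "data", "db", "logs", "modules", "notebooks", "tests")


def missing_folders(root: Path) -> list:
    """List the expected app subfolders that do not exist under root."""
    return [
        f"{app}/{sub}"
        for app in APPS
        for sub in SUBFOLDERS
        if not (root / app / sub).is_dir()
    ]


# Demo on a temporary directory: create everything except visuals/tests.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for app in APPS:
        for sub in SUBFOLDERS:
            if (app, sub) != ("visuals", "tests"):
                (root / app / sub).mkdir(parents=True)
    demo_missing = missing_folders(root)

print(demo_missing)  # → ['visuals/tests']
```

In a pytest file, the same helper could back a single assertion such as assert missing_folders(root) == [], with root pointing at wherever 000_setup generated the app folders in your project.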

Update rezaware from remote repo

From time to time you will need to update the rezaware submodule in your project.

  1. Change your directory to the MyNewProj folder
    • cd ~/all_rez_projects/MyNewProj
  2. Fetch the latest changes from the rezaware.git repository and merge them into the current MyNewProj branch.
    • git submodule update --remote --merge
  3. Update the repo on GitHub:
    • git commit -s -am "updating rezaware submodule with latest"
    • git push origin main
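
If the update is done often, the three git commands above can be wrapped in a small script. This is a hedged sketch rather than a rezaware utility: update_commands and run_update are hypothetical names, and the script defaults to a dry run that only prints the commands.

```python
import subprocess
from pathlib import Path


def update_commands(message: str = "updating rezaware submodule with latest",
                    branch: str = "main") -> list:
    """Return the git commands from the steps above as argument lists."""
    return [
        ["git", "submodule", "update", "--remote", "--merge"],
        ["git", "commit", "-s", "-am", message],
        ["git", "push", "origin", branch],
    ]


def run_update(project_dir: Path, dry_run: bool = True) -> list:
    """Print each command and, unless dry_run is set, run it in project_dir."""
    executed = []
    for cmd in update_commands():
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops at the first failing command; note that
            # `git commit` exits non-zero when there is nothing to commit.
            subprocess.run(cmd, cwd=project_dir, check=True)
        executed.append(cmd)
    return executed


# Dry run from the current directory (step 1 changes into MyNewProj).
planned = run_update(Path("."))
```

Calling run_update(Path("."), dry_run=False) from inside MyNewProj would execute the commands for real; keeping the dry run as the default avoids an accidental push.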

Reconfiguring existing project

When you add a new module package to the mining, wrangler, or visuals app folders and define it in the app.cfg file, the __init__.py and app.ini framework files need to be updated. To do so, simply rerun 000_setup.py

About the Post Setup Artifacts

  1. Mining - Artificial Intelligence (AI) and Machine Learning (ML) analytical methods
  2. Wrangler - automated pipelines for extracting, transforming, and loading (ETL) data
  3. Visuals - interactive dashboards with visual analytics for Business Intelligence (BI)
  4. utils.py - contains a set of framework functions useful for all apps
  5. app.cfg - defines the app-specific configuration as section-wise key/value pairs
  6. Folders - each of the mining, wrangler, and visuals folders will contain a set of subfolders
    • dags - airflow or other scheduler pipeline scripts
    • data - specific parametric data and tmp files
    • db - database scripts for creating the schema, tables, and initial data
    • logs - log files created by each module package
    • modules - managing the package functional class libraries
    • notebooks - jupyter notebooks for developing and testing pipeline scripts
    • tests - pytest scripts for applying unit & functional tests for any of the packages
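
Since app.cfg holds section-wise key/value pairs, it can be inspected with Python's standard configparser. The sketch below only illustrates that idea: the section names (MODULES, LOGGING), the keys, and the helper names are hypothetical placeholders, not the names rezaware actually writes into app.cfg.

```python
import configparser

# HYPOTHETICAL app.cfg content: section and key names are placeholders
# chosen for illustration, not the ones generated by 000_setup.py.
SAMPLE_APP_CFG = """\
[MODULES]
etl = loader, cleaner
ml = clustering

[LOGGING]
level = INFO
"""


def load_cfg(text: str) -> configparser.ConfigParser:
    """Parse section-wise key/value pairs from config text."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return parser


def list_packages(cfg: configparser.ConfigParser,
                  section: str = "MODULES") -> dict:
    """Map each key in the section to its comma-separated list of values."""
    return {
        module: [p.strip() for p in packages.split(",") if p.strip()]
        for module, packages in cfg[section].items()
    }


cfg = load_cfg(SAMPLE_APP_CFG)
packages = list_packages(cfg)
print(packages)  # → {'etl': ['loader', 'cleaner'], 'ml': ['clustering']}
```

Open the app.cfg generated in your own project to see the real sections before adapting anything like this.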

Deprecated

  1. (Recommended) you may also consider setting up an Anaconda environment with python-3.8.10 to avoid any distutils issues.
    • create a new environment using the requirements.txt file that is in the rezaware folder:
      • conda create --name rezenv python=3.8.10 --file requirements.txt
    • Thereafter, check that all packages listed in requirements.txt were installed
      • conda list will print a list to stdout
    • Activate your conda environment;
      • e.g. conda activate rezenv