This project is an internal initiative for Knowit Objectnet. The purpose of the project is manifold: it should give hands-on experience with cloud solutions and serverless architecture, and it aims to make data available to users of the data platform.
The main architecture consists of AWS Lambda functions for data ingestion and processing, with AWS S3 as data lake storage. APIs for data exploration are under development. Schemas are auto-generated using AWS Glue, and queries can be run using AWS Athena.
Currently the platform gathers data from several different datasources, both through scheduled polling and through webhook events. Among the supported datasources are yr, github-repos and Knowit's internal CV database, CV Partner.
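To illustrate the ingestion path described above, here is a minimal sketch of a Lambda-style handler that stores an incoming event in S3. It is not taken from the project: the bucket name, the key layout, and the event fields `source` and `payload` are all assumptions made for the example. Only the `put_object` call mirrors the real boto3 S3 client.

```python
import json
from datetime import datetime, timezone


def build_object_key(source: str, received_at: datetime) -> str:
    """Build a date-partitioned object key (hypothetical layout, not the project's)."""
    return (
        f"raw/{source}/"
        f"{received_at:%Y/%m/%d}/"
        f"{int(received_at.timestamp())}.json"
    )


def handler(event, context, s3_client=None, bucket="example-datalake-bucket"):
    """Store the event payload as a single JSON object in the data lake bucket.

    `bucket` and the `source`/`payload` event fields are assumptions.
    """
    if s3_client is None:
        # Imported lazily so the sketch can be exercised without AWS access.
        import boto3

        s3_client = boto3.client("s3")

    key = build_object_key(event.get("source", "unknown"), datetime.now(timezone.utc))
    s3_client.put_object(
        Bucket=bucket,
        Key=key,
        Body=json.dumps(event.get("payload", {})).encode("utf-8"),
    )
    return {"statusCode": 200, "body": json.dumps({"key": key})}
```

Passing the client as a parameter is a design choice for the sketch: it lets a test substitute a fake object, which matches the README's point that AWS connections can be mocked for local testing.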
npm install -g serverless@3.22.0
aws-cli: please contact an existing member of the team for credentials.
Members of the Dataplattform team can find useful information pertaining to the project on the wiki.
The individual services are deployed to AWS using
dataplattform deploy -s {PATHS TO SERVICES SEPARATED BY SPACE}
Contributions such as a new datasource or other small improvements can be made by creating a pull request, since tests can be run locally with AWS connections mocked. Cloud testing requires being a member of the Dataplattform team at Knowit Objectnet.
For more information about the CI/CD pipeline of this repo, see Dataplattform: CI CD Pipeline
Branches that are not named using the feature/ or fix/ prefixes are automatically protected, meaning that it is not possible to push directly to these branches. Instead, one must create a pull request into these branches, and these pull requests must be approved by another contributor in order to enable merge.
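The protection rule can be expressed as a simple prefix check. The sketch below only illustrates the rule as stated; the actual protection is configured on the repository, and the function name is hypothetical.

```python
# Prefixes that exempt a branch from protection, as described above.
UNPROTECTED_PREFIXES = ("feature/", "fix/")


def is_protected(branch: str) -> bool:
    """Return True if direct pushes to `branch` would be rejected."""
    return not branch.startswith(UNPROTECTED_PREFIXES)


for name in ("main", "feature/add-yr-poller", "fix/typo", "hotfix/urgent"):
    print(name, "protected" if is_protected(name) else "open")
```

Note that under this rule a branch such as `hotfix/urgent` is protected, because only the two listed prefixes are exempt.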
Lint checks and unit testing will be performed on every pull request.
Everything merged into the main branch will be automatically deployed to the Dataplattform dev-environment.
To deploy the repository to the Dataplattform prod-environment, create a new release from the main branch. Its tag name should be one minor version up from the previous release.
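As a worked example of "one minor version up", the sketch below computes the next tag name. It assumes tags follow a MAJOR.MINOR.PATCH pattern with an optional `v` prefix and that the patch number resets to zero on a minor bump. The README does not state the tag format, so check the existing releases before relying on this.

```python
import re


def next_minor_tag(previous: str) -> str:
    """Return the tag one minor version above `previous`.

    Assumes tags look like 1.4.2 or v1.4.2 (not confirmed by the README).
    """
    match = re.fullmatch(r"(v?)(\d+)\.(\d+)\.(\d+)", previous)
    if match is None:
        raise ValueError(f"unsupported tag format: {previous!r}")
    prefix, major, minor, _patch = match.groups()
    return f"{prefix}{major}.{int(minor) + 1}.0"


print(next_minor_tag("v1.4.2"))  # prints v1.5.0
```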
After a deployment is made, either by merging into main or by creating a new release, you should always go to the Actions tab to keep an eye on the deployment workflow, in case anything fails.
Every new feature or fix is created in its own branch, prefixed with feature/{BRANCH_NAME} or fix/{BRANCH_NAME} accordingly, where the branch name is descriptive of its intended purpose.
Check out main by running git checkout main, and pull the most recent changes by running git pull origin main --rebase.
Create a new branch by running git checkout -b feature/{BRANCH_NAME} or git checkout -b fix/{BRANCH_NAME}.
Commit your changes by running git commit -m "{COMMIT_DESCRIPTION}".
Push the branch by running git push --set-upstream origin feature/{BRANCH_NAME} or git push --set-upstream origin fix/{BRANCH_NAME}.
Create a pull request into the main branch.
Once the pull request is approved, merge it into main by pressing squash and merge.
The changes are then automatically deployed to the dev environment. Keep an eye on the deployment workflow in case it fails.
Once the dev environment is functional and stable, you may want to push all of the previous changes to the production environment. This is done by creating a new release from main.
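The command-line part of this workflow can be summarised as an ordered list of git commands. The helper below is a hypothetical convenience that only builds the command strings for a given branch; it does not run them, and it is not part of the repository.

```python
from typing import List


def branch_commands(kind: str, name: str, message: str) -> List[str]:
    """Return the git commands for one feature or fix branch, in order."""
    if kind not in ("feature", "fix"):
        raise ValueError("kind must be 'feature' or 'fix'")
    branch = f"{kind}/{name}"
    return [
        "git checkout main",
        "git pull origin main --rebase",
        f"git checkout -b {branch}",
        f'git commit -m "{message}"',
        f"git push --set-upstream origin {branch}",
    ]


for command in branch_commands("feature", "add-yr-poller", "Add yr poller"):
    print(command)
```

The pull request, review, and squash-and-merge steps happen in the GitHub web interface and are therefore not part of the list.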
To create the CI/CD pipeline, a set of bash scripts has been used. These scripts, along with their unit tests, are all within the .workflow-scripts directory and are only used within the GitHub workflows.
The BATS framework has been used for unit testing these scripts. If you have BATS installed on your local machine, the tests may be run using the following command:
bats -r .workflow-scripts
File: .github/workflows/deploy.yml
This workflow is run every time code is pushed to the main branch, or whenever a new release is published.
When run, this workflow looks at every changed file and searches for serverless.yml files in any of the changed files' parent directories. If a serverless.yml file is found, that service is added to a list of services to be deployed.
Lastly, the command dataplattform deploy is run for every service that is to be deployed.
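The service-discovery step can be sketched as follows. The real implementation is a set of bash scripts in .workflow-scripts, so this Python version is only an illustration of the idea. The function name is hypothetical, and it assumes that the search stops at the nearest enclosing directory containing the marker file and never looks above the repository root.

```python
from pathlib import Path
from typing import Iterable, List


def find_service_dirs(
    changed_files: Iterable[str],
    repo_root: Path,
    marker: str = "serverless.yml",
) -> List[Path]:
    """Return the nearest ancestor directory holding `marker` for each changed file."""
    root = repo_root.resolve()
    found = set()
    for changed in changed_files:
        # Deleted files no longer exist on disk; only the path itself is needed.
        directory = (root / changed).resolve().parent
        # Walk upwards, but never above the repository root.
        while directory == root or root in directory.parents:
            if (directory / marker).is_file():
                found.add(directory)
                break
            if directory == root:
                break
            directory = directory.parent
    return sorted(found)
```

Calling the same function with `marker="tox.ini"` gives the analogous lookup for directories that contain a tox configuration.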
File: .github/workflows/lint-python.yml
If a Python file is changed in a pull request, then this workflow runs Flake8 on every Python file in the project.
The Flake8 config in the file .config/.flake8 is used.
File: .github/workflows/lint-yaml.yml
If a YAML file is changed in a pull request, then this workflow runs yamllint on every YAML file in the project.
The yamllint config in the file .config/.yamllint is used.
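Both lint workflows share the same trigger logic: a single changed file of the relevant type causes the linter to run over the whole project, not just the changed files. A minimal sketch of that decision, assuming file types are recognised by extension (the function name and the extension lists are assumptions, not taken from the workflow files):

```python
from pathlib import PurePosixPath
from typing import Iterable, Tuple

PYTHON_SUFFIXES = (".py",)
YAML_SUFFIXES = (".yml", ".yaml")


def lint_needed(changed_files: Iterable[str], suffixes: Tuple[str, ...]) -> bool:
    """Return True if any changed file has one of the given extensions."""
    return any(PurePosixPath(path).suffix in suffixes for path in changed_files)


changed = ["services/a/handler.py", "README.md"]
print(lint_needed(changed, PYTHON_SUFFIXES))  # prints True
print(lint_needed(changed, YAML_SUFFIXES))    # prints False
```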
File: .github/workflows/test-python.yml
This workflow is run every time a pull request is made to the main branch.
When run, this workflow looks at every changed file and searches for tox.ini files in any of the changed files' parent directories. If a tox.ini file is found, that service is added to a list of services to be tested.
Lastly, the command tox -r is run in every directory that is to be tested.
File: .github/workflows/test-workflow-scripts.yml
This workflow is run every time a pull request containing changes in the .workflow-scripts directory is made to the main branch.
When run, this workflow runs BATS unit tests on all the BATS files within the .workflow-scripts directory.