A collection of Jupyter notebooks for working with data from:
Note: If you encounter unfamiliar errors, try the Runtime > Disconnect and delete runtime menu item. If the error still occurs, please open an issue.
To use a notebook:
`collection_ids`, `schema_name`, etc.)

If you make any improvements or fixes, please follow the Contributing guide below to merge your changes back into this repository.
You can also use a notebook without creating a copy. However, if you re-open the notebook, any changes and outputs will be lost.

Notebook | Open in Colab | Description |
---|---|---|
Publisher analysis template | | Analyze data from a specific publisher. |
Meta analysis template | | Analyze data from multiple publishers, or perform other types of analysis on the Kingfisher Process database. |
Basic criteria feedback template | | Provide feedback on the OCDS basic criteria. |
Structure and format feedback template | | Provide feedback on structure and format errors reported by lib-cove-ocds. |
Data quality feedback template | | Provide detailed feedback on structure, format, conformance and quality issues. |
Usability checks template | | Provide feedback on data usability for OCDS datasets. |

Notebook | Open in Colab | Description |
---|---|---|
Usability checks using a field list | | Provide feedback on data usability for prospective OCDS publishers, using a field list, such as one from a field-level mapping. |
Usability checks using the Data Registry | | Provide feedback on data usability using data from the Data Registry. |
Relevant checks using a field list | | Provide feedback on data relevance for prospective publishers, using a field list, such as one from a field-level mapping. |
Relevant checks using the Data Registry | | Provide feedback on data relevance using data from the Data Registry. |
Relevant checks for all the Data Registry publications | | Provide feedback on data relevance by downloading all the publications from the Data Registry. |

To ease maintenance, the notebooks are made up of reusable components. To see which components are used in each notebook, refer to the `NOTEBOOKS` variable in `manage.py`.
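
As a rough illustration of the component approach, the sketch below assembles one notebook from reusable component notebooks. The shape of `NOTEBOOKS` (a dict mapping a notebook name to an ordered list of component names), the `build_notebook` helper and the sample names are assumptions made for illustration only; check `manage.py` for the actual definitions.

```python
import json

# Hypothetical mapping: notebook name -> ordered list of component names.
# The real NOTEBOOKS variable in manage.py may be structured differently.
NOTEBOOKS = {
    "template_publisher_analysis": ["environment", "kingfisher_process_setup"],
}


def build_notebook(name, components):
    """Concatenate the cells of each component, in order, into one notebook.

    `components` maps a component name to its parsed .ipynb JSON (a dict).
    """
    cells = []
    for component_name in NOTEBOOKS[name]:
        cells.extend(components[component_name]["cells"])
    return {"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 4}


# Minimal in-memory components standing in for .ipynb files on disk.
components = {
    "environment": {
        "cells": [{"cell_type": "markdown", "metadata": {}, "source": ["## Environment"]}]
    },
    "kingfisher_process_setup": {
        "cells": [{"cell_type": "markdown", "metadata": {}, "source": ["## Setup"]}]
    },
}

notebook = build_notebook("template_publisher_analysis", components)
print(json.dumps(notebook, indent=2))
```

Keeping the mapping in one place means a fix to a component only has to be made once, and every notebook that lists the component picks it up the next time it is built.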
Reminder: If you edit the Check structure and format or Check quality components and change the headings or add new sections, check whether the related Document template in this process note needs an update.

Component name | Open in Colab | Tasks |
---|---|---|
Environment | | Install requirements, import packages, load extensions and configure the notebook. |
Cardinal setup | | Install Cardinal requirements, define coverage functions and calculate the field list for a given file. |
Charts setup | | Install charts requirements, import charts packages and define plot functions. |
Kingfisher Process setup | | Connect to the database. Choose the collection(s) and schema to work with. |
Field list setup | | Load the field list. |
Data Registry download data setup | | Define the functions to list publications and download JSONL files from the registry. |
Data Registry download data | | Define the forms to select a publication and year and download the selected publication. |
Kingfisher Process errors | | Check for data collection and processing errors. |
Structure scope | | Check how many releases and records your data contains. Check the date range and stages of the contracting process covered by your data. |
Usability setup | | Define the usability functions. |
Usability scope | | Calculate general statistics. |
Structure checks | | Check for structure and format errors reported by lib-cove-ocds. |
Conformance checks | | Check against the OCDS conformance criteria. |
Quality checks | | Check for conformance and quality issues that require manual review. |
Usability checks using Kingfisher with coverage | | |
Usability checks using a field list without coverage | | |
Relevant checks using a field list | | Given a field list, check whether the list passes the "relevant" criteria. |
Relevant checks against all the publications from the Data Registry | | Download all the publications from the registry and perform the "relevant" checks against the active ones. |

Use the buttons above to open the components from the `main` branch for editing in Google Colaboratory (Colab).
To open a component from a different branch, use Colab's GitHub browser.
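
Colab can also open a GitHub-hosted notebook directly from a URL that names the branch. The sketch below builds such a URL. The `colab_url` helper and the organization, repository, branch and file names are placeholders, not values taken from this repository.

```python
from urllib.parse import quote


def colab_url(org, repo, branch, path):
    """Build a Colab URL that opens a notebook stored on GitHub."""
    # quote() leaves "/" unescaped by default, so nested paths stay readable.
    return "https://colab.research.google.com/github/{}/{}/blob/{}/{}".format(
        quote(org), quote(repo), quote(branch), quote(path)
    )


# Placeholder values: replace them with the real organization, repository,
# branch and notebook file.
url = colab_url("example-org", "example-repo", "my-branch", "component_environment.ipynb")
print(url)
```

Opening the printed URL in a browser should load the notebook from the chosen branch, provided the repository is public or Colab has been authorized to read it.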
To encourage reuse, limit the scope of a component. The current scopes are:
In Colab:
`NOTEBOOKS` variable in `manage.py`.
`NOTEBOOKS` variable in `manage.py`.
`README.md`.

Once approved, you can merge your own changes.
For small changes, you can review the raw diff in the GitHub review interface.
For larger changes, you can review and comment on a visual diff by clicking the button. You need to authorize the app the first time you open it.
Install requirements:

    pip install -r requirements.txt

Install the pre-commit script:

    pre-commit install