opendataam / opendatam-tasks

Public tasks for volunteers, hackathons and contests
Creative Commons Zero v1.0 Universal

[EN] Extract government budget data from Republic of Armenia interactive budget website #6

Closed ivbeg closed 1 year ago

ivbeg commented 1 year ago

Goal

The goal is to create a dataset of the Republic of Armenia's budget by extracting data from the government's interactive budget web page. The tool itself is simple, and the resulting dataset could be used in future data visualization projects and contests.

Tasks

The Armenian government publishes budget data as an interactive tool on the e-gov.am website, https://www.e-gov.am/interactive-budget/, which visualizes the budgets for 2016 through 2023. The data is loaded dynamically from a set of XML files. For example, the 2022 budget is loaded from https://www.e-gov.am/budget_archive/2022/data//GOV_BUDGET.XML, and additional supplemental XML files, which can be found in the web page source, are also used.

  1. Collect all XML files from the website.
  2. Convert the XML files to JSON and CSV.
  3. Ideally, enrich the data in the main budget files with grp, subgrp, and program names, etc.
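Steps 1 and 2 could be sketched in Python roughly as follows. This is a hedged sketch, not the parser linked later in this thread. Two things are assumptions: the URL template extrapolates the confirmed 2022 URL to other years, and the flattening assumes each record is a direct child of the root element, with its fields stored as XML attributes or simple child elements. Inspect the real files, and the page source for the names of the supplemental XML files, before relying on it.

```python
import csv
import io
import json
import urllib.request
import xml.etree.ElementTree as ET

# Assumption: other years use the same layout as the confirmed 2022 URL.
URL_TEMPLATE = "https://www.e-gov.am/budget_archive/{year}/data//{name}"


def fetch_xml(year: int, name: str = "GOV_BUDGET.XML", timeout: float = 30.0) -> bytes:
    """Download one XML file for the given budget year."""
    url = URL_TEMPLATE.format(year=year, name=name)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def xml_to_records(xml_bytes: bytes) -> list[dict[str, str]]:
    """Flatten each direct child of the root element into one dict.

    Assumption: fields live in the record's attributes and/or in simple
    child elements; adjust after inspecting the real files.
    """
    root = ET.fromstring(xml_bytes)
    records = []
    for elem in root:
        row = dict(elem.attrib)
        for child in elem:
            row[child.tag] = (child.text or "").strip()
        records.append(row)
    return records


def records_to_csv(records: list[dict[str, str]]) -> str:
    """Serialize records to CSV text, using the union of all keys as columns."""
    # dict.fromkeys keeps first-seen key order and drops duplicates.
    fieldnames = list(dict.fromkeys(k for r in records for k in r))
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)  # missing keys become ""
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


def records_to_json(records: list[dict[str, str]]) -> str:
    """Serialize records to JSON, keeping non-ASCII (e.g. Armenian) text readable."""
    return json.dumps(records, ensure_ascii=False, indent=2)
```

For example, `records = xml_to_records(fetch_xml(2022))`, then write `records_to_csv(records)` and `records_to_json(records)` to files. Using the union of keys as CSV columns means a record with an extra field does not break the export.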

Context

This budget source is only one of several possible data sources. It is well prepared, and we need to extract and enrich its data so that it is easier to use.

Requirements

Wishes

Please write reusable code that someone else can run later, since we may need to update this dataset in the future.

Resources

Prepared by

The Open Data Armenia team prepared this task

arsen41531 commented 1 year ago

Link to the parser: arsen41531/opendatam-egov-am-budget-parser

A couple of things worth noting:

ivbeg commented 1 year ago

@arsen41531 Fantastic! Thanks a lot!

arsen41531 commented 1 year ago

No problem! For posterity, here is a code sample for enriching the data:

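```python
import pandas as pd

# Load the three 2016 tables produced by the parser.
budget = pd.read_csv("./_data/2023-06-03/2016-gov_budget.csv")
contr_f = pd.read_csv("./_data/2023-06-03/2016-gov_contr_f.csv")
contr = pd.read_csv("./_data/2023-06-03/2016-gov_contr.csv")

# Just to illustrate piping multiple functions together;
# don't replicate this code style.
budget_enriched = (
    contr
    # Attach the contr_f columns on matching CONTID and CONTRACT.
    .merge(contr_f, on=["CONTID", "CONTRACT"])
    # Attach the budget rows on CONTID (merge is an inner join by default).
    .merge(budget, on="CONTID")
)
```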
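Because `merge` defaults to an inner join, budget rows whose `CONTID` has no match in the contract tables are silently dropped. A hedged variant: join from the budget side with `how="left"`, pass `validate=` so pandas raises if the assumed key uniqueness does not hold, and use `indicator=True` to list unmatched rows. The uniqueness of the keys is an assumption about the real data, and the toy frames and the `NAME`/`AMOUNT` columns are made up; only `CONTID` and `CONTRACT` come from the sample above.

```python
import pandas as pd

# Toy stand-ins for the parser's CSVs (values and NAME/AMOUNT are hypothetical).
contr = pd.DataFrame({"CONTID": [1, 2], "CONTRACT": ["A", "B"]})
contr_f = pd.DataFrame({"CONTID": [1, 2], "CONTRACT": ["A", "B"], "NAME": ["x", "y"]})
budget = pd.DataFrame({"CONTID": [1, 1, 3], "AMOUNT": [10.0, 5.0, 7.0]})


def enrich(budget: pd.DataFrame, contr: pd.DataFrame, contr_f: pd.DataFrame):
    """Left-join contract info onto budget rows and report unmatched rows."""
    # Assumption: (CONTID, CONTRACT) is unique in both contract tables.
    lookup = contr.merge(contr_f, on=["CONTID", "CONTRACT"],
                         how="left", validate="one_to_one")
    # Assumption: CONTID is unique in the lookup, so each budget row gets at most one match.
    merged = budget.merge(lookup, on="CONTID", how="left",
                          validate="many_to_one", indicator=True)
    unmatched = merged[merged["_merge"] == "left_only"]
    return merged.drop(columns="_merge"), unmatched


merged, unmatched = enrich(budget, contr, contr_f)
print(unmatched[["CONTID", "AMOUNT"]])
```

Keeping every budget row and reporting the unmatched ones makes gaps in the enrichment visible instead of shrinking the dataset without notice.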

@ivbeg Let me know if there's anything else to be done for this task; otherwise, I think we can close this one.

ivbeg commented 1 year ago

It's really helpful. Thanks!