opendataam / opendatam-tasks

Public tasks for volunteers, hackathons and contests
Creative Commons Zero v1.0 Universal

[EN] Collect metadata on the reports of the Armenian NGOs #8

Open ansakoy opened 1 year ago


Goal

The goal is to create a dataset of metadata on the reports of Armenian NGOs.

Tasks

The metadata are published on an ASPX (ASP.NET) web page (https://www.petekamutner.am/Reports_vh.aspx?rptid=1), so scraping it may require a browser-emulation library. Beyond that, the task is to collect all the data from the paginated table and store it in a flat, machine-readable format such as JSON, XML, or CSV. For example:

{
    "reg_num": STRING,
    "year": NUMBER,
    "taxpayer_id": STRING,
    "org_name": STRING,
    "org_type": STRING,
    "report_date": STRING,
    "report_url": STRING,  // the url to download the given report
    "date_update": STRING
}
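A minimal sketch of the collection step, using only the Python standard library and assuming the table is a standard ASP.NET GridView that pages through postbacks. In that setup each page's HTML carries hidden fields such as `__VIEWSTATE` and `__EVENTVALIDATION`, and the next page is requested by POSTing them back with `__EVENTTARGET` set to the grid control and `__EVENTARGUMENT` set to `Page$N`. The control name `GRID_TARGET` and the column order in `FIELDS` are hypothetical placeholders; read the real values from the page source. If the site turns out to need JavaScript beyond plain postbacks, a browser-emulation library would be required instead.

```python
import urllib.parse
import urllib.request
from html.parser import HTMLParser

# ASSUMPTIONS (verify against the live page source before use):
# - the table is a standard ASP.NET GridView paged via postbacks;
# - GRID_TARGET is a hypothetical control name, not the real one;
# - FIELDS is a hypothetical column order for the data rows.
GRID_TARGET = "ctl00$MainContent$GridView1"
FIELDS = ["reg_num", "year", "taxpayer_id", "org_name",
          "org_type", "report_date", "date_update"]


class AspxPageParser(HTMLParser):
    """Collects hidden form fields and the text/links of every table row."""

    def __init__(self):
        super().__init__()
        self.hidden = {}  # __VIEWSTATE, __EVENTVALIDATION, etc.
        self.rows = []    # list of (cell_texts, hrefs) tuples
        self._row = None
        self._links = None
        self._cell = None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "input" and a.get("type") == "hidden" and a.get("name"):
            self.hidden[a["name"]] = a.get("value") or ""
        elif tag == "tr":
            self._row, self._links = [], []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []
        elif tag == "a" and self._cell is not None and a.get("href"):
            self._links.append(a["href"])

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse runs of whitespace inside the cell text.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append((self._row, self._links))
            self._row = self._links = self._cell = None


def build_postback(hidden, page):
    """Form data asking a GridView for `page` (standard 'Page$N' argument)."""
    form = dict(hidden)
    form["__EVENTTARGET"] = GRID_TARGET
    form["__EVENTARGUMENT"] = f"Page${page}"
    return form


def row_to_record(cells, links, base_url):
    """Map one data row onto the flat schema; header and pager rows give None."""
    if len(cells) != len(FIELDS) or not cells[FIELDS.index("year")].isdigit():
        return None
    rec = dict(zip(FIELDS, cells))  # IDs are kept as strings
    rec["year"] = int(rec["year"])
    # Assumption: the first link in the row points to the report file.
    rec["report_url"] = urllib.parse.urljoin(base_url, links[0]) if links else None
    return rec


def fetch(url, form=None):
    """GET the page, or POST the postback form when `form` is given."""
    body = urllib.parse.urlencode(form).encode() if form else None
    with urllib.request.urlopen(urllib.request.Request(url, data=body)) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

To walk the pages, parse the first response from `fetch(url)`, then repeatedly call `fetch(url, build_postback(parser.hidden, n))` using the hidden fields from the most recent page, since ASP.NET may regenerate `__VIEWSTATE` on every response. How to detect the last page (for example, stopping when a page yields no new records) is left open in this sketch.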

Please bear in mind that all numeric IDs, such as the taxpayer ID (ՀՎՀՀ), should always be stored as strings to preserve their exact value, including any leading zeros.
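The following sketch shows one way to honour the string-ID rule when exporting, using Python's `csv` and `json` modules. Converting an ID such as `"00123456"` to an integer silently yields `123456`, so the hypothetical helper `normalize_id` rejects non-string IDs instead of converting them. The field list mirrors the example schema in this task; treating `reg_num` and `taxpayer_id` as the ID fields is an assumption.

```python
import csv
import io
import json

# Mirrors the example schema from the task; which fields are IDs is assumed.
FIELDNAMES = ["reg_num", "year", "taxpayer_id", "org_name", "org_type",
              "report_date", "report_url", "date_update"]
ID_FIELDS = ("reg_num", "taxpayer_id")


def normalize_id(value):
    """Return an ID as a trimmed string; refuse numbers, which drop leading zeros."""
    if not isinstance(value, str):
        raise TypeError(f"IDs must be strings, got {type(value).__name__}")
    return value.strip()


def _checked(records):
    """Yield copies of the records with every ID field validated as a string."""
    for rec in records:
        row = dict(rec)
        for field in ID_FIELDS:
            row[field] = normalize_id(row[field])
        yield row


def to_csv(records):
    """Serialize records to CSV text; string values are written unchanged."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(_checked(records))
    return buf.getvalue()


def to_json(records):
    """Serialize records to JSON text, keeping Armenian characters readable."""
    return json.dumps(list(_checked(records)), ensure_ascii=False, indent=2)
```

Write the result with `open(path, "w", encoding="utf-8", newline="")` so that Armenian organization names are stored as UTF-8 and the CSV line endings are not doubled. Anyone loading the CSV with pandas should pass `dtype=str` to `read_csv`; otherwise numeric-looking IDs are parsed as integers and lose their leading zeros.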

Context

The State Revenue Committee of the Republic of Armenia publishes the reports of NGOs, which may contain invaluable information on how the non-governmental and non-profit sector operates. The problem is that the reports themselves are published as PDF files, which are hard to process automatically. However, these files are accompanied by rather helpful metadata, including the organizations' names and IDs and the reports' years, which make it possible to quickly search for specific reports and to check whether an NGO has filed any such reports at all.

Requirements

A public GitHub repository should be created to store and publish the code and the data under a free and open license, such as Creative Commons or MIT.

Wishes

It would be best if your code were reusable, that is, if anyone could run it again to update the dataset at a later point. For the same reason, we encourage you to comment your code, supplement it with at least a brief README, and list the requirements and dependencies needed to run it.
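One way to make the code easy to re-run is a small command-line entry point. The sketch below uses `argparse`; the option names and the `scrape(url, max_pages)` callable it expects are hypothetical, standing in for whatever collection function your project provides. Passing `scrape` as a parameter also lets the entry point be tested without network access.

```python
import argparse
import json
import sys

DEFAULT_URL = "https://www.petekamutner.am/Reports_vh.aspx?rptid=1"


def parse_args(argv=None):
    """Command-line options so anyone can re-run the collection later."""
    p = argparse.ArgumentParser(
        description="Collect metadata on the reports of Armenian NGOs.")
    p.add_argument("--url", default=DEFAULT_URL,
                   help="first page of the paginated report table")
    p.add_argument("--max-pages", type=int, default=None,
                   help="stop after this many pages (handy for quick tests)")
    p.add_argument("-o", "--output", default="-",
                   help="output JSON file; '-' writes to stdout")
    return p.parse_args(argv)


def main(argv=None, scrape=None):
    """Run `scrape(url, max_pages)` and write its records as JSON.

    `scrape` is a hypothetical placeholder for your own collection function.
    Returns the number of records written.
    """
    args = parse_args(argv)
    if scrape is None:
        raise NotImplementedError("plug in your scraping function here")
    records = scrape(args.url, args.max_pages)
    text = json.dumps(records, ensure_ascii=False, indent=2)
    if args.output == "-":
        sys.stdout.write(text + "\n")
    else:
        with open(args.output, "w", encoding="utf-8") as fh:
            fh.write(text + "\n")
    return len(records)
```

In a real script, call `main()` (with your scraping function) from an `if __name__ == "__main__":` guard, and describe the options and dependencies (for example in a `requirements.txt`) in the README, so that updating the dataset later comes down to a single command such as `python collect.py -o reports.json`, where `collect.py` is a hypothetical script name.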

Resources

https://www.petekamutner.am/Reports_vh.aspx?rptid=1

Prepared by

The Open Data Armenia team prepared this task.