openspending / cameroon.openspending.org

Website for "Cameroon Budget Inquirer"
http://cameroon.openspending.org/

Modify dataset structure for councils #36

Closed by pzwsk 10 years ago

pzwsk commented 10 years ago

The idea is to have one dataset to rule all councils of the country so that new council data can be easily added to that one.

Here is a suggestion for the dataset structure:

- Id // how to generate with multiple contributors
- Head-account
- Head-Account Description
- Sub-account
- Sub-account Description
- Year
- Reporting Type {Actual, Budget}
- Amount
- Revenue/Expenditure {REVENUE, EXPENDITURE}
- Recurrent/Investment {RECURRENT, INVESTMENT}
- Region // name of the region
- Department // name of the department
- Communes // name of the commune
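To make the suggested structure concrete, here is a minimal Python sketch that writes the columns down and checks a row against the three enumerated fields. The names `COLUMNS`, `ALLOWED` and `validate` are made up for illustration, and the sample values are placeholders, not real council data.

```python
# Column names taken from the suggested structure above.
COLUMNS = [
    "Id",
    "Head-account",
    "Head-Account Description",
    "Sub-account",
    "Sub-account Description",
    "Year",
    "Reporting Type",
    "Amount",
    "Revenue/Expenditure",
    "Recurrent/Investment",
    "Region",
    "Department",
    "Communes",
]

# Columns restricted to a fixed set of values in the suggestion.
ALLOWED = {
    "Reporting Type": {"Actual", "Budget"},
    "Revenue/Expenditure": {"REVENUE", "EXPENDITURE"},
    "Recurrent/Investment": {"RECURRENT", "INVESTMENT"},
}


def validate(row):
    """Return a list of problems found in one row (empty list if none)."""
    problems = [f"missing column: {c}" for c in COLUMNS if c not in row]
    for column, allowed in ALLOWED.items():
        value = row.get(column)
        if value is not None and value not in allowed:
            problems.append(f"{column}: unexpected value {value!r}")
    return problems


# Placeholder row: every value here is illustrative only.
sample = dict(zip(COLUMNS, [
    "1", "110", "(description)", "110.101", "(description)", "2009",
    "Actual", "0", "REVENUE", "RECURRENT",
    "(region)", "(department)", "(commune)",
]))

print(validate(sample))                                   # []
print(validate({**sample, "Reporting Type": "Planned"}))  # one problem reported
```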

Notes :

pzwsk commented 10 years ago

Question regarding the publishing process: is it possible to add a new commune's dataset directly to the main communes dataset on OpenSpending, or should the contributor add their data to the main dataset first and then re-upload it to OpenSpending?

Thanks

vitorbaptista commented 10 years ago

We need an extra column for cities with multiple communes like Yaoundé.

About updating the data: we'll have to choose the columns that together identify a unique row (e.g. sub-account and year) so that, when new data is uploaded, only rows that aren't in the dataset yet are added.

Say that we upload the data:

... Sub-account Year ...
... 110.101 2009 ...
... 710.100 2009 ...

Then, a few months after, we upload:

... Sub-account Year ...
... 110.101 2009 ...
... 110.101 2010 ...
... 110.101 2011 ...
... 110.101 2012 ...
... 710.100 2009 ...
... 710.100 2010 ...
... 710.100 2011 ...
... 710.100 2012 ...

OpenSpending will add the new data for 2010, 2011, and 2012, and ignore the data for 2009, which it already has. So it doesn't matter whether you append the new rows to the original .csv file or upload a new .csv file containing only the new rows.
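The behaviour described above can be sketched in a few lines of Python. This is only an illustration of the idea, not OpenSpending's actual loader: `merge_rows` and its parameters are hypothetical names, and the rows are the sub-account/year pairs from the two uploads in the example.

```python
def merge_rows(existing, incoming, key_columns):
    """Return the existing rows plus any incoming row whose key is new."""
    seen = {tuple(row[c] for c in key_columns) for row in existing}
    merged = list(existing)
    for row in incoming:
        key = tuple(row[c] for c in key_columns)
        if key not in seen:  # rows whose key is already present are skipped
            seen.add(key)
            merged.append(row)
    return merged


# First upload: 2009 only.
first_upload = [
    {"Sub-account": "110.101", "Year": "2009"},
    {"Sub-account": "710.100", "Year": "2009"},
]

# Second upload: the original file with 2010-2012 appended.
second_upload = [
    {"Sub-account": sub_account, "Year": str(year)}
    for sub_account in ("110.101", "710.100")
    for year in range(2009, 2013)
]

merged = merge_rows(first_upload, second_upload, ["Sub-account", "Year"])
print(len(merged))  # 8: the 2 original rows plus 6 new ones
```

Uploading only the six new rows instead of the full file would give the same result, since the 2009 rows are skipped either way.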

pzwsk commented 10 years ago

And if we have multiple communes/councils in one dataset, then we have to take year + sub-account + council as unique ID.

vitorbaptista commented 10 years ago

Yes. To be safe, it's better to take year + sub-account + every geographical division's column, since there might be councils in different regions that share the same name.
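A small Python sketch of why the wider key is safer. The region, department and council names below are made-up placeholders, and `KEY_NARROW`, `KEY_FULL` and `row_key` are hypothetical names used only for this example.

```python
# Key from the earlier comment: year + sub-account + council.
KEY_NARROW = ("Year", "Sub-account", "Communes")
# Key that also includes every geographical division's column.
KEY_FULL = ("Year", "Sub-account", "Region", "Department", "Communes")


def row_key(row, columns):
    """Build the tuple that identifies a row for the given key columns."""
    return tuple(row[c] for c in columns)


# Two different councils that happen to share a name (placeholder values).
rows = [
    {"Year": "2009", "Sub-account": "110.101", "Region": "Region A",
     "Department": "Department 1", "Communes": "Council X"},
    {"Year": "2009", "Sub-account": "110.101", "Region": "Region B",
     "Department": "Department 2", "Communes": "Council X"},
]

narrow_keys = {row_key(r, KEY_NARROW) for r in rows}
full_keys = {row_key(r, KEY_FULL) for r in rows}

# With the narrow key the two rows collide; with the full key they stay distinct.
print(len(narrow_keys), len(full_keys))  # 1 2
```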

vitorbaptista commented 10 years ago

I added a list of problems I've found with these datasets at #40. Looking back, I should've commented that here...