openspending / cubepress

Flat-file generator for aggregated spending reports
MIT License
5 stars 2 forks source link

cubepress and OpenSpending Aggregations #4

Open rufuspollock opened 8 years ago

rufuspollock commented 8 years ago

Proposal is to make this the primary aggregator library for OpenSpending "next". Basic structure:

compute_aggregates(datapackage.json, aggregates.json)
    engine = sqlalchemy.engine(...)
    make_sql(engine, datapackage.json)
    load_data(engine, datapackage.json)
    for aggregates in aggregates.json:
        run_aggregate(...)
        save_results(...)  # output is aggregates/by-department.json or .csv

example data input layout on disk:

boost-armenia
    /aggregates/
    /aggregates.json -> specifies the aggregates to compute
    /viz/
        /treemap/index.html

 Code snippets

# Loading a Data Package and getting a stream
package = TabularDataPackage(‘datapackage.json”)
for row in package.resources[0].stream:
   # rows are pre-cast to their correct types etc
make_sql:
    jts-kit.make_database

load_data():
    get_stream(datapackage.resources[0])
    for row in stream:
        engine.insert_row(...)

Example command line:

cubepress datapackage.json aggregate.json dbname # default to sqlite as database type

=> 
csv file:
amount, year, department

datapackage.json
“name”: “mydata”
    [
        {
                    “name”: “amount”
                    “type”: “number”
                }
]   

aggregates.json:
sql: SELECT SUM(amount), department FROM “mydata” WHERE year = 2013
file: by-department
format: csv

=>

/aggregates/by-department.csv

Initial steps

https://github.com/okfn/dptools/blob/master/bin/load-postgresql.py

pudo commented 8 years ago

For reference (following discussion on IRC):

rufuspollock commented 8 years ago

One question: what is the benefit of having JSON structured queries as opposed to raw SQL. I guess my assumption is one could have both and that JSON structured queries become SQL and are useful because you can create them from a visualization/aggregation editor.

pudo commented 8 years ago

Generating a query is one (important) aspect of that JSON, but not the only one. You also need metadata for the visualisation to know which data to pick, and which fields to apply to which visual aspect in the visualisation.

Beyond that, the filter are also used to generate distinct value sets so that the resulting web site can have drop-downs.

You can hard-code all of that, but not having a clean model just ends up causing you more work in each specific instance.

Just for fun, check out the site-config generated from the view spec in OffenerHaushalt. It's a JSON blob in the frozen HTML.

rufuspollock commented 8 years ago

DataFreeze docs: https://dataset.readthedocs.org/en/latest/freezefile.html