okfn-brasil / mosaico

Visualization of the Brazilian budget for FGV
GNU Affero General Public License v3.0

Data download button #15

Open ltartari opened 11 years ago

ltartari commented 11 years ago

Basically, this will download a CSV with the whole dataset used in the application.

ltartari commented 11 years ago

Concern: performance-wise, how will OpenSpending handle this?

vitorbaptista commented 11 years ago

The problem is: how do we download all the data in an OpenSpending dataset? As we don't store the CSV itself, OS will have to recreate it from the DB, which can be quite CPU-intensive. Considering that we'll have data since 2006 (right, @ltartari?), I'm afraid the request will time out before finishing. Also, I don't want to hit OpenSpending too hard.

What I'm doing now is http://openspending.org/api/2/aggregate?dataset=orcamento_publico&format=csv&drilldown=funcao|subfuncao|orgao|uo|mod_aplic|elemento_despesa&measure=amount|pago|rppago&pagesize=99999999999999
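For readability, the same request can be assembled from its parts. This is a minimal sketch, assuming only what the URL above shows: multiple drilldowns and measures are joined with `|`, and `format=csv` asks for CSV output. The helper name `build_aggregate_url` is hypothetical.

```python
from urllib.parse import urlencode

# Base endpoint taken from the URL quoted in the comment.
AGGREGATE_URL = "http://openspending.org/api/2/aggregate"


def build_aggregate_url(dataset, drilldowns, measures, pagesize):
    """Build an OpenSpending aggregate query URL (hypothetical helper).

    Drilldowns and measures are joined with "|", as in the URL above;
    safe="|" keeps the separator unescaped.
    """
    params = {
        "dataset": dataset,
        "format": "csv",
        "drilldown": "|".join(drilldowns),
        "measure": "|".join(measures),
        "pagesize": str(pagesize),
    }
    return AGGREGATE_URL + "?" + urlencode(params, safe="|")


url = build_aggregate_url(
    "orcamento_publico",
    ["funcao", "subfuncao", "orgao", "uo", "mod_aplic", "elemento_despesa"],
    ["amount", "pago", "rppago"],
    99999999999999,
)
```

The huge `pagesize` is what turns this into an "everything at once" export, which is exactly the part that worries us performance-wise.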

I don't see an easy solution for this problem. Maybe we could hit that URL every time we update the data, just to warm the cache so the user won't have to wait too long. But if OpenSpending times out, we're out of luck.
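The cache-warming idea could look roughly like the sketch below, run once after each data update. It assumes OpenSpending caches the aggregate response on its side (not confirmed here); the function name `warm_cache` and the 600-second default timeout are hypothetical choices. It reports failure instead of raising, since a timeout simply means the warm-up didn't help.

```python
import socket
import urllib.error
import urllib.request


def warm_cache(url, timeout_seconds=600, opener=urllib.request.urlopen):
    """Request `url` once and read the whole body (hypothetical helper).

    Returns True if the full response was read, False on a timeout or an
    HTTP/network error. `opener` is injectable so the function can be
    exercised without hitting OpenSpending.
    """
    try:
        with opener(url, timeout=timeout_seconds) as response:
            # Read in chunks and discard: we only care that it completes.
            while response.read(64 * 1024):
                pass
        return True
    except (urllib.error.URLError, socket.timeout, TimeoutError):
        return False
```

A `False` result is the "out of luck" case: the server gave up before producing the CSV, so users would hit the same timeout.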

@tryggvib Do you have any suggestions for this?

ltartari commented 11 years ago

Data since 2006, check, @vitorbaptista. Do you know if there's a way to convert JSON to CSV on the client side? Wouldn't that help us?

vitorbaptista commented 11 years ago

Not really. JSON is bigger than CSV, so that would be slower...
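To see why client-side conversion doesn't help, compare the same rows serialized both ways. This is an illustrative sketch; the rows below are made-up placeholders shaped loosely like the drilldown columns, not real budget data. JSON repeats every field name in every record, while CSV writes the header once, so the browser would have to download more bytes before converting.

```python
import csv
import io
import json


def serialized_sizes(rows):
    """Return (json_bytes, csv_bytes) for the same list of row dicts."""
    as_json = json.dumps(rows).encode("utf-8")
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
    writer.writeheader()  # field names written once, not per row
    writer.writerows(rows)
    as_csv = buffer.getvalue().encode("utf-8")
    return len(as_json), len(as_csv)


# Hypothetical placeholder rows (values are made up for illustration).
rows = [{"funcao": "Saude", "ano": 2013, "amount": 1000.0} for _ in range(100)]
json_size, csv_size = serialized_sizes(rows)
```

The gap grows with the number of columns and rows, which is the wrong direction for a dataset going back to 2006.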

trickvi commented 11 years ago

OpenSpending only fetches CSV data from some external source. Couldn't we just fetch it directly from there? It shouldn't be hard to add something like this to dataset.json:

{
    "sources": [ "http://csv.files.com/fgv.csv", ... ]
}

And then just add that link (or links) to the page. Why would we want to fetch the data from the database if it already exists online?
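On the client side, the proposal would amount to reading the `sources` list and rendering one link per entry. A minimal sketch, assuming the `sources` key shown above (a proposal, not an existing OpenSpending field); the helper name `source_links` is hypothetical.

```python
import json


def source_links(dataset_json_text):
    """Return the CSV URLs under the proposed "sources" key (hypothetical).

    Returns an empty list when the key is missing, so the download button
    can simply be hidden in that case.
    """
    metadata = json.loads(dataset_json_text)
    sources = metadata.get("sources", [])
    # Keep only string entries, preserving their original order.
    return [url for url in sources if isinstance(url, str)]
```

Note that this only lists files; it says nothing about whether those files together match what is currently in the database.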

vitorbaptista commented 11 years ago

@tryggvib The problem is that the uploaded CSV isn't guaranteed to contain all of the dataset's data. Our plan is that, although we have data since 2006, we'll only upload 2013 data when updating the dataset. That way we won't have to upload a huge CSV and make OpenSpending read all of it just to discover that only the last hundred or so lines were added or modified.

There are other problems with sending only the original CSV URLs. For example, on the client side I would get an ordered list of CSVs, and I'd need OpenSpending's model to check for and remove duplicates. Also, on the OS side, you would need to remove (or tell me) which CSVs aren't used anymore (i.e. ones that were loaded, but whose data the user then deleted).
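The duplicate-removal step alone already needs model knowledge. The sketch below merges CSVs in load order, letting a later row replace an earlier row with the same key. The `key_columns` argument is a hypothetical stand-in for the unique-key information OpenSpending's model would provide, and `merge_ordered_csvs` is a hypothetical name.

```python
import csv
import io


def merge_ordered_csvs(csv_texts, key_columns):
    """Merge CSVs in load order; later rows replace earlier ones by key.

    `key_columns` is a hypothetical stand-in for the model's unique key.
    Returns (fieldnames, rows). Rows keep the position where their key
    first appeared.
    """
    merged = {}
    fieldnames = None
    for text in csv_texts:
        reader = csv.DictReader(io.StringIO(text))
        if fieldnames is None:
            fieldnames = reader.fieldnames
        for row in reader:
            key = tuple(row[column] for column in key_columns)
            merged[key] = row  # the most recently loaded CSV wins
    return fieldnames, list(merged.values())
```

Even this doesn't cover deletions: a row removed from the dataset still appears in the older CSV, which is why the list of unused CSVs would also have to come from OS.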

Basically, I would have to recreate much of OpenSpending's data loading code. As it's basic functionality, I think we should solve it on OS's side...

Thoughts?

vitorbaptista commented 11 years ago

As per @andressafioravanti's suggestion, instead of having a button to download everything, we could simply use #17 to download all data for a specific year. This would limit the amount of data and solve our problems.
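A per-year download could reuse the aggregate URL with a year filter added. This sketch assumes the aggregate endpoint accepts a `cut` parameter of the form `dimension:value`, as the OpenSpending API v2 documented; the `time.year` dimension name for this dataset and the helper name `with_year_cut` are assumptions.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_year_cut(url, year, time_dimension="time.year"):
    """Return `url` restricted to one year via a `cut` parameter.

    Assumes a `cut=<dimension>:<value>` filter (hypothetical for this
    dataset). Existing parameters are kept; any existing `cut` is replaced.
    """
    parts = urlsplit(url)
    params = [(k, v) for k, v in parse_qsl(parts.query) if k != "cut"]
    params.append(("cut", f"{time_dimension}:{year}"))
    query = urlencode(params, safe="|:")  # keep "|" and ":" readable
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, parts.fragment))
```

Each year's CSV is bounded in size, so the request is far less likely to time out than the all-years export.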

vitorbaptista commented 11 years ago

@ltartari Just added the button to download the current year's data, as per @andressafioravanti's suggestion. :beers: