Open ltartari opened 11 years ago
Concern: How will OpenSpending handle this, performance-wise?
The problem is: how do we download all the data in a dataset from OpenSpending? As we don't save the CSV itself, OS will have to recreate it from the DB, which can be quite CPU-intensive. Considering that we'll have data since 2006 (right, @ltartari?), I'm afraid it'll time out before finishing. Also, I don't want to hit OpenSpending too hard.
What I'm doing now is http://openspending.org/api/2/aggregate?dataset=orcamento_publico&format=csv&drilldown=funcao|subfuncao|orgao|uo|mod_aplic|elemento_despesa&measure=amount|pago|rppago&pagesize=99999999999999
I don't see an easy solution for this problem. Maybe we could hit that URL every time we update the data, just to warm the cache so the user won't have to wait too long. But if OpenSpending times out, we're out of luck.
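A rough sketch of the cache-warming idea, in Python. The helper names `aggregate_url` and `warm_cache` and the 300-second timeout are assumptions made for illustration; only the endpoint and query parameters come from the URL above. Whether OpenSpending actually caches the response is not confirmed here.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

API = "http://openspending.org/api/2/aggregate"


def aggregate_url(dataset, drilldowns, measures, pagesize):
    """Build the aggregate CSV URL; '|' is kept unescaped, as in the URL above."""
    params = {
        "dataset": dataset,
        "format": "csv",
        "drilldown": "|".join(drilldowns),
        "measure": "|".join(measures),
        "pagesize": pagesize,
    }
    return API + "?" + urlencode(params, safe="|")


def warm_cache(url, timeout=300):
    """Request the URL once and discard the body.

    Returns True on HTTP 200, False on a network error or timeout.
    The timeout value is an arbitrary assumption.
    """
    try:
        with urlopen(url, timeout=timeout) as resp:
            while resp.read(65536):  # drain the response in chunks
                pass
            return resp.status == 200
    except OSError:  # URLError and socket timeouts are both OSError subclasses
        return False
```

Calling `warm_cache(aggregate_url(...))` right after each data update would at least tell us whether the request finishes within the timeout.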
@tryggvib Do you have any suggestions for this?
Data since 2006, check, @vitorbaptista. Do you know if there's a way to convert JSON to CSV on the client side? Wouldn't that help us?
Not really. JSON is bigger than CSV, so that would be slower...
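To illustrate the size difference, here is a small Python sketch that serializes the same rows both ways. The sample rows are made up; only the column names are borrowed from the drilldown above. JSON repeats every key in every row, while CSV states the column names once in the header.

```python
import csv
import io
import json

# Made-up sample rows, for illustration only.
rows = [
    {"funcao": "Saude", "subfuncao": "Atencao Basica", "amount": 1000.0},
    {"funcao": "Educacao", "subfuncao": "Ensino Fundamental", "amount": 2500.0},
]


def as_json(rows):
    """Serialize rows as a JSON array of objects (keys repeated per row)."""
    return json.dumps(rows)


def as_csv(rows):
    """Serialize rows as CSV (column names appear once, in the header)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


print(len(as_json(rows)), len(as_csv(rows)))
```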
OpenSpending only fetches CSV data from some external source. Couldn't we just fetch directly from there? It shouldn't be hard to add to dataset.json something like
{
"sources": [ "http://csv.files.com/fgv.csv", ... ]
}
And just add that link (or links). Why would we want to fetch it from the database if it already exists online?
@tryggvib The problem is that the uploaded CSV isn't guaranteed to contain all of the dataset's data. Although we have data since 2006, our plan is to upload only the 2013 data when updating the dataset. That way we won't have to upload a huge CSV and make OpenSpending read all of it just to discover that only the last hundred or so lines were added or modified.
There are other problems with sending just the original CSV URLs. For example, in that case the client would get an ordered list of CSVs, and I would need OpenSpending's model to check for and remove duplicates. Also, on the OS side, you would need to remove (or tell me about) CSVs that aren't used anymore (i.e. they were loaded, but then the user deleted the data).
Basically, I would have to recreate much of OpenSpending's data loading code. Since this is basic functionality, I think we should solve it on the OS side...
Thoughts?
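To make the duplicate problem concrete, here is a minimal Python sketch of what the client would have to do with an ordered list of source CSVs. The function `merge_sources` and the key columns in the usage example are hypothetical; the real unique key would have to come from OpenSpending's model, which is exactly the dependency described above.

```python
import csv
import io


def merge_sources(csv_texts, key_fields):
    """Merge CSV texts in upload order; a later row replaces an earlier one
    that has the same values in key_fields.

    key_fields is an assumption: the client cannot know the real unique key
    without the dataset's model.
    """
    merged = {}
    for text in csv_texts:
        for row in csv.DictReader(io.StringIO(text)):
            key = tuple(row[field] for field in key_fields)
            merged[key] = row  # later sources win
    return list(merged.values())


older = "orgao,ano,amount\nA,2012,10\nB,2012,20\n"
newer = "orgao,ano,amount\nB,2012,25\nC,2013,5\n"
rows = merge_sources([older, newer], ["orgao", "ano"])
```

Note that this only covers updates. Rows that a user deleted on the OpenSpending side would still show up, since nothing in the source CSVs records the deletion.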
As per @andressafioravanti's suggestion, instead of having a button to download everything, we could simply use #17 to download all data for a specific year. This would limit the amount of data and solve our problems.
@ltartari Just added the button to download the current year's data, as per @andressafioravanti's suggestion. :beers:
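A per-year download could look roughly like the Python sketch below. It assumes the aggregate endpoint accepts a `cut` filter of the form `dimension:value` and that the dataset exposes a `time.year` dimension; neither is confirmed in this thread, and `year_url` and the `pagesize` default are hypothetical.

```python
from urllib.parse import urlencode

API = "http://openspending.org/api/2/aggregate"
DRILLDOWNS = ["funcao", "subfuncao", "orgao", "uo", "mod_aplic", "elemento_despesa"]
MEASURES = ["amount", "pago", "rppago"]


def year_url(dataset, year, pagesize=100000):
    """Build an aggregate CSV URL restricted to a single year.

    Assumes a `cut=time.year:<year>` filter is supported (not confirmed here).
    """
    params = {
        "dataset": dataset,
        "format": "csv",
        "cut": "time.year:%d" % year,
        "drilldown": "|".join(DRILLDOWNS),
        "measure": "|".join(MEASURES),
        "pagesize": pagesize,
    }
    # Keep '|' and ':' unescaped so the URL reads like the one used above.
    return API + "?" + urlencode(params, safe="|:")


print(year_url("orcamento_publico", 2013))
```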
Basically this will download a CSV with the whole dataset used in the application.