okfn-brasil / serenata-de-amor

🕵 Artificial Intelligence for social control of public administration | **This repository does not receive frequent updates. Check out the README**
https://serenata.ai/en
MIT License
Data visualization #95

Closed pmargreff closed 7 years ago

pmargreff commented 8 years ago

Hi guys, what about data visualization? Have you already talked about it? It could be a good way to show people who don't have technical skills, or who come from a different field, how the project could save a lot of money. I've started developing some charts using D3. I'm curious to know which kinds of features you think are relevant to show. At the moment I'm trying money spent by person, by state and by subquota. What do you think?

cuducos commented 8 years ago

That's a great topic — thanks @pmargreff!

I'm aware @filipelinhares was working on something visual, and @vilapedro might be interested in this material for communication purposes.

Maybe this is a topic to be discussed in the serenata-website repo in the near future. When we consolidate a structure for a website focused on communication, this repo will tend to narrow its focus to data science, while the other one will embrace many communication-related topics, I guess.

Filipe and Pedro — what do you think?

Pablo would you mind sharing something you already have? Maybe some screenshots would be nice to give more substance to what can be done from it.

filipelinhares commented 8 years ago

Hey @pmargreff!

I'm studying data vis and starting to dive into D3 and other tools. As @cuducos said, when we consolidate a structure for the new website we can use visualizations to improve the experience and interaction with the data.

Data visualization is an awesome field to explore in this project :heart:.

pmargreff commented 8 years ago

We thought of three basic kinds of visualization to start with.

First: compare total values by state, monthly. We do that dynamically using D3 bar charts.

[Image: bystate]

Second: a heatmap showing the companies that received the most money. On mouse hover, the heatmap shows the value, company name, CNPJ and ranking position. We use the Highcharts heatmap because the only D3 heatmaps we found index values by days and months (calendar heatmaps). Maybe in the near future we can adapt one of those to accept a simple matrix and use that instead.

[Image: bycompany]

Third: a D3 radar chart for each congressperson. The radar contains 18 axes, one per subquota category, showing the money spent in each category. Maybe later we can put two or more people in the same radar to compare them (example here). We tried to use this model, but the problem is that it was built for D3 v3 and isn't compatible with D3 v4. It would be awesome if someone could help refactor this model to version 4, because this radar looks better than all the others.

After finishing a reasonable model and documenting the basics (maybe next week), I can open the repo so anyone can get it and add new visualizations.

pmargreff commented 8 years ago

Another thing: maybe we could show the values by person in this tree map, where each color is a subquota group and, inside it, the companies are sized in proportion to their corresponding value. What do you think about that?

cuducos commented 8 years ago

@pmargreff Those pieces of dataviz are awesome! I think we should fit them in somewhere, sure thing. @filipelinhares do you think the new website has a section for that? @vilapedro any thoughts on that?

pmargreff commented 8 years ago

Hey, here is a (temporary) first version. You can check the repo, which has basic documentation for starting up the project. Feel free to send suggestions or ideas.

cuducos commented 8 years ago

Wow! That's awesome. Thanks for that, @pmargreff!

I definitely believe this could help people understand the importance of this quota. @filipelinhares can take it into account while planning our website and @vilapedro while planning our communication.

Just one minor detail (I'm not criticizing, just trying to make the data more meaningful for people): I think the view by state has a lot of bias. The number of congresspeople differs from state to state, and the allowance also differs from state to state (some pages there are returning HTTP 503; we can check that later). How difficult would it be to weight the total by state according to:

Once more, many thanks, mate!

pmargreff commented 8 years ago

@cuducos I understand. It could point people the wrong way if they don't have all the details, but I don't think it's hard to fix.

For the first suggestion, it's possible to divide the state value by the number of congresspeople from that state, which gives the mean, and to show the number of congresspeople and the total value in the tooltip.
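As a rough illustration of that normalization, here is a minimal Python sketch. It is not the code behind the charts: the `(state, congressperson_id, value)` tuple layout and the `mean_by_state` name are assumptions made up for the example.

```python
from collections import defaultdict


def mean_by_state(expenses):
    """Summarize expenses per state.

    `expenses` is an iterable of (state, congressperson_id, value) tuples.
    This layout is a hypothetical one chosen for the example.
    """
    totals = defaultdict(float)   # state -> summed value
    people = defaultdict(set)     # state -> distinct congresspeople seen

    for state, person, value in expenses:
        totals[state] += value
        people[state].add(person)

    # The mean is the state total divided by the number of distinct
    # congresspeople; total and head count are kept for the tooltip.
    return {
        state: {
            "total": totals[state],
            "congresspeople": len(people[state]),
            "mean": totals[state] / len(people[state]),
        }
        for state in totals
    }


sample = [("SP", "a", 100.0), ("SP", "b", 300.0), ("AC", "c", 50.0)]
summary = mean_by_state(sample)
print(summary["SP"])
```

Counting distinct congresspeople from the data itself, rather than using a fixed seat count per state, is a design choice of this sketch; either denominator could be argued for.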

I'm only a little confused by the second one. Do you mean the possible total (number of people by state * max possible value) or sum(all values) / state value? And is the suggestion to generate one number combining these two metrics, or to split them into two different views/charts?

About the HTTP 503: it's probably because it isn't a real server. In reality it's something like my backup computer, and it isn't properly prepared to keep a website stable. I have another problem with the size of the JSON for the third view. I'm thinking about how to compact this file or make it smaller (1.3 MB at the moment), but I don't have any good ideas yet.
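One possible direction for the file size, sketched in Python under assumptions: the record fields below are invented for the example, and whether two decimal places are enough for the charts is a guess. The idea is to drop whitespace, round floats, and let the server gzip the result.

```python
import gzip
import json


def compact_json(data, ndigits=2):
    """Serialize `data` without whitespace, rounding floats to `ndigits`."""
    def shrink(value):
        if isinstance(value, float):
            return round(value, ndigits)
        if isinstance(value, dict):
            return {key: shrink(item) for key, item in value.items()}
        if isinstance(value, list):
            return [shrink(item) for item in value]
        return value

    # separators=(",", ":") removes the spaces json.dumps adds by default.
    return json.dumps(shrink(data), separators=(",", ":"), ensure_ascii=False)


# Hypothetical records, only to compare sizes.
records = [
    {"congressperson": f"person-{i}", "subquota": "Fuels", "value": 123.456789 + i}
    for i in range(500)
]

default_size = len(json.dumps(records).encode("utf-8"))
compact = compact_json(records).encode("utf-8")
gzipped = gzip.compress(compact)

print(default_size, len(compact), len(gzipped))
```

Browsers decompress gzip transparently when the server sends the matching `Content-Encoding` header, so the D3 code would not need to change. How much this saves on the real 1.3 MB file is not something this sketch can tell.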

cuducos commented 8 years ago

About the second one, I suggest (it's merely a suggestion, I haven't put a lot of thought into it) dividing:

In a second stage, the dataviz could show who (within a given state) pushes the mean up or down…

About the 503: it was from camera.gov.br (not on your server). I was trying to link the max allowance by state for you ; )

pmargreff commented 8 years ago

Yes, I get it and it makes some sense, but I don't know whether it will have much impact if people have no idea of the value itself. Maybe a line or something marking the max limit could represent almost the same thing.

I really like the suggestion to show the outliers. I'll think of something and implement it when I have some time.

I updated it to the mean value, and it really evens things out much better, but some weird things happen: in the last months of the year (2014/2015) the value exceeds the max. I'll try to find out why and when this happens so I can explain it.

Another observation: I was checking the latest behavior, and the net_value value isn't consistent across all occurrences. You can find the new value easily using Julia.

cuducos commented 8 years ago

> Maybe a line or something marking the max limit could represent almost the same thing.

♥️

> I'll try to find out why and when this happens so I can explain it.

Good. I couldn't spend any time on that today, I'm sorry about it.

> Another observation: I was checking the latest behavior, and the net_value […]

We're debating this on #85, but we haven't reached a decision yet. Hold on a while longer ; )

ronybarbosa commented 8 years ago

Why don't you use tools like Kibana for data visualization?

pmargreff commented 8 years ago

@cuducos About net_value hitting the roof: I didn't find any reason, but I sent a request, as you suggested in #85.

@ronydj Hello, I've never used it. The only thing I know is that people use it with Elasticsearch (which doesn't mean anything to me). If you know it, don't mind teaching, and have free time, contact me at: pmargreff at gmail dot com.

UPDATE: I added two new charts to the site, monthly value and average by party (I really like this one).

cuducos commented 8 years ago

> About net_value hitting the roof: I didn't find any reason, but I sent a request, as you suggested in #85.

👍 Looking forward to seeing what they're going to say ; )

> UPDATE: I added two new charts to the site, monthly value and average by party (I really like this one).

There's a small typo (Montly instead of Monthly in one of the titles). But overall it's very good ; )

kassimorra commented 8 years ago

Hi guys, a friend of mine told me about this great project, and I got interested in the subject.

Is there anywhere I can see what you want to show?

I read this thread but didn't find the storytelling or the analysis that needs to be done.

Kassim

pmargreff commented 8 years ago

I got an answer about net_value exceeding the max value. The complete answer was:

Mr. Pablo,

The Chamber of Deputies thanks you for contacting us.

In response to your request, the Department of Budget, Finance and Accounting (Defin) of the Chamber of Deputies has already stated the following: Regarding the question of why the average expense in a given month is higher than the monthly quota made available to the congressperson, Ato da Mesa n° 43/2009, of 21/05/2009, which establishes the Quota for the Exercise of Parliamentary Activity (Cota para o Exercício da Atividade Parlamentar), states:

"Art. 13. The unused balance of the quota accumulates throughout the financial year; carrying a balance over from one financial year to the next is prohibited."

Therefore, if the congressperson has not used the whole balance accumulated during the first months of the year, they may use it until the end of the financial year.

If you have any questions, we are at your disposal.

In summary:

According to Ato da Mesa n° 43/2009 of 21/05/2009, if the total value isn't used in a month, the remainder accumulates into the following months. The balance only expires when the financial year ends.

That is what Article 13 says. I believe it could impact some other metrics too.
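To make the accumulation rule concrete, here is a small Python sketch of how the amount available in a month can exceed the monthly quota. The numbers and the `available_balance` name are invented for illustration. It assumes one fixed quota credited at the start of each month, which simplifies whatever the Chamber actually does.

```python
def available_balance(monthly_quota, spending_by_month):
    """Return the balance available at the start of each month.

    `spending_by_month` lists the amounts spent in consecutive months of a
    single financial year. Unused quota carries over to the next month, and
    nothing carries over between years, so the balance starts at zero.
    """
    balance = 0.0
    available = []
    for spent in spending_by_month:
        balance += monthly_quota      # this month's quota is credited
        available.append(balance)     # what could be spent this month
        balance -= spent              # the unused part carries forward
    return available


# Hypothetical figures: quota of 100 per month, light spending early on.
print(available_balance(100.0, [50.0, 50.0, 180.0]))
# prints [100.0, 150.0, 200.0]
```

In the third month the 180 spent is above the monthly quota of 100 but within the 200 accumulated, which matches the pattern seen in the last months of the year.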

cuducos commented 8 years ago

Great, @pmargreff — many thanks for sharing their response.

Welcome @kassimorra. Sorry about not getting back to you sooner.

> Is there anywhere I can see what you want to show?

We were discussing that these days and probably @vilapedro will be in touch — he's focused on communication and probably you both could better discuss what would be interesting in terms of dataviz.

cuducos commented 7 years ago

Closing this, as dataviz is now more relevant to the serenata-website repo.