Use classified datasets to generate graphs (Grafana?)

okfn-brasil / serenata-de-amor

🕵 Artificial Intelligence for social control of public administration | **This repository does not receive frequent updates. Check out the README**

https://serenata.ai/en

MIT License

Use classified datasets to generate graphs (Grafana?) #302

Open waldofe opened 6 years ago

waldofe commented 6 years ago

Hi!

I see we can already fetch and classify reimbursements using https://github.com/datasciencebr/rosie. Would it be interesting (in terms of project goals) to have such data shown as graphs in an open, on-premise graph solution such as Grafana?

Personally, I see great advantages in showing the world the following:

Here is the corruption level of congress people 2 years ago, and here's what it looks like now.

I'm very happy to see the project active and would love to contribute.

cc @Irio

cuducos commented 6 years ago

Hi, @oswaldoferreira! As far as I understand your suggestion, these goals are currently being discussed as the future of Jarbas. What do you think?

waldofe commented 6 years ago

Hey @cuducos, thanks for the response!

I see you're already well ahead in that discussion, that's great!

My proposal was to generate those graphs with an existing open-source platform, without having to code the frontend and design the whole UX ourselves. Basically, we'd use a Python library (in Jarbas or Rosie) to talk to our Grafana server and get a monitor-like view like this one:

[Image: Grafana graphs]

This might be a better way to build up all kinds of metrics for people who already know what Serenata is about, rather than trying to please the uninformed user.
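A minimal sketch of the "Python library talks to Grafana" idea, using only the standard library and Grafana's HTTP dashboard API (`POST /api/dashboards/db`). It assumes Grafana already has a PostgreSQL data source (set as the default) pointed at Jarbas' database. The table and column names (`chamber_of_deputies_reimbursement`, `issue_date`, `total_net_value`), the server URL and the API token are assumptions for illustration, not confirmed details of the project.

```python
import json
import urllib.request

# Hypothetical query: table and column names are assumptions about Jarbas' schema.
# $__timeFilter is Grafana's SQL macro for the dashboard's selected time range.
MONTHLY_TOTAL_SQL = """
SELECT
  date_trunc('month', issue_date) AS time,
  sum(total_net_value) AS total_reimbursed
FROM chamber_of_deputies_reimbursement
WHERE $__timeFilter(issue_date)
GROUP BY 1
ORDER BY 1
"""


def build_dashboard_payload(title, sql):
    """Build the JSON body for Grafana's POST /api/dashboards/db endpoint."""
    return {
        "dashboard": {
            "id": None,  # null id asks Grafana to create a new dashboard
            "title": title,
            "panels": [
                {
                    # No "datasource" key: Grafana falls back to its default
                    # data source (assumed here to be the PostgreSQL one).
                    "type": "timeseries",
                    "title": "Total reimbursed per month",
                    "gridPos": {"x": 0, "y": 0, "w": 24, "h": 8},
                    "targets": [
                        {
                            "refId": "A",
                            "format": "time_series",
                            "rawQuery": True,
                            "rawSql": sql,
                        }
                    ],
                }
            ],
        },
        "overwrite": True,  # replace a dashboard with the same title if present
    }


def push_dashboard(grafana_url, api_token, payload):
    """POST the dashboard to a Grafana server and return its JSON response."""
    request = urllib.request.Request(
        f"{grafana_url.rstrip('/')}/api/dashboards/db",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_token}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Usage would be something like `push_dashboard("http://localhost:3000", token, build_dashboard_payload("Serenata", MONTHLY_TOTAL_SQL))`. Keeping the query in SQL means Grafana reads straight from the existing database, so nothing new has to be stored.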

willianpaixao commented 6 years ago

@cuducos, have you thought about that? It would be nice to have more stats and more real-time-ish numbers about the expenses. I would love to help with that.

cuducos commented 6 years ago

I do believe it's a good idea.

willianpaixao commented 6 years ago

Does the project have a central place to generate the stats? Can you please point out where and how the numbers are stored? For example, where are the reimbursements stored?

cuducos commented 6 years ago

Datasets with official data are generated with the serenata-toolbox (linked in the README.md) and stored locally in CSV format.
Suspicion data is generated by Rosie (in this repo) and stored locally in CSV format.
All this data is exported to Jarbas (in this repo) for search and visualization over the web, and stored in PostgreSQL.
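For a first pass at stats straight from those CSV files, a minimal standard-library sketch might look like this. The column names (`year`, `total_net_value`, `document_id`), the idea of matching reimbursements to Rosie's output by `document_id`, and Rosie's flags being written as `True`/`False` strings are all assumptions; check them against the actual headers of the toolbox and Rosie files before relying on it.

```python
import csv
from collections import defaultdict
from decimal import Decimal


def suspicious_document_ids(suspicions_lines, flag_columns):
    """Collect IDs of rows where any of the given flag columns is "True".

    suspicions_lines: iterable of CSV lines with a header row (e.g. an open file).
    flag_columns: names of the classifier columns to check (assumed, caller-supplied).
    """
    ids = set()
    for row in csv.DictReader(suspicions_lines):
        if any(row.get(column) == "True" for column in flag_columns):
            ids.add(row["document_id"])
    return ids


def yearly_stats(reimbursements_lines, suspicious_ids):
    """Sum reimbursed values and count suspicious documents per year.

    reimbursements_lines: iterable of CSV lines with a header row.
    Assumes values use a dot as the decimal separator.
    """
    totals = defaultdict(Decimal)
    flagged = defaultdict(int)
    for row in csv.DictReader(reimbursements_lines):
        year = int(row["year"])
        # Decimal avoids floating-point drift when summing currency values
        totals[year] += Decimal(row["total_net_value"] or "0")
        if row["document_id"] in suspicious_ids:
            flagged[year] += 1
    return {
        year: {"total_net_value": totals[year], "suspicious": flagged[year]}
        for year in sorted(totals)
    }
```

With real files, both functions can take an open file object directly, e.g. `yearly_stats(open(path, newline=""), ids)`.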

willianpaixao commented 6 years ago

@cuducos I successfully downloaded the toolbox and the datasets, and took a look at some of the CSV files. I could start generating some graphs myself, but it would be nice to clear up a few questions first.

Using a tool like Grafana (or similar) only makes sense if Rosie is online and running calculations throughout the year. But as I understand it, Rosie runs in batches, so the generated data also comes in batches. If I set up a metrics tool, it would be "frozen" for a few months at a time. Am I correct?
Still, even though a real-time tool wouldn't be needed, some semi-automated metrics could be useful, right? For example, automated reports with graphs of the collected numbers (expenses, reimbursements) and of the numbers Rosie generates about suspicious data. Those numbers and graphs could be exported to a static website through a semi-automated deployment pipeline. What are your thoughts on that?

That's fairly easy to do, as long as I know exactly what to do and where to look, and that's often the tricky part.
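Because Rosie runs in batches, the "static website" option could be as small as a script that renders one self-contained HTML page after each run. A minimal standard-library sketch (an inline SVG bar chart plus a data table); the function names, layout and sizes are hypothetical choices, and the input is whatever `(label, number)` pairs the batch produces:

```python
import html


def bar_chart_svg(values, width=400, bar_height=20, gap=4):
    """Render a horizontal bar chart as an inline SVG string.

    values: list of (label, number) pairs; bars are scaled to the largest number.
    """
    largest = max((number for _, number in values), default=0) or 1
    label_width = 80
    height = len(values) * (bar_height + gap)
    parts = [f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">']
    for index, (label, number) in enumerate(values):
        y = index * (bar_height + gap)
        bar_width = (width - label_width) * number / largest
        parts.append(
            f'<text x="0" y="{y + bar_height - 5}">{html.escape(str(label))}</text>'
            f'<rect x="{label_width}" y="{y}" width="{bar_width:.1f}" height="{bar_height}"/>'
        )
    parts.append("</svg>")
    return "".join(parts)


def render_report(title, values):
    """Build a static HTML page with one chart and the same numbers as a table."""
    rows = "".join(
        f"<tr><td>{html.escape(str(label))}</td><td>{number}</td></tr>"
        for label, number in values
    )
    return (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8">'
        f"<title>{html.escape(title)}</title></head>"
        f"<body><h1>{html.escape(title)}</h1>"
        f"{bar_chart_svg(values)}"
        f"<table><tr><th>Label</th><th>Value</th></tr>{rows}</table>"
        "</body></html>\n"
    )
```

The page has no JavaScript or external assets, so writing it to disk (e.g. with `pathlib.Path(...).write_text(render_report(...), encoding="utf-8")`) and copying it to any static host would be the whole deployment step.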

cuducos commented 6 years ago

> Using a tool like Grafana (or similar) only makes sense if Rosie is online and running calculations throughout the year. But as I understand it, Rosie runs in batches, so the generated data also comes in batches. If I set up a metrics tool, it would be "frozen" for a few months at a time. Am I correct?

Yes. We don't have the money to keep a server up to run Rosie periodically.

> Still, even though a real-time tool wouldn't be needed, some semi-automated metrics could be useful, right? For example, automated reports with graphs of the collected numbers (expenses, reimbursements) and of the numbers Rosie generates about suspicious data. Those numbers and graphs could be exported to a static website through a semi-automated deployment pipeline. What are your thoughts on that?

I like that :) In addition, there are some points on dataviz in #282 that might be interesting ;)

willianpaixao commented 6 years ago

Hmm, I read the related issue, but apparently that work went in the wrong direction and development stalled, so the ticket was closed due to inactivity. That kind of brings me back to square one.

But I have an idea: what about the numbers displayed on the project's homepage? Is there any document or procedure to regenerate them?

cuducos commented 6 years ago

> I read the related issue, but apparently that work went in the wrong direction and development stalled, so the ticket was closed due to inactivity.

I don't think that is the really important part there. What I meant were the comments, based on our experience, on how to weigh the data ;)

> What about the numbers displayed on the project's homepage? Is there any document or procedure to regenerate them?

They are the result of several explorations recorded as Jupyter notebooks.
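Since the homepage numbers come from notebooks, one way to regenerate them would be to re-execute those notebooks headlessly with Jupyter's `nbconvert` (`--to notebook --execute --inplace`). A small sketch, assuming Jupyter is installed; the notebooks directory is passed in by the caller rather than guessed, and the notebooks presumably need the toolbox datasets downloaded first:

```python
import subprocess
from pathlib import Path


def nbconvert_command(notebook):
    """Command that re-runs a notebook top to bottom and saves its outputs in place."""
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",
        "--execute",
        "--inplace",
        str(notebook),
    ]


def rerun_notebooks(notebooks_dir, pattern="*.ipynb"):
    """Re-execute every matching notebook in name order; stop on the first failure."""
    notebooks = sorted(Path(notebooks_dir).glob(pattern))
    for notebook in notebooks:
        # check=True raises CalledProcessError if any cell in the notebook fails
        subprocess.run(nbconvert_command(notebook), check=True)
    return notebooks
```

Running the notebooks in place keeps the refreshed numbers next to the code that produced them, so the diff after a re-run shows exactly which figures changed.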

willianpaixao commented 6 years ago

> They are the result of several explorations recorded as Jupyter notebooks.

WOW! I didn't know about this folder. It has amazing stats. I'll start looking at the most recent ones.

willianpaixao commented 6 years ago

Adding @thiagoalmeidasa to the thread.

We were discussing how to add stats and graphs with the least impact on the current stack, without adding too many new components. Our suggestion is to use Bokeh, for example in a stats/ section of Jarbas.

cuducos commented 6 years ago

> Our suggestion is to use Bokeh, for example in a stats/ section of Jarbas.

Will you use Bokeh to generate Django views or to generate static files? Maybe it's a good idea to create some API endpoints for the stats, and have the dataviz as a static site served directly through nginx (or even hosted somewhere else, such as GitHub Pages).

willianpaixao commented 6 years ago

Yes, API endpoints are included! It's one way to pass the data to Bokeh.
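A rough sketch of a data contract such endpoints could use: column-oriented JSON (one list per column), which is the layout Bokeh's `ColumnDataSource` takes and which `AjaxDataSource` can fetch from a URL. The functions below are plain Python so they can be tested on their own; returning the result from a Django view (e.g. via `JsonResponse`) and any endpoint URL are assumptions left to the actual implementation.

```python
import json
from decimal import Decimal


def to_columns(records, fields):
    """Turn a list of dicts into {field: [values...]}, the layout ColumnDataSource uses.

    Decimal values are converted to float so the result is JSON-serializable.
    Missing fields become None (null in JSON).
    """
    columns = {field: [] for field in fields}
    for record in records:
        for field in fields:
            value = record.get(field)
            columns[field].append(float(value) if isinstance(value, Decimal) else value)
    return columns


def stats_json(records, fields):
    """Serialize the columns as the JSON body an API endpoint could return."""
    return json.dumps(to_columns(records, fields))
```

On the Bokeh side, something like `AjaxDataSource(data_url=...)` pointed at such an endpoint could feed a plot, and a static site could fetch the same JSON, so one endpoint serves both options discussed above.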