okfn-brasil / serenata-toolbox

📦 pip module containing code shared across Serenata de Amor's projects | ** This repository does not receive frequent updates **
MIT License
154 stars 69 forks

[WIP] Acquire presence in sessions #17

fgrehm closed 7 years ago

fgrehm commented 7 years ago

This is currently a big WIP for scraping presence-in-session data for the current mandate. I'm not sure how best to structure it so that it fits the purpose of this package, so I'd love some feedback before I go off on a tangent in a refactoring session 😄

Related to https://github.com/datasciencebr/serenata-de-amor/issues/137 as well

fgrehm commented 7 years ago

@Irio @cuducos what do you guys think about having those datasets uploaded to S3 while we discuss how to refactor this hacky script into something that can be reused? This has a lot more data about presence of deputies in Brasilia than the speeches dataset 💭

cuducos commented 7 years ago

If the script works, I vote we merge it and upload the datasets to S3. Can you check why GitHub says there's a conflict? It's just one file, I don't get it…

And, finally: wanna pair to refactor it?

fgrehm commented 7 years ago

> If the script works, I vote we merge it and upload the datasets to S3

Up to you guys 😄

> Can you check why GitHub says there's a conflict? It's just one file, I don't get it…

I don't see any conflict on my end 😕

> And, finally: wanna pair to refactor it?

Sure, how about a Screenhero session next week? Let's chat over Telegram to schedule a time that works for both of us.

fgrehm commented 7 years ago

> if the script works

I haven't run it in isolation, but it is pretty much a copy and paste from the notebook. I can give it a try next week to double-check before we pair on it.

cuducos commented 7 years ago

@Irio: can you upload the datasets to S3 and drop a line so I know I can merge?

@fgrehm: don't worry about the conflicts then, I'll figure it out. Drop a line on Telegram and we'll pair next week! Many thanks ; )

Irio commented 7 years ago

Just uploaded them to Amazon.

data/2016-12-21-sessions.xz
data/2016-12-21-deputies.xz
data/2016-12-21-presences.xz
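
For anyone who wants to peek inside one of these files once downloaded, here is a minimal sketch. It assumes each `.xz` file is an xz-compressed, UTF-8 encoded CSV with a header row, which the thread does not confirm; `read_xz_csv` is a hypothetical helper, not part of the toolbox.

```python
import csv
import lzma


def read_xz_csv(path):
    """Read an xz-compressed CSV file into a list of dicts (one per row).

    Assumption: the file is UTF-8 CSV with a header row.
    """
    # lzma.open in text mode decompresses on the fly; newline="" lets the
    # csv module handle line endings itself.
    with lzma.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        return list(csv.DictReader(handle))


# Example usage (the path is only valid after downloading the dataset):
# rows = read_xz_csv("data/2016-12-21-presences.xz")
# print(len(rows), "rows; columns:", list(rows[0].keys()))
```

Reading everything into a list keeps the sketch short; for large files, iterating over the `csv.DictReader` directly would avoid holding all rows in memory.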

thiagofelix commented 7 years ago

Hi folks,

I recently found out about the Serenata project. It is very well done, congratulations to all members ;)

I've been working with the same dataset on my own over the last couple of days. My goal was to put everything behind a REST API (biography, speeches, voting, presence in the house, presence in commissions).

It is still a work in progress but usable to some extent. A few examples follow:

Get biography by id: https://api-camara-deputados.herokuapp.com/deputados?id=eq.161907

Get a projection of fields from presence in the house: https://api-camara-deputados.herokuapp.com/presenca?limit=10&select=nomeParlamentar,frequencia,data,sessao:descricao,legislatura&limit=10

The dataset is exposed through PostgREST.
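
The two example URLs above follow PostgREST's query conventions: a `column=operator.value` filter, a `select` list of fields, and a `limit`. As a rough sketch of how such URLs could be assembled from Python, the hypothetical `build_url` helper below only builds the string and makes no network call (the demo host may no longer be available).

```python
from urllib.parse import urlencode

# Base URL taken from the examples in this comment.
BASE_URL = "https://api-camara-deputados.herokuapp.com"


def build_url(table, select=None, limit=None, **filters):
    """Build a PostgREST-style query URL (hypothetical helper).

    Each filter is passed as column=(operator, value), for example
    id=("eq", 161907), and is rendered as "id=eq.161907".
    """
    params = []
    for column, (operator, value) in filters.items():
        params.append((column, f"{operator}.{value}"))
    if select:
        # Fields are comma-separated; "alias:column" entries pass through as-is.
        params.append(("select", ",".join(select)))
    if limit is not None:
        params.append(("limit", str(limit)))
    # Keep ":" and "," unescaped so the URL matches the examples above.
    query = urlencode(params, safe=":,")
    return f"{BASE_URL}/{table}?{query}" if query else f"{BASE_URL}/{table}"


biography_url = build_url("deputados", id=("eq", 161907))
presence_url = build_url(
    "presenca",
    select=["nomeParlamentar", "frequencia", "data", "sessao:descricao", "legislatura"],
    limit=10,
)
```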

fgrehm commented 7 years ago

@thiagofelix that's nice! Would you mind sharing your thoughts on how we can use it in a separate issue for discussion? This is a closed PR and not many people will find it 😄 Your API should make scraping a lot easier and more reliable (I've seen some of those endpoints return a 50x quite frequently on business days).

Also, is the code for scraping that data up somewhere?
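
On the intermittent 50x responses mentioned above, one common way to cope is to retry with exponential backoff. The sketch below is not what the script in this PR does; `fetch_with_retry` and its parameters are hypothetical, and it uses only the standard library.

```python
import time
import urllib.error
import urllib.request


def fetch_with_retry(url, attempts=4, base_delay=1.0,
                     opener=urllib.request.urlopen, sleep=time.sleep):
    """Fetch a URL, retrying on HTTP 5xx errors with exponential backoff.

    Hypothetical helper: `opener` and `sleep` are injectable so the retry
    logic can be exercised without network access or real waiting.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            with opener(url) as response:
                return response.read()
        except urllib.error.HTTPError as error:
            is_last_attempt = attempt == attempts - 1
            # Client errors (4xx) will not fix themselves, so only retry 5xx.
            if error.code < 500 or is_last_attempt:
                raise
            # Wait base_delay, then 2x, then 4x, ... before the next try.
            sleep(base_delay * 2 ** attempt)
```

Connection-level failures (`urllib.error.URLError`) are deliberately left unhandled here; whether to retry those as well depends on how the endpoints actually fail.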

thiagofelix commented 7 years ago

@fgrehm Thanks for the reply. I am finishing a few bits before publishing the source code, but it should be on my GitHub by the end of the week.

I will open another issue once I get the API and source code ready to use so we can talk more =)