okfn-brasil / serenata-de-amor

🕵 Artificial Intelligence for social control of public administration | **This repository does not receive frequent updates. Check out the README**
https://serenata.ai/en
MIT License

Partner list of companies receiving money from politicians #16

Closed Irio closed 7 years ago

cuducos commented 7 years ago

I think this is mostly related to the partner list. I'm pondering two issues about this dataset before bringing it to S3:

So before making it available I would like to know about best practices in versioning (arguably similar) datasets:

  1. Should we rename it to companies-no-geolocation?
  2. Should we add geolocation to it?
  3. Should we strip off everything but CNPJ and partner list (making it a complementary dataset to the former companies dataset)?

What do you think @Irio?

Irio commented 7 years ago

When we first generated it, the companies.xz file already had geolocation (using src/geocode_addresses.py). I'm good with option number 1 if we work on number 2 later. @cuducos

jtemporal commented 7 years ago

> I'm good with option number 1 if we work on number 2 later

I agree with this approach ;)

marcusrehm commented 7 years ago

> For some reason it has 7% fewer companies than the last one (no idea why)

I think I ran it using only the reimbursements dataset. Another reason could be that the previous script was filling rows with blank info, leaving only the error message "CNPJ inválido" (invalid CNPJ).
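
A rough way to check both hypotheses is to compare the CNPJ sets of the two files and count how many rows in the older one carry nothing but an error message. The sketch below is only an illustration: the column names `cnpj`, `name`, and `message`, as well as both helper functions, are assumptions and not names taken from the repository.

```python
def is_error_only(row, key="cnpj", message_field="message"):
    """Return True when a row has an error message and no data besides the key.

    The field names are hypothetical; adjust them to the real dataset columns.
    """
    has_message = bool((row.get(message_field) or "").strip())
    other_values = [
        value
        for field, value in row.items()
        if field not in (key, message_field)
    ]
    return has_message and not any((value or "").strip() for value in other_values)


def compare_company_datasets(old_rows, new_rows, key="cnpj"):
    """Summarize how two company datasets differ, based on their CNPJ sets."""
    old_ids = {row[key] for row in old_rows}
    old_error_ids = {row[key] for row in old_rows if is_error_only(row, key)}
    new_ids = {row[key] for row in new_rows}
    return {
        "old_total": len(old_ids),
        "old_error_only": len(old_error_ids),
        "new_total": len(new_ids),
        "missing_from_new": sorted(old_ids - new_ids),
    }


# Tiny made-up sample: the second row of the old file is an error-only row.
old = [
    {"cnpj": "1", "name": "A", "message": ""},
    {"cnpj": "2", "name": "", "message": "CNPJ inválido"},
    {"cnpj": "3", "name": "C", "message": ""},
]
new = [
    {"cnpj": "1", "name": "A", "message": ""},
    {"cnpj": "3", "name": "C", "message": ""},
]
report = compare_company_datasets(old, new)
print(report)
```

If most of the CNPJs in `missing_from_new` are also error-only rows in the old file, the second explanation is the likelier one. Otherwise the input dataset used for the run is worth checking first.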

cuducos commented 7 years ago

Renaming it, opening an issue to add geolocation… and closing this issue! Hell yeah ; ) Thank you so much @marcusrehm 🎉

Closed by #218