okfn-brasil / serenata-toolbox

📦 pip module containing code shared across Serenata de Amor's projects | ** This repository does not receive frequent updates **
MIT License
154 stars 69 forks

Big refactor of the public API to generate the datasets #87

Open cuducos opened 7 years ago

cuducos commented 7 years ago

This issue proposes a roadmap for a big refactor of the public API. It can also work as a wishlist for those of you who use this toolbox and believe its API for generating the datasets could be improved. I'll suggest a to-do list in this opening post and try to keep it updated as the discussion goes on.

The main problems with the current API have been discussed by @lipemorais and myself in several other issues and PRs. For example:

Therefore what I propose here is to:

I think that this refactor will enhance our code quality and architecture and can pave the way to more overarching changes such as:

JoaoCarabetta commented 7 years ago

Hey,

is someone working on it? If so, let me know!

I think this toolbox should aim to cover all API endpoints from the Senate and the Chamber of Deputies, plus other congress-related data. I know it seems like quite a big dream. But with that in mind, we should think about how to structure the project so it can easily accept new data sources in an organized way.

With a more organized way to add new datasets, other projects could build upon this parsing structure. With more data available, political scientists and journalists could perform better and quicker analyses. More ideas for correlations could also flourish and turn into apps built on this easily available structured data.

Following this big dream, I propose adding one more step to this enhancement proposal:

:)

lipemorais commented 7 years ago

Hey, @JoaoCarabetta! I'm working on it. I believe that to make this possible we need broad unit test coverage, so we can understand the impact of each change faster. To address that, I'm working on increasing unit test coverage, because the journey tests take too long to give us fast feedback.
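To illustrate the kind of fast unit test meant here, a minimal sketch follows. The names `fetch_rows` and `count_rows` are hypothetical and are not taken from the toolbox; the point is only that the slow download step gets replaced by a fake, so the test never touches the network.

```python
from unittest.mock import Mock


def fetch_rows(url):
    """Hypothetical slow step; a real one would download `url`."""
    raise RuntimeError("unit tests should not hit the network")


def count_rows(url, fetch=fetch_rows):
    """Hypothetical function under test; `fetch` can be swapped in tests."""
    return len(fetch(url))


def test_count_rows_uses_injected_fetch():
    # The fake returns canned rows instantly instead of downloading anything.
    fake_fetch = Mock(return_value=[{"id": 1}, {"id": 2}])
    assert count_rows("https://example.org/data.csv", fetch=fake_fetch) == 2
    fake_fetch.assert_called_once_with("https://example.org/data.csv")


test_count_rows_uses_injected_fetch()
```

Passing the fetch function as an argument is just one way to do it; patching the function with `unittest.mock.patch` would work as well.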

trmendes commented 7 years ago

@lipemorais @JoaoCarabetta are you guys still working on it?!

lipemorais commented 7 years ago

@trmendes Hell yeah! This week I opened a PR to cover the Chamber of Deputies module in #124, plus some other test improvements such as #134.

Would you like to help us with this?

trmendes commented 7 years ago

@lipemorais I would like to help! I'm learning Python, and it's nice to have a project like this one to help with.

willianpaixao commented 6 years ago

While working on #199 I stumbled upon this ticket, and I want to give an update and share some considerations.

Regarding "rewrite fetch, translate, clean into more atomic methods, with really simple logic and adding more methods if needed", I am facing the same problem. Those methods are a bit confusing, and each dataset (chamber and senate) has a different implementation. My intention is to make things a little more centralized (maybe with some common classes), so the datasets are downloaded and processed in a more unified way. In my implementation, I'm using asyncio and its sister libraries to run the processing concurrently. See my branch for more information.
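To make the idea concrete, here is a rough sketch of what a common class with small fetch, translate and clean steps could look like. It is not the code from my branch: the class names, the `translation` attribute and the fake in-memory data are all assumptions made up for this example, and `asyncio.sleep(0)` only stands in for a real download.

```python
import asyncio
from abc import ABC, abstractmethod


class Dataset(ABC):
    """Hypothetical common base class: one small method per step."""

    # Maps source column names to English ones; subclasses override it.
    translation = {}

    @abstractmethod
    async def fetch(self):
        """Return the raw rows as a list of dicts."""

    def translate(self, rows):
        # Rename known columns and keep unknown ones untouched.
        return [
            {self.translation.get(key, key): value for key, value in row.items()}
            for row in rows
        ]

    def clean(self, rows):
        # Strip surrounding whitespace from string values.
        return [
            {k: v.strip() if isinstance(v, str) else v for k, v in row.items()}
            for row in rows
        ]

    async def build(self):
        return self.clean(self.translate(await self.fetch()))


class FakeChamberDataset(Dataset):
    translation = {"nome": "name"}

    async def fetch(self):
        await asyncio.sleep(0)  # stands in for a real download
        return [{"nome": " Fulano "}]


class FakeSenateDataset(Dataset):
    translation = {"partido": "party"}

    async def fetch(self):
        await asyncio.sleep(0)  # stands in for a real download
        return [{"partido": "ABC "}]


async def build_all(datasets):
    # Run every pipeline concurrently; results keep the input order.
    return await asyncio.gather(*(dataset.build() for dataset in datasets))


results = asyncio.run(build_all([FakeChamberDataset(), FakeSenateDataset()]))
```

With this shape, each dataset only has to define its own `fetch` and its column mapping, while `translate` and `clean` stay shared and easy to unit test on their own.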

Any feedback is very welcome.

cuducos commented 6 years ago

Great start, many thanks, @willianpaixao : ) I added minor comments to your WIP commit; I hope they are helpful!