okfn-brasil / rosie

🤖 Python application responsible for Serenata de Amor's intelligence

Simplify classifiers code #50

Open lipemorais opened 7 years ago

lipemorais commented 7 years ago

I think the classifier code is very complex, and not because of the data science knowledge needed to understand it. I'm afraid that few people will be able to maintain it in the near future.

There are some magic numbers, a considerable number of method chains, and big methods in some classifiers.

https://github.com/datasciencebr/rosie/tree/master/rosie/chamber_of_deputies/classifiers
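
To make the concern concrete, here is a minimal sketch of the kind of cleanup this could mean. It is not taken from Rosie: the function names, the sample data, and the threshold value are all hypothetical, chosen only to show a magic number becoming a named constant and a long expression becoming small, named steps.

```python
from statistics import mean, pstdev

# Hypothetical value for illustration only; not a threshold used by Rosie.
OUTLIER_THRESHOLD_IN_STD_DEVIATIONS = 3


def is_outlier(value, values, threshold=OUTLIER_THRESHOLD_IN_STD_DEVIATIONS):
    """Return True when `value` is more than `threshold` standard deviations from the mean."""
    average = mean(values)
    deviation = pstdev(values)
    if deviation == 0:
        # All values are identical, so nothing can stand out.
        return False
    distance_from_average = abs(value - average)
    return distance_from_average > threshold * deviation


def find_outliers(values):
    """Return the values flagged as outliers, keeping their original order."""
    return [value for value in values if is_outlier(value, values)]


expenses = [10] * 20 + [1000]  # made-up sample data
outliers = find_outliers(expenses)
```

The named constant documents what the number means, and each intermediate name says what the step computes, which is the readability gain this issue is asking for.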

lipemorais commented 7 years ago

I would like to address this issue, but I need help understanding what that code is trying to say.

I believe that solving this issue will help with transparency, making it easier for people to understand each classifier, and will help people contribute classifiers as well.

My plan is to address one classifier at a time:

People I believe can help with this: @cuducos @anaschwendler @jtemporal @Irio

I will keep this TODO list updated as I open the PRs. :)

anaschwendler commented 7 years ago

Hi @lipemorais, thanks for that. I really believe this might help a lot!

I'll help you refactor the classifiers. My suggestion is one PR for each classifier, and that we keep testing in small bites, always checking that the number of suspicions for each classifier remains the same throughout the refactor.

If you have any questions, ask here so we keep a conversation that everyone can participate in 🎉
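
The check described above could be sketched roughly as follows. This is only an illustration under stated assumptions: each classifier is modeled as a plain function that returns `True` for a suspicious record, which is not Rosie's actual classifier interface, and the record field `net_value` and the limit of 1000 are hypothetical.

```python
def count_suspicions(classifier, records):
    """Count how many records the classifier flags as suspicious."""
    return sum(1 for record in records if classifier(record))


def assert_same_suspicion_count(original, refactored, records):
    """Raise AssertionError if the refactor changed the number of suspicions."""
    before = count_suspicions(original, records)
    after = count_suspicions(refactored, records)
    if before != after:
        raise AssertionError(
            f"suspicions changed from {before} to {after} after the refactor"
        )
    return before


# Hypothetical classifier before the refactor, with a magic number inline.
def original_classifier(record):
    return record["net_value"] > 1000


# Hypothetical refactored version: same behavior, named constant.
MAX_EXPECTED_NET_VALUE = 1000  # assumed limit, for illustration only


def refactored_classifier(record):
    return record["net_value"] > MAX_EXPECTED_NET_VALUE


sample_records = [{"net_value": 500}, {"net_value": 1500}, {"net_value": 2500}]
suspicions = assert_same_suspicion_count(
    original_classifier, refactored_classifier, sample_records
)
```

Comparing only the count follows the suggestion in this comment. A stricter variant would also compare which records are flagged, since two different sets of records can have the same size.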

lipemorais commented 7 years ago

I will start with election_expenses_classifier.py.

As soon as I have any questions I will post them here. :)

cuducos commented 6 years ago

This was closed automatically; I reopened it because we still have refactors to do ; )