cabral closed 7 years ago
I successfully filtered records from congresspeople using a spreadsheet available here (source: http://www2.camara.leg.br/deputados/pesquisa). Unfortunately, this list includes only those in the current term, whether elected or merely active as substitutes.
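import pandas as pd
# Read identifier columns as strings so they are not parsed as numbers.
# The built-in str replaces np.str, which newer NumPy releases no longer provide.
data = pd.read_csv('../data/2016-08-08-current-year.xz',
                   parse_dates=[16],
                   dtype={'document_id': str,
                          'congressperson_id': str,
                          'congressperson_document': str,
                          'term_id': str,
                          'cnpj_cpf': str,
                          'reimbursement_number': str})
# Spreadsheet of congresspeople in the current term.
congresspeople = pd.read_excel('../data/deputado.xls')
# Keep only records whose name matches a listed congressperson.
is_individual_document = \
    data['congressperson_name'].isin(congresspeople['Nome Parlamentar'])
data = data[is_individual_document]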
I guess this issue is solved: we can filter (as exemplified above) and we found out that CEAP (check CEAP.md) has extra allowances for party leadership, government leadership, etc. If I'm wrong, feel free to re-open this topic.
reimbursements[reimbursements['congressperson_id'].notnull()]
will return only the records in the name of a congressperson, leaving out the leaderships.
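To make the effect of that filter concrete, here is a minimal sketch on a toy table. The names, ids, and the `net_value` column are made up for illustration and are not taken from the real dataset; the only assumption carried over from the thread is that leadership records have no `congressperson_id`.

```python
import pandas as pd

# Hypothetical toy data: every value below is invented for illustration.
reimbursements = pd.DataFrame({
    'congressperson_name': ['FULANO DE TAL', 'LIDERANCA DO PSDB', 'BELTRANO'],
    'congressperson_id': ['1001', None, '1002'],
    'net_value': [150.0, 900.0, 75.5],
})

# Rows with an id belong to an individual congressperson.
individual = reimbursements[reimbursements['congressperson_id'].notnull()]

# Rows without an id are assumed to be leadership allowances.
leadership = reimbursements[reimbursements['congressperson_id'].isnull()]

print(individual['congressperson_name'].tolist())
print(leadership['congressperson_name'].tolist())
```

Splitting on `notnull()` / `isnull()` keeps both halves available, so the leadership records can still be inspected separately instead of being silently dropped.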
Problem: If you divide 213 million (the amount of the quota used per year) by 266 thousand (the average amount used per congressperson), the result is about 800 people. That is 287 more than the total of 513 congresspeople in activity (list of congresspeople). The 800 include congresspeople's alternates and the leaderships of the PSDB and PT parties, and they also raise some questions, like: who is SDD?
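The arithmetic behind this can be checked with a short sketch. The figures are the rounded ones quoted in this issue, and the variable names are illustrative only; floor division is used because the exact quotient is slightly above 800.

```python
# Rounded figures quoted in the issue (in Brazilian reais).
total_quota_used = 213_000_000        # quota used per year
average_per_congressperson = 266_000  # average used per congressperson
seats = 513                           # congresspeople in activity

# Implied number of people drawing on the quota, truncated to a whole number.
implied_people = total_quota_used // average_per_congressperson

# How many more people that is than there are seats.
excess = implied_people - seats

print(implied_people, excess)
```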