okfn-brasil / serenata-toolbox

📦 pip module containing code shared across Serenata de Amor's projects | ** This repository does not receive frequent updates **
MIT License

Cannot use ChamberDataset() to download data #206

Closed mnunes closed 5 years ago

mnunes commented 5 years ago

I was trying to download Chamber of Deputies reimbursement data using the following code:

import numpy
from serenata_toolbox.chamber_of_deputies.reimbursements import Reimbursements as ChamberDataset

years = numpy.arange(2009, 2019, 1)

for j in years:
    chamber = ChamberDataset(j, 'data_camara/')
    chamber()

However, this is the error I get when I run my code:

Traceback (most recent call last):
  File "<stdin>", line 3, in <module>
  File "/usr/local/lib/python3.7/site-packages/serenata_toolbox/chamber_of_deputies/reimbursements.py", line 28, in __call__
    self.fetch()
  File "/usr/local/lib/python3.7/site-packages/serenata_toolbox/chamber_of_deputies/reimbursements.py", line 35, in fetch
    urlretrieve(URL.format(self.year), file_path)
  File "/usr/local/Cellar/python/3.7.1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 247, in urlretrieve
    with contextlib.closing(urlopen(url, data)) as fp:
  File "/usr/local/Cellar/python/3.7.1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/local/Cellar/python/3.7.1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 531, in open
    response = meth(req, response)
  File "/usr/local/Cellar/python/3.7.1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 641, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/local/Cellar/python/3.7.1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 569, in error
    return self._call_chain(*args)
  File "/usr/local/Cellar/python/3.7.1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 503, in _call_chain
    result = func(*args)
  File "/usr/local/Cellar/python/3.7.1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found

Since I am getting a 404 error, I opened reimbursements.py to see what is happening, and it seems there is a problem on the https://www.camara.leg.br/ website. Do you have any idea what is going on and how it can be fixed?

Senate data can be downloaded without any issue.
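Until the cause of the 404 is known, one workaround is to keep the loop going when a single year fails. The following is a minimal sketch, not part of serenata-toolbox: `download_years` and its `fetch` parameter are hypothetical names, and it only assumes that a failed download raises `urllib.error.HTTPError`, as in the traceback above.

```python
from urllib.error import HTTPError


def download_years(years, fetch):
    """Call fetch(year) for every year and report which ones failed.

    `fetch` is any callable that downloads one year of data and raises
    HTTPError when the server rejects the request. HTTP errors are
    recorded instead of aborting the whole loop.
    """
    succeeded, failed = [], []
    for year in years:
        try:
            fetch(year)
        except HTTPError as error:
            # Keep the year and status code (e.g. 404) for later inspection.
            failed.append((year, error.code))
        else:
            succeeded.append(year)
    return succeeded, failed
```

With the toolbox installed, `fetch` could be something like `lambda year: ChamberDataset(year, 'data_camara/')()`, using the same import as in the snippet above.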

cuducos commented 5 years ago

it seems there is a problem on the https://www.camara.leg.br/ website.

Yep, that's the case. Have they changed the URL or was the service temporarily down?

mnunes commented 5 years ago

Have they changed the URL or was the service temporarily down?

It seems some data is not available anymore. I did some research and if you try the following URLs, you get a 404 error:

https://www.camara.leg.br/cotas/Ano-2009.csv.zip
https://www.camara.leg.br/cotas/Ano-2010.csv.zip
https://www.camara.leg.br/cotas/Ano-2011.csv.zip
https://www.camara.leg.br/cotas/Ano-2012.csv.zip

However, you can download data for the last 6 years:

https://www.camara.leg.br/cotas/Ano-2013.csv.zip
https://www.camara.leg.br/cotas/Ano-2014.csv.zip
https://www.camara.leg.br/cotas/Ano-2015.csv.zip
https://www.camara.leg.br/cotas/Ano-2016.csv.zip
https://www.camara.leg.br/cotas/Ano-2017.csv.zip
https://www.camara.leg.br/cotas/Ano-2018.csv.zip

I think someone deleted the reimbursement data from 2009 to 2012. It does not look like a URL change, as you can infer from the links above.
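The manual check above can be scripted. This is a rough sketch under a few assumptions: the URL pattern is copied from the links above, the helper names `status_for` and `missing_years` are made up for illustration, and it assumes the server answers HEAD requests (if it does not, a regular GET would be needed instead).

```python
from urllib.error import HTTPError
from urllib.request import Request, urlopen

# URL pattern taken from the links listed above.
URL = "https://www.camara.leg.br/cotas/Ano-{}.csv.zip"


def status_for(year, opener=urlopen):
    """Return the HTTP status code for one year's archive.

    Uses a HEAD request so the zip file itself is not downloaded.
    `opener` defaults to urlopen and can be swapped out for testing.
    """
    request = Request(URL.format(year), method="HEAD")
    try:
        with opener(request, timeout=30) as response:
            return response.status
    except HTTPError as error:
        # urlopen raises on 4xx/5xx; the status code is on the exception.
        return error.code


def missing_years(years, opener=urlopen):
    """Return the years whose archive did not answer with 200 OK."""
    return [year for year in years if status_for(year, opener) != 200]
```

Calling `missing_years(range(2009, 2019))` would then list the years that are currently unavailable.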

cuducos commented 5 years ago

That's interesting: I'm tweeting and tagging them, but feel free to send an official request and share what you find here ; )

jedibruno commented 5 years ago

Yesterday (27/12/2018) I submitted a formal request for information about this issue to the Chamber of Deputies. Legally, they have roughly until the end of January 2019 to give an official answer. As soon as I receive something, I'll post an update here.

Just for the record, the formal request text (in Portuguese) was this one below. If anything is missing, just let me know and I can make a new request.

--- FOIA REQUEST (translated from Portuguese) ---

On 27/12/2018 we attempted to access the public datasets of the Cota de Exercício da Atividade Parlamentar (CEAP, the Quota for the Exercise of Parliamentary Activity) for the years 2009, 2010, 2011 and 2012, made available by the Chamber of Deputies at the following URLs:

https://www.camara.leg.br/cotas/Ano-2009.csv.zip
https://www.camara.leg.br/cotas/Ano-2010.csv.zip
https://www.camara.leg.br/cotas/Ano-2011.csv.zip
https://www.camara.leg.br/cotas/Ano-2012.csv.zip

However, none of the public datasets in question could be accessed. We therefore request access to the information listed below. To make the information provided easier to understand, we ask that each item be answered separately, indicating the item it refers to:

1 – For what reasons, in fact and in law, are the datasets for the years mentioned unavailable?
1.1 – What is the approximate or estimated time frame for restoring the data in question?
1.2 – If the datasets mentioned will no longer be made available for a permanent reason:
1.2.1 – For what reasons, in fact and in law, is this the case?
1.2.2 – Which public authority authorized the removal of the public data in question? What are their name and position?
1.2.3 – We request access to the full digitized text of the administrative act, and the corresponding opinion, that authorized the removal of the public data in question.

Note: the technical description of the missing data can be found at this URL (in English): https://github.com/okfn-brasil/serenata-toolbox/issues/206


mnunes commented 5 years ago

I just checked and the data are back. This issue can be closed now.

jedibruno commented 5 years ago

Yes, the Chamber of Deputies answered my request on the same day. They did not say why the data was missing, but they replied that the error had been corrected.

cuducos commented 5 years ago

Closed as requested.