okfn-brasil / serenata-toolbox

📦 pip module containing code shared across Serenata de Amor's projects | ** This repository does not receive frequent updates **
MIT License
154 stars 69 forks

Add electoral campaign donations datasets #169

Closed cuducos closed 6 years ago

cuducos commented 6 years ago

What is the purpose of this Pull Request? Add the electoral campaign donation datasets to the toolbox downloader.

What was done to achieve this purpose? Outside the repo I uploaded the .xz files to S3 and here I added the files to the LATEST constant.

How to test if it really works?

from serenata_toolbox.datasets import fetch, fetch_latest_backup

# Download each of the new donation datasets into data/
files = (
    '2017-11-30-donations-candidates.xz',
    '2017-11-30-donations-committees.xz',
    '2017-11-30-donations-parties.xz',
)
for filename in files:
    fetch(filename, 'data/')

And check if the files were downloaded successfully ; )
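One way to check the downloads, sketched here under assumptions: the helper `check_downloads` is hypothetical (it is not part of serenata-toolbox) and only uses Python's standard `lzma` and `pathlib` modules. It assumes the files land in `data/` under the names listed above, and reports whether each one exists and decompresses cleanly to the end.

```python
import lzma
from pathlib import Path


def check_downloads(directory, filenames):
    """Return a dict mapping each filename to 'ok', 'missing' or 'corrupt'.

    Hypothetical helper for illustration; not part of serenata-toolbox.
    """
    results = {}
    for name in filenames:
        path = Path(directory) / name
        if not path.is_file():
            results[name] = 'missing'
            continue
        try:
            # Read the archive in 1 MiB chunks; a truncated or corrupt
            # download raises LZMAError or EOFError before the end.
            with lzma.open(path) as archive:
                while archive.read(1024 * 1024):
                    pass
        except (lzma.LZMAError, EOFError):
            results[name] = 'corrupt'
        else:
            results[name] = 'ok'
    return results


if __name__ == '__main__':
    files = (
        '2017-11-30-donations-candidates.xz',
        '2017-11-30-donations-committees.xz',
        '2017-11-30-donations-parties.xz',
    )
    for name, status in check_downloads('data/', files).items():
        print(name, status)
```

Decompressing the whole archive is slow for the larger file, but it catches interrupted downloads that a plain existence check would miss.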

Who can help reviewing it? @anaschwendler

cuducos commented 6 years ago

BTW:

Fix #165

And maybe it's useful to test the fetch_latest_backup function too:

from serenata_toolbox.datasets import fetch_latest_backup
fetch_latest_backup('data/')

That way we test whether these new datasets are downloaded by default in a standard Serenata installation ; )
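After running `fetch_latest_backup('data/')`, something like the sketch below could confirm that the three donation datasets are among the downloaded files. The helper `missing_donation_datasets` and the `EXPECTED_SUFFIXES` constant are hypothetical names introduced for illustration. The sketch matches on the filename suffix only, assuming the date prefix may change when the files are re-published.

```python
from pathlib import Path

# Suffixes of the three datasets added in this PR. The date prefix is
# ignored on purpose (an assumption: it may differ in later backups).
EXPECTED_SUFFIXES = (
    'donations-candidates.xz',
    'donations-committees.xz',
    'donations-parties.xz',
)


def missing_donation_datasets(directory, suffixes=EXPECTED_SUFFIXES):
    """Return the suffixes that have no matching .xz file in `directory`.

    Hypothetical helper for illustration; not part of serenata-toolbox.
    """
    names = [path.name for path in Path(directory).glob('*.xz')]
    return [
        suffix for suffix in suffixes
        if not any(name.endswith(suffix) for name in names)
    ]


if __name__ == '__main__':
    print(missing_donation_datasets('data/'))
```

An empty list means a file was found for each of the three suffixes.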

anaschwendler commented 6 years ago

:tada:

What I did to test this PR:

  1. Clone the project:

    $ git clone git@github.com:datasciencebr/serenata-toolbox.git
  2. Change to its folder:

    $ cd serenata-toolbox
  3. Change to @cuducos’ branch:

    $ git fetch origin
    $ git checkout -b cuducos-donation-data origin/cuducos-donation-data
    $ git merge master
  4. Run the Python fetch script:

    >>> from serenata_toolbox.datasets import fetch, fetch_latest_backup
    >>> files = (
    ...     '2017-11-30-donations-candidates.xz',
    ...     '2017-11-30-donations-committees.xz',
    ...     '2017-11-30-donations-parties.xz'
    ... )
    >>> for filename in files:
    ...     fetch(filename, 'data/')
    ...

The result:

Downloading 2017-11-30-donations-candidates.xz: 100%|█| 239M/239M [02:34<00:00, 1.54Mb/s]
Downloading 2017-11-30-donations-committees.xz: 100%|█████████████████████████████████████████████████| 5.64M/5.64M [00:03<00:00, 1.69Mb/s]
Downloading 2017-11-30-donations-parties.xz: 100%|████████████████████████████████████████████████████| 6.47M/6.47M [00:03<00:00, 1.72Mb/s]

And for the fetch_latest_backup function:

>>> from serenata_toolbox.datasets import fetch_latest_backup
>>> fetch_latest_backup('data/')

Good! 🎉