okfn-brasil / serenata-toolbox

📦 pip module containing code shared across Serenata de Amor's projects | ** This repository does not receive frequent updates **
MIT License

Where do the datasets download? #216

michaelyan-coupa closed this issue 4 years ago

michaelyan-coupa commented 5 years ago

Where do the datasets download to? I followed the README and wrote a Python script to perform the downloads, but I cannot find the files in the folder. Thanks!

cuducos commented 5 years ago

From the README.md:

# will download these specific datasets and store into /tmp/serenata-data folder
$ serenata-toolbox /tmp/serenata-data --module federal_senate chamber_of_deputies

That is to say, the first argument is where data is stored. Have you used the first argument to direct the downloads to a specific folder? If you haven't, the default is data/.
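If the files still seem to be missing, one possible cause is that a relative path such as data/ is resolved against the directory the script was launched from, not the directory the script lives in. The following is a minimal sketch for checking this with the standard library only. The helper name `list_downloads` is hypothetical and is not part of serenata-toolbox, and the sketch assumes the toolbox resolves relative paths against the current working directory, as Python does by default.

```python
from pathlib import Path


def list_downloads(target="data/"):
    """Return the absolute path of `target` and the sorted names of files in it.

    `target` is whatever folder was passed as the first argument to the
    downloader ("data/" being the default mentioned above).
    """
    # resolve() turns a relative path into an absolute one, based on the
    # current working directory of the running process
    folder = Path(target).resolve()
    if not folder.is_dir():
        # The folder does not exist where we are looking
        return folder, []
    return folder, sorted(p.name for p in folder.iterdir() if p.is_file())


if __name__ == "__main__":
    folder, files = list_downloads("data/")
    print(f"Looking in {folder}")
    for name in files:
        print(" ", name)
```

Running this from the same directory as the download script shows the absolute location being inspected, which makes a working-directory mismatch easy to spot.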

michaelyan-coupa commented 5 years ago

Where can I access photos of the receipts? And where are the corresponding JSON files for the OCR extraction? I am referring to this post https://github.com/okfn-brasil/serenata-de-amor/issues/188

cuducos commented 5 years ago

> Where can I access photos of the receipts?

As I explained elsewhere:

> you can download [them] from the source concatenating the URL as we do in Jarbas.

The code linked is as follows:

        args = (self.applicant_id, self.year, self.document_id)
        return (
            'http://www.camara.gov.br/'
            'cota-parlamentar/documentos/publ/{}/{}/{}.pdf'
        ).format(*args)

Does that make sense?
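For use outside Jarbas, the same concatenation can be sketched as a standalone function. The function name `receipt_url` and the example values below are hypothetical placeholders, not real reimbursement identifiers; only the URL pattern comes from the snippet quoted above.

```python
def receipt_url(applicant_id, year, document_id):
    """Build the receipt PDF URL from the three identifiers of a reimbursement."""
    return (
        "http://www.camara.gov.br/"
        "cota-parlamentar/documentos/publ/{}/{}/{}.pdf"
    ).format(applicant_id, year, document_id)


if __name__ == "__main__":
    # Placeholder values for illustration only
    print(receipt_url(1234, 2016, 5678))
```

The three values would come from the corresponding columns of a reimbursements dataset; whether a PDF actually exists at the resulting address depends on the source server.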

cuducos commented 5 years ago

I see you've asked (but maybe deleted) about the .xz files, @michaelyan-coupa.