opensandiego / disclosure-backend-static

Campaign finance data powering Open Disclosure California
https://caciviclab.org/odca-jekyll/

Script to download candidate finance data information #2

Open tommy-stone opened 5 years ago

tommy-stone commented 5 years ago
  1. Go to https://public.netfile.com/pub2/?aid=CSD
  2. Select 2020 from the date
  3. Click export all
  4. Download the file to downloads/raw/efile_SD_CSD_2020.zip
  5. Extract the file, rename it, and move it to downloads/static/efile_SD_CSD_2020.xlsx
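
Steps 1 to 4 happen in the browser, but step 5 could be scripted. The following is only a minimal Python sketch, not the script from this issue: the function name `extract_export` is hypothetical, and it assumes the Netfile zip contains a single .xlsx file.

```python
import shutil
import zipfile
from pathlib import Path


def extract_export(zip_path, dest_path):
    """Copy the first .xlsx member of zip_path to dest_path and return dest_path."""
    zip_path, dest_path = Path(zip_path), Path(dest_path)
    with zipfile.ZipFile(zip_path) as archive:
        # Assumption: the export holds exactly one spreadsheet.
        members = [n for n in archive.namelist() if n.lower().endswith(".xlsx")]
        if not members:
            raise FileNotFoundError(f"no .xlsx file found in {zip_path}")
        dest_path.parent.mkdir(parents=True, exist_ok=True)
        # Writing the member straight to its final name extracts,
        # renames, and moves it in one step.
        with archive.open(members[0]) as src, open(dest_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
    return dest_path
```

With the paths from the steps above, the call would be `extract_export("downloads/raw/efile_SD_CSD_2020.zip", "downloads/static/efile_SD_CSD_2020.xlsx")`.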
nancyheiss commented 4 years ago

I will take this one.

tommy-stone commented 4 years ago

@nancyheiss I updated the description for 2020. The 2019 information will be static, so we only need to download the 2020 file. The file is currently empty, but that should change at the end of the month. Netfile also changed the export so that the file now downloads as a zip. I added more steps in the comments for downloading and extracting the zip.

nancyheiss commented 4 years ago

@tommy-stone It looks like I need permissions to the repo. I tried to push my branch (cloned with git@github.com:opensandiego/disclosure-backend-static.git) but got this error:

  ERROR: Permission to opensandiego/disclosure-backend-static.git denied to nancyheiss.
  fatal: Could not read from remote repository.

tommy-stone commented 4 years ago

@nancyheiss Should be good now. Give it another try.

tommy-stone commented 4 years ago

Additional steps for the download portion

  1. After the download is complete, check that the file exists (efile_CSD_2020.zip)
     a. If the file doesn't exist, create an entry in the error log with the timestamp and ERROR
  2. Extract the file from the zip
  3. Verify that the new file (in ~/downloads/efile_CSD_2020.xlsx) is the same size as or larger than downloads/static/efile_SD_CSD_2020.xlsx
  4. If the verification passes, create an entry in the error log with the timestamp and SUCCESS
  5. Delete efile_SD_CSD_2020.xlsx in downloads/static/, then rename/move efile_CSD_2020.xlsx to downloads/static/efile_SD_CSD_2020.xlsx
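
The size check, log entries, and replacement in steps 3 to 5 could look roughly like the Python sketch below. It is an illustration under assumptions, not the script from this issue: the function name `promote_if_valid` is hypothetical, a missing static file is treated as size 0, and the existence check is applied to the extracted file (the check on the zip in step 1 would follow the same pattern).

```python
import logging
import shutil
from pathlib import Path


def promote_if_valid(new_file, static_file, log):
    """Replace static_file with new_file if new_file exists and is not smaller.

    Returns True when the file was promoted, False otherwise.
    """
    new_file, static_file = Path(new_file), Path(static_file)
    if not new_file.exists():
        log.error("ERROR: %s not found", new_file)
        return False
    # Assumption: with no previous static file, any new file passes.
    old_size = static_file.stat().st_size if static_file.exists() else 0
    if new_file.stat().st_size < old_size:
        log.error("ERROR: %s is smaller than %s", new_file, static_file)
        return False
    log.info("SUCCESS: %s passed verification", new_file)
    # Step 5: delete the old static file, then move the new one into place.
    static_file.parent.mkdir(parents=True, exist_ok=True)
    if static_file.exists():
        static_file.unlink()
    shutil.move(str(new_file), str(static_file))
    return True
```

To get the timestamp on each entry, the logger can be configured once with `logging.basicConfig(filename="download.log", level=logging.INFO, format="%(asctime)s %(message)s")` and passed in as `logging.getLogger()`. The log file name here is a placeholder.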
nancyheiss commented 4 years ago

@tommy-stone for some reason your name was not showing up in the list of potential reviewers for me to select from in this PR: https://github.com/caciviclab/disclosure-backend-static/pull/207 Can you please try running the shell script and let me know what issues you run into?

nancyheiss commented 4 years ago

@tommy-stone I didn't realize my PR was going to caciviclab instead of opensandiego. I made a new commit, but when I try to create a pull request it keeps going to caciviclab. I'm not sure how to fix that. Here is my branch: https://github.com/opensandiego/disclosure-backend-static/tree/downloadCandidateFinanceInfo