pjrule / federal-election-data

Federal election data released by the Federal Election Commission from 1982-present in an open format

Election finance data #2

Open InnovativeInventor opened 6 years ago

InnovativeInventor commented 6 years ago

Apparently, all of the FEC's campaign finance data and related records are available here in a simple, consistent text format. I think some light parsing would turn them into CSV files that match the format of the voting data.
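As a rough illustration of the parsing step, here is a minimal Python sketch. It assumes the raw files are pipe-delimited text with no header row, which is not stated above, so check the delimiter against the actual files. The function name `delimited_to_csv` and the column names in the usage note are hypothetical placeholders, not the FEC's real schema.

```python
import csv


def delimited_to_csv(src, dst, fieldnames, delimiter="|"):
    """Copy delimiter-separated rows from src to dst as CSV, adding a header row.

    src: iterable of text lines (e.g. an open file); assumed to have no header.
    dst: writable text file object.
    fieldnames: column names to write as the CSV header (hypothetical here).
    Returns the number of data rows written.
    """
    # QUOTE_NONE: treat quote characters in the raw data as ordinary text.
    reader = csv.reader(src, delimiter=delimiter, quoting=csv.QUOTE_NONE)
    writer = csv.writer(dst, lineterminator="\n")
    writer.writerow(fieldnames)

    count = 0
    for line_no, row in enumerate(reader, start=1):
        if not row:
            continue  # skip blank lines
        if len(row) != len(fieldnames):
            raise ValueError(
                f"line {line_no}: expected {len(fieldnames)} fields, got {len(row)}"
            )
        writer.writerow(row)
        count += 1
    return count
```

Usage would look something like `delimited_to_csv(open("raw.txt", newline=""), open("out.csv", "w", newline=""), ["id", "name", "amount"])`, with the real column list taken from whatever header or data dictionary accompanies each file. Fields that contain commas or quotes are escaped by the `csv` writer, so the output stays valid CSV.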

InnovativeInventor commented 6 years ago

I had trouble committing all the raw data because the files are so large. Maybe we could use Git LFS? In the meantime, here is the script I wrote to fetch all the raw finance data.

https://github.com/InnovativeInventor/federal-election-data/blob/master/scripts/fetch_finances.sh

How would you like to collaborate on this? Should I submit pull requests, or would you rather add me as a collaborator?

pjrule commented 6 years ago

Yes, I would be absolutely thrilled to have you working on this with me! I'll add you as a collaborator. :)

pjrule commented 6 years ago

We could potentially set up a requester-pays Amazon S3 bucket to aggregate the data, a la arXiv. Probably overkill for now. I should mention that the FEC already has an API and a rudimentary search engine for some of their data (including finance data), but they don't have election results available in such a format, which is where this project comes in. Nonetheless, there is definitely value in taking that publicly available CSV-formatted data and aggregating it, writing scripts to process and analyze it, searching for trends, and writing guides on how to approach it. I don't want to let the scope creep, but it would be great to have an interactive infographic that generates graphs of campaign contributions superimposed on a geographical map. There are many JS frameworks out there that could make this relatively easy. That script is great, btw. 👍

InnovativeInventor commented 6 years ago

Glad we are on the same page! Collecting all the info into one spot should make analysis easier for others. We could definitely try to keep the csv formatted data here, but the raw files aren't too necessary to keep here.

I did some testing, and adding the raw financial data increased the repository size by 17.8 GB. If people want to verify the documents, they are probably not going to go through a GitHub repository manually comparing files. They'll probably clone it and run our scripts themselves. But I do like your idea of having a requester-pays S3 bucket. If this does gain traction, that might be something to investigate. I was thinking of GitHub's LFS service with some sort of crowdfunding to cover the $5-a-month fee.

pjrule commented 6 years ago

> We could definitely try to keep the csv formatted data here, but the raw files aren't too necessary to keep here.

I agree—this is probably the best approach, at least for the raw financial data. I'm only hosting the election data here because it's so small (~50MB for all of it, and it's only that big because macOS Preview does strange things to PDF files).

> If people want to verify the documents, they are probably not going to be going through a github repository manually comparing files. They'll probably clone it and run our scripts themselves.

Yep. And why pay for extra hosting if taxpayers are already paying for it via FEC.gov? :)

pjrule commented 6 years ago

Not sure if you've already thought of this, but it might be nice to use GNU Make for the fetch/build process.

InnovativeInventor commented 6 years ago

Interesting idea. I'm not too familiar with make, but that sounds like a good idea to keep our repository sizes down.

pjrule commented 6 years ago

TBH, I'm not too familiar with the exact syntax either, but I've seen people do interesting things with it beyond its original usage as a build tool for big C(++) projects, like front-end web dev compilation or LaTeX compilation with automated plot generation. It's generic enough that it can basically do anything.
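For anyone else not familiar with Make, the core idea it would bring to the fetch/build process can be sketched in a few lines of Python: rebuild a target only when it is missing or older than one of the files it depends on. This is only an illustration of that rule, not a Makefile and not a replacement for Make; the names `needs_rebuild` and `build` are made up for this sketch.

```python
import os


def needs_rebuild(target, dependencies):
    """Return True if target is missing or older than any dependency.

    This mirrors Make's basic timestamp rule. Every dependency is assumed
    to exist already; a missing one raises an OSError from os.path.getmtime.
    """
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(dep) > target_mtime for dep in dependencies)


def build(target, dependencies, recipe):
    """Run recipe() only when target is out of date; return True if it ran."""
    if needs_rebuild(target, dependencies):
        recipe()
        return True
    return False
```

With a rule like this, a fetched raw file would be the dependency and the generated CSV the target, so rerunning the build after a fresh clone downloads and converts everything, while later runs skip any step whose output is already up to date.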