stephenholzman / tidyusafec

An R wrapper for the OpenFEC API that features tidy cleaning.
https://stephenholzman.github.io/tidyusafec/

how tidy are we really, tolerance for list columns #3

Open stephenholzman opened 6 years ago

stephenholzman commented 6 years ago

In search_candidates, we get back lists for election_years, cycles, districts, and principal_committees. Principal committees is itself a list. It's currently set up to return one row for each candidate + principal committee combo as our tidy unit, but each committee was not necessarily active for every election year associated with the candidate.

  1. For candidates with multiple committees, can we determine which election each committee was primarily used in? We can check the filing dates and write some logic to infer whether they overlap. Don't think I'd want to get rid of the election_years list column, but maybe a new mutated primary_election variable?

  2. Same thing as above but with districts.

  3. Cycles is probably something safe to leave in a list column, but we're just starting so who knows.
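
A rough sketch of the primary_election idea from item 1, on toy data. Everything here is an assumption for illustration: committee_id, first_file_date, last_file_date, and the helper infer_primary_election are hypothetical names, and the rule (take the latest election year whose two-year window overlaps the committee's filing span) is just one possible heuristic, not something the package does.

```r
library(dplyr)

# Toy data: one row per candidate + principal committee combo.
# The IDs, dates, and the *_file_date column names are made up.
candidates <- tibble(
  candidate_id    = c("H0XX00001", "H0XX00001"),
  committee_id    = c("C00000001", "C00000002"),
  election_years  = list(c(2014L, 2016L, 2018L), c(2014L, 2016L, 2018L)),
  first_file_date = as.Date(c("2013-03-01", "2017-02-15")),
  last_file_date  = as.Date(c("2016-12-31", "2018-12-31"))
)

# Hypothetical helper: treat each election year as a window running from
# Jan 1 of the prior year through Dec 31 of the election year, and return
# the latest election year whose window overlaps the filing span.
infer_primary_election <- function(years, first_file, last_file) {
  cycle_start <- as.Date(paste0(years - 1L, "-01-01"))
  cycle_end   <- as.Date(paste0(years, "-12-31"))
  overlaps <- first_file <= cycle_end & last_file >= cycle_start
  if (any(overlaps)) max(years[overlaps]) else NA_integer_
}

# rowwise() hands the helper one row's list element and dates at a time.
candidates <- candidates %>%
  rowwise() %>%
  mutate(primary_election = infer_primary_election(
    election_years, first_file_date, last_file_date
  )) %>%
  ungroup()
```

The election_years list column is left in place, so nothing is lost if the heuristic guesses wrong.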

stephenholzman commented 6 years ago

I think it's best to be tidy for anything financial, list columns ok for searches. Need to be thinking about language in this case.

stephenholzman commented 6 years ago

After thinking about how to flag, I think the best thing to do is to create a variable containing vectors to filter on. It would also be nice to pipe in results of a search into the next query. Something like:

    tidyfec_filters <- list(
      top_level = c("receipts", "disbursements", "cash"),
      receipts_subtotals = c("contributions", "loans", "transfers", "other"),
      ...
    )

    search_candidates(api_key = api_key, state = "VA", district = "10", office = "H", election_year = "2018", data_structure = "tidy") %>%
      get_candidate_totals(api_key = api_key, candidate_ids = candidate_id, data_structure = "tidy") %>%
      filter(type_of_funds %in% tidyfec_filters$top_level, cycle == "2018") %>%
      ggplot() +
      [...plot code...]
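
To see the filter step on its own without hitting the API, here is a minimal sketch on a made-up totals table. The amount column and all of the values are placeholders, and only the two filter vectors spelled out above are defined; the real return shape of get_candidate_totals may differ.

```r
library(dplyr)

# Only the two vectors shown above; the rest of the list is not filled in here.
tidyfec_filters <- list(
  top_level = c("receipts", "disbursements", "cash"),
  receipts_subtotals = c("contributions", "loans", "transfers", "other")
)

# Made-up long-format totals: one row per candidate, cycle, and type of funds.
totals <- tibble(
  candidate_id  = "H0XX00001",
  cycle         = c("2018", "2018", "2018", "2018", "2016"),
  type_of_funds = c("receipts", "contributions", "disbursements", "cash", "receipts"),
  amount        = c(100, 80, 60, 40, 50)
)

# Keep only the top-level rows for the 2018 cycle.
top_level_2018 <- totals %>%
  filter(type_of_funds %in% tidyfec_filters$top_level, cycle == "2018")
```

Because the vectors live in a plain named list, swapping in tidyfec_filters$receipts_subtotals changes the slice without touching the rest of the pipeline.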

stephenholzman commented 6 years ago

Even cleaner aspirations.

    search_candidates(state = "VA", district = "10", office = "H", election_year = "2018") %>%
      get_candidate_totals() %>%
      filter(type_of_funds %in% tidyfec_filters$top_level, cycle == "2018") %>%
      ggplot() +
      [...plot code...]
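
One way the shorter call could work, sketched under assumptions: the function takes the piped data frame as its first argument, pulls candidate_id from it when no IDs are given, and falls back to an environment variable for the key. The function name get_totals_sketch and the variable name DATAGOV_API_KEY are hypothetical, and the body returns a placeholder instead of calling the API.

```r
library(dplyr)

# Hypothetical stand-in, not the tidyusafec implementation.
get_totals_sketch <- function(data = NULL,
                              candidate_ids = NULL,
                              api_key = Sys.getenv("DATAGOV_API_KEY")) {
  # Fall back to the candidate_id column of whatever was piped in.
  if (is.null(candidate_ids)) {
    if (is.null(data) || !"candidate_id" %in% names(data)) {
      stop("Supply candidate_ids or pipe in a data frame with a candidate_id column.")
    }
    candidate_ids <- unique(data$candidate_id)
  }
  if (!nzchar(api_key)) {
    stop("No API key found; set the environment variable or pass api_key.")
  }
  # A real version would query the API here; this just echoes the IDs.
  tibble(candidate_id = candidate_ids)
}

# Usage with a placeholder key and a made-up search result.
Sys.setenv(DATAGOV_API_KEY = "placeholder-key")
searched <- tibble(candidate_id = c("H0XX00001", "H0XX00002", "H0XX00001"))
requested <- searched %>% get_totals_sketch()
```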

stephenholzman commented 6 years ago

I think moving to full tidy sooner rather than later is a good move. Cycles is the easiest to implement: we can unnest both the candidate and committee cycles, then filter on candidate_cycles == committee_cycles. Still need to figure out an approach to election_years and election_districts, and whether we'll automatically filter on the parameters supplied or return everything. Getting tired, more research tomorrow.
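
A minimal sketch of the cycles step with tidyr::unnest, on toy data. It assumes both cycle lists arrive as list columns named candidate_cycles and committee_cycles; the IDs, committee_id, and the values are made up.

```r
library(dplyr)
library(tidyr)

# Toy data: one row per candidate + committee combo, cycles as list columns.
candidates <- tibble(
  candidate_id     = c("H0XX00001", "H0XX00001"),
  committee_id     = c("C00000001", "C00000002"),
  candidate_cycles = list(c(2014L, 2016L, 2018L), c(2014L, 2016L, 2018L)),
  committee_cycles = list(c(2014L, 2016L), 2018L)
)

# Unnesting both columns in turn gives every candidate-cycle x committee-cycle
# pair per row; the filter keeps only the pairs where the two cycles agree.
tidy_cycles <- candidates %>%
  unnest(candidate_cycles) %>%
  unnest(committee_cycles) %>%
  filter(candidate_cycles == committee_cycles)
```

On this toy input the result has three rows: the first committee matched to 2014 and 2016, the second to 2018. Cycles where the committee was not active simply drop out.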