stephenholzman / tidyusafec

An R wrapper for the OpenFEC API that features tidy cleaning.
https://stephenholzman.github.io/tidyusafec/

get_schedule_a stops after 30 pages #30

Closed EdgDew closed 4 years ago

EdgDew commented 4 years ago

Running:

data_all <- get_schedule_a(
  api_key = Sys.getenv("DATAGOV_API_KEY"),
  data_structure = "tidy",
  sort_null_only = TRUE,
  sort_hide_null = FALSE,
  sort = "contribution_receipt_date",
  is_individual = "t",
  two_year_transaction_period = "2010"
)

Gives:

Itemized contributions found: 8874626
There are about 88747 pages of results to get containing approximately 8874626 itemized contributions.
On page 2/88747
On page 10/88747
On page 20/88747
On page 30/88747

It systematically stops at page 30, whatever the number of itemized contributions (even if I change the two-year period, for instance).

Is it some kind of built-in limit? I have an upgraded FEC API key.

Many thanks! E

EdgDew commented 4 years ago

Up!

stephenholzman commented 4 years ago

Apologies for the long wait!

After investigating, I'm thinking this might be a bug somewhere on the FEC's end, in their indexing for very large query results. Changing the year in your example returns between 20-40 pages and then stops, so it's not a hard limit that they implemented. My function works with queries past this limit, going up to hundreds of pages in some cases, but making tens of thousands of API calls to complete a query is definitely pushing it, from my limited understanding of the backend tech.
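If you want to check this against the raw endpoint yourself, here is a minimal diagnostic sketch (not part of tidyusafec) using httr and jsonlite. It walks schedule_a's keyset pagination by hand: the endpoint pages with pagination$last_indexes rather than a page number, and the query parameters below just mirror your example.

library(httr)
library(jsonlite)

url <- "https://api.open.fec.gov/v1/schedules/schedule_a/"
query <- list(
  api_key = Sys.getenv("DATAGOV_API_KEY"),
  is_individual = "t",
  two_year_transaction_period = "2010",
  sort = "contribution_receipt_date",
  per_page = 100
)

page <- 1
repeat {
  resp <- GET(url, query = query)
  stop_for_status(resp)
  body <- fromJSON(content(resp, as = "text", encoding = "UTF-8"))
  if (NROW(body$results) == 0) break
  # Keyset pagination: feed last_indexes back into the next request.
  # If they come back missing, there is no way to page further.
  li <- body$pagination$last_indexes
  if (is.null(li) || is.null(li$last_index)) break
  query$last_index <- li$last_index
  query$last_contribution_receipt_date <- li$last_contribution_receipt_date
  page <- page + 1
}
message("Stopped after page ", page, " of ", body$pagination$pages)

Seeing where this loop stops relative to pagination$pages should show whether the cutoff happens server-side, independent of the package.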

If the data returned by this query is necessary for your work, it might be better in any case to investigate the FEC's bulk download offerings, to be kind to their servers.

Hope this helps point you in the right direction!

EdgDew commented 4 years ago

Thank you for looking into the issue thoroughly! Bulk files are not suited for my analysis, hence my use of the API, but thanks for the suggestion. I'll try to segment my query or find another strategy, along the lines of the sketch below.
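For reference, a minimal sketch of the segmentation idea, assuming get_schedule_a forwards the endpoint's min_date/max_date filters (worth checking against the function's signature) and that each call returns a tibble that can be row-bound. The point is to keep each window well under the page count where the index seems to break.

library(tidyusafec)
library(purrr)

# Month-long windows covering the 2009-2010 cycle (dates are my assumption;
# adjust to the period of interest).
starts <- seq(as.Date("2009-01-01"), as.Date("2010-12-01"), by = "month")
ends   <- c(starts[-1] - 1, as.Date("2010-12-31"))

data_all <- map2_dfr(starts, ends, function(from, to) {
  get_schedule_a(
    api_key = Sys.getenv("DATAGOV_API_KEY"),
    data_structure = "tidy",
    is_individual = "t",
    two_year_transaction_period = "2010",
    min_date = as.character(from),  # ISO dates; the API may expect another format
    max_date = as.character(to)
  )
})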