Open stephenholzman opened 6 years ago
Researching the landscape of other wrappers intending to return tidy results of gov data. There's a variety of different approaches and styles. Lots of influence from the API they're working with.
If this is going to work, and partly to distinguish the efforts here, the goal is to make things as consistent as possible for users. Always worried about an xkcd-927 type situation.
Exploring justification: ideally, all data providers would adopt a consistent approach to their APIs or data portals. Totally unrealistic. The next best thing would be consistent wrappers. Wrappers are easier to implement and do not require organizations to agree on a standard API style. Wrappers are nimble. The next decade will undoubtedly see upheaval as orgs modernize, so perhaps wrappers are best suited to quickly bring about a more harmonious experience working with data from different sources.
In parallel with working on tidyusafec, I think developing a well-defined wrapper style guide would be beneficial. Definitely a bit ambitious, but let folks buy in naturally if this is truly the best approach (it would help if I actually succeed at writing software and demonstrate the potential utility).
If source APIs are targeted at a wide range of developers wanting mostly hierarchical data, wrappers should be meant to give similar scaling ability to analysts who want mostly relational data. My working proposals for a wrapper standard:
Going to focus on development for a while, will revisit.
As the FEC uses the data.gov network API key, it might make sense to have a place to air thoughts on a hypothetical organization structure for a larger ecosystem of data access packages. As R and the tidy approach are international, can lump in thoughts about namespace here too.
Motivation for this is that every class I ever had spent its time on navigating government data portals instead of going into documentation and limitations. This will not stand, man.
If tidygov were to come into existence, it should be international. tidy[country-abbreviation][org-abbreviation] should be the naming convention. So this would be 'tidyusafec'. tidycensus might become 'tidyusacensus'. cancensus might become 'tidycancensus'.
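A quick sketch of that naming rule, written in Python purely for illustration (the packages themselves would be R). The function name `tidygov_name` and the alphanumeric check are my own assumptions, not part of any existing package:

```python
def tidygov_name(country: str, org: str) -> str:
    """Build a package name as tidy[country-abbreviation][org-abbreviation].

    Hypothetical helper: the lowercase rule and the alphanumeric check
    are assumptions made for this sketch only.
    """
    parts = [country.strip().lower(), org.strip().lower()]
    for part in parts:
        # Reject empty or punctuated abbreviations so names stay predictable.
        if not part.isalnum():
            raise ValueError(f"abbreviation must be alphanumeric: {part!r}")
    return "tidy" + "".join(parts)


print(tidygov_name("usa", "fec"))     # tidyusafec
print(tidygov_name("usa", "census"))  # tidyusacensus
print(tidygov_name("can", "census"))  # tidycancensus
```

Keeping the country abbreviation ahead of the org abbreviation means packages for the same country sort together alphabetically.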
The burden of navigating between packages in the hypothetical tidygov universe should be minimized. The problems we want to solve are aligned with the general tidyverse: the difficulty of retrieving data is too high, the difficulty of wrangling data is too high, and the difficulty of replicating analysis strategies across time/geography/topics is too high.
This is all way out of scope for tidyfec, except for maybe renaming it tidyusafec. Just the best place to record thoughts for now.