grgcombs closed this issue 12 years ago
re: standardizing key names, absolutely we can and should start doing this - I think I have someone well suited to the task that I can put on it in the next week or two
there's an item on my TODO list about detailed item-specific validation/cleanup hooks, I think that phone numbers, etc. would be perfect for this so perhaps it is time to focus on getting that working as well
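As a rough sketch of what item-specific cleanup hooks could look like: the snippet below maps field names to cleanup functions and runs each value through its hook. Everything here (`CLEANUP_HOOKS`, `normalize_phone`, `clean_address`, `apply_hooks`, and the `NNN-NNN-NNNN` output format) is hypothetical and only for illustration, not existing project code.

```python
import re

def normalize_phone(raw):
    """Return a US number as 'NNN-NNN-NNNN', or None if 10 digits can't be found."""
    if not raw:
        return None
    digits = re.sub(r"\D", "", raw)  # keep digits only
    # Drop a leading US country code, e.g. "1 (404) 656-0202".
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    if len(digits) != 10:
        return None  # extensions, short numbers, etc. are rejected in this sketch
    return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"

def clean_address(raw):
    """Collapse runs of whitespace on each line and drop empty lines."""
    if not raw:
        return None
    lines = (" ".join(line.split()) for line in raw.splitlines())
    return "\n".join(line for line in lines if line)

# Hypothetical registry: field name -> cleanup function (an assumption).
CLEANUP_HOOKS = {
    "phone": normalize_phone,
    "fax": normalize_phone,
    "address": clean_address,
}

def apply_hooks(record):
    """Return a copy of record with each value passed through its hook, if any."""
    return {
        key: CLEANUP_HOOKS[key](value) if key in CLEANUP_HOOKS else value
        for key, value in record.items()
    }
```

Fields without a registered hook pass through unchanged, so new hooks could be added one field at a time.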
Maybe in V2, we could construct and populate some standard office/contact dictionaries ...
{
  "id" : "NYL000002",
  "leg_id" : "NYL000002",
  "state" : "ny",
  "chamber" : "upper",
  "party" : "Democratic",
  "district" : "15",
  "full_name" : "Joseph P Addabbo Jr.",
  "photo_url" : "http://www.nysenate.gov/files/imagecache/senator_teaser/profile-pictures/Addabbo.SD15.jpg",
  "updated_at" : "2011-09-06 06:24:39",
  "created_at" : "2011-05-05 01:29:13",
  "emails" : [
    "addobbo@nysenate.gov",
    "info@voteaddobbo.com"
  ],
  "websites" : [
    "http://www.nysenate.gov/Addobbo",
    "http://www.voteaddobbo.com/"
  ],
  "offices" : [
    {
      "office_id" : "NYO000002",
      "leg_id" : "NYL000002",
      "type" : "capitol",
      "phone" : "(404) 656-0202",
      "fax" : "(404) 656-0203",
      "address" : "111 Broadway\n New York, New York 10046",
      "coordinates" : [-32.123123,101.2212] // tee hee!
    },
    {
      "office_id" : "NYO000003",
      "leg_id" : "NYL000002",
      "type" : "district",
      "phone" : "(404) 555-1212",
      "fax" : null,
      "address" : "88 Main St\n Syracuse, New York 10011",
      "coordinates" : [-7.320, 72.444] // tee hee!
    }
  ]
}
On a related note, I've thrown together some preliminary thoughts on a V2 API on the Wiki
a new offices key has been added. It is now possible to add offices in the scrape, and they look like
{
  'type': 'capitol', // will be capitol|district
  'name': 'Capitol Office',
  'fax': null,
  'phone': '202-555-0001',
  'address': '212 Maple Lane\nRaleigh, NC 27526',
  'email': null
}
need to add this in more states (LA was the experimental one for this)
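For anyone adding this to another state's scraper, here is a minimal sketch of building an office entry in the shape shown above. `make_office` and `OFFICE_TYPES` are hypothetical helpers written for illustration, not the scraper library's actual API; the only things taken from the comment above are the field names and the `capitol|district` restriction on `type`.

```python
# Allowed values for 'type', per the example above.
OFFICE_TYPES = ("capitol", "district")

def make_office(office_type, name, address=None, phone=None, fax=None, email=None):
    """Build an office dict with the same keys as the example; missing values stay None."""
    if office_type not in OFFICE_TYPES:
        raise ValueError(f"office type must be one of {OFFICE_TYPES}, got {office_type!r}")
    return {
        "type": office_type,
        "name": name,
        "fax": fax,
        "phone": phone,
        "address": address,
        "email": email,
    }

# Example: the capitol office from the comment above.
office = make_office(
    "capitol",
    "Capitol Office",
    address="212 Maple Lane\nRaleigh, NC 27526",
    phone="202-555-0001",
)
```

Rejecting unknown types up front keeps a misspelling like "capital" from slipping into the data.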
I'm trying to eke a little more usable info into the app and I'm running into an issue that is certainly a direct result of scraping from various state sites. Nevertheless, we need more uniformity for contact info in the legislator API.
Some examples of inconsistency in the keys that prove troublesome:
+website, website
+email_address and email
+phone, office_phone, +business_phone, +capitol_phone, +district_phone, +phone_number
+capital_address, +capitol_address, +district_address, +office_loc, +address
+office_fax, +fax_number
Here's a snapshot of these examples:
Understandably, changing these keys could cause problems for folks who are expecting the old way. However, the current situation precludes us from including these data values in the app, at least in any meaningful way. If we can post-process the scraping to clean up the dictionary keys, I would propose something like the following:
... And so on ... Basically, we fill the bin of standard keys (capitol_phone, website, email, etc.) and then leftovers go into appendages like "+KEY_other", "+KEY_other2", etc. If we're already mucking about in there, would it be possible to throw some cleanup on the values too? For instance, regex the phone numbers to grab our 10 digits and rewrite them into a consistent format ..., and potentially clean up street addresses by stripping whitespace, dealing with phone numbers in the address values, fixing capitalization, etc. Those are all positives, but the most important to me is standardizing the dictionary keys.
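To make the "fill the bin, then spill into appendages" idea concrete, here is a minimal post-processing sketch. The alias table is purely my guess at which scraped keys belong to which standard key (for example, treating a bare `+phone` as a capitol phone is an assumption), and `ALIASES` and `standardize_keys` are hypothetical names, not existing code.

```python
# Standard key -> scraped keys that should fold into it (assumed mapping).
ALIASES = {
    "website": ("+website",),
    "email": ("+email_address",),
    "capitol_phone": ("+capitol_phone", "+phone", "office_phone",
                      "+business_phone", "+phone_number"),
    "district_phone": ("+district_phone",),
    "capitol_address": ("+capitol_address", "+capital_address",
                        "+address", "+office_loc"),
    "district_address": ("+district_address",),
    "fax": ("+office_fax", "+fax_number"),
}

def standardize_keys(record):
    """Return a copy of record with contact keys folded into standard names.

    The first non-empty value found fills the standard key; further distinct
    values go to "+KEY_other", "+KEY_other2", and so on.
    """
    out = {}
    claimed = set()
    for standard, aliases in ALIASES.items():
        values = []
        # Check the standard key itself first, then its aliases in order.
        for key in (standard,) + aliases:
            if key in record:
                claimed.add(key)
                value = record[key]
                if value and value not in values:
                    values.append(value)
        for i, value in enumerate(values):
            if i == 0:
                out[standard] = value
            elif i == 1:
                out[f"+{standard}_other"] = value
            else:
                out[f"+{standard}_other{i}"] = value
    # Keys that are not contact aliases pass through untouched.
    for key, value in record.items():
        if key not in claimed:
            out.setdefault(key, value)
    return out
```

Duplicate values are dropped rather than spilled, so a legislator whose `+website` and `website` match ends up with a single `website` entry.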