unitedstates / congress-legislators

Members of the United States Congress, 1789-Present, in YAML/JSON/CSV, as well as committees, presidents, and vice presidents.
Creative Commons Zero v1.0 Universal

Social media urls to add to your legislators data set #293

Open arderyp opened 9 years ago

arderyp commented 9 years ago

I work with the Library of Congress, and I am using your 'legislators-current' and 'legislators-social-media' yaml files to facilitate scoping for our US Congress web archive collection. Thanks for the data!

As it turns out, you had a good chunk of social media sites that we were missing. However, we also seem to have a decent amount of data that you are missing. I have created the following JSON file listing, per Bioguide ID, the social media accounts that we have found but that are missing from your data set. @dwillis informed me that some of it may not be pertinent to the list you keep, but I thought I might as well pass it along:

https://gist.github.com/pardery/84fc56c836c4b1f02708

Thanks, Phil
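
For readers who want to run the same kind of comparison, here is a minimal sketch of a per-Bioguide-ID diff between two sets of account URLs. The function name, the dict-of-lists shape, and the sample IDs and URLs are assumptions for illustration only; they are not the format of the gist or of the project's YAML files. URLs are compared case-insensitively with trailing slashes ignored, which is a simplification.

```python
def missing_accounts(ours, theirs):
    """Return URLs present in `ours` but absent from `theirs`, keyed by Bioguide ID.

    Both arguments are assumed to map a Bioguide ID to a list of account URLs.
    """
    def normalize(url):
        # Simplification: ignore case and a trailing slash when comparing.
        return url.lower().rstrip("/")

    missing = {}
    for bioguide, accounts in ours.items():
        known = {normalize(url) for url in theirs.get(bioguide, [])}
        extra = [url for url in accounts if normalize(url) not in known]
        if extra:
            missing[bioguide] = extra
    return missing


# Made-up sample data; "X000001" is not a real Bioguide ID.
ours = {"X000001": ["https://twitter.com/example_a/", "https://facebook.com/example_a"]}
theirs = {"X000001": ["https://twitter.com/Example_A"]}
print(missing_accounts(ours, theirs))
```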

gmj2053 commented 9 years ago

Re: "official". We only scope in sites that are linked from the congressional website as part of their social media framework. Where does that come from for your set? (Phil and I work together)

konklone commented 9 years ago

So if these are all linked to from their congressional websites, that's often a strong enough signal to accept those handles, but review is needed.

There are a few that we exclude anyway, when they're not actually for the member, like: http://www.flickr.com/photos/republicanconference/ or http://www.youtube.com/HouseConference. Those are often used as placeholders until the MoC gets their own account.

Also, not all of the Twitter accounts are legit. http://twitter.com/SenRandPaul is listed, but that account doesn't exist. http://twitter.com/alangrayson is also listed, but that doesn't look legit, and http://grayson.house.gov doesn't link to it (or to any Twitter account). https://twitter.com/sethmoulton works, and is linked to by his official site, but the Twitter bio links to Moulton's campaign site, which is a mistake on someone's part and potentially opens up Moulton to charges that he's using official resources to benefit his political campaign. I'd at least want to call Moulton's office to confirm that the account is meant to be his official legislative account.

We also haven't (yet) been tracking Flickr accounts. It looks like you have 304 listed in that gist (not counting the /republicanconference/ account). That's more widespread than I thought -- perhaps the project should consider tracking Flickr accounts too.

We don't track G+ accounts, and there are only 33 plus.google.com URLs in that gist -- plus G+ is a wasteland -- so that doesn't seem worth tracking. We're also not likely to incorporate a Picasa account (e.g. http://picasaweb.google.com/congresswomanpingree/)

In any case, it looks like there are still some accounts we're missing in your gist, and I'd like to review those. Could you exclude the G+, Picasa, Flickr, and placeholder accounts, and post a link to the updated list?
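
The exclusions requested above could be sketched roughly as follows. This is only an illustration: the host list and the two placeholder URLs come from this thread, while the function names and the idea of matching on host plus path are assumptions, not the project's actual tooling.

```python
from urllib.parse import urlsplit

# Services the project does not track, per the comment above.
EXCLUDED_HOSTS = {"plus.google.com", "picasaweb.google.com", "flickr.com"}

# Conference/leadership placeholder accounts named in this thread (host + path, lowercased).
PLACEHOLDER_URLS = {
    "flickr.com/photos/republicanconference",
    "youtube.com/houseconference",
}


def split_host_path(url):
    """Return (host without 'www.', lowercased path without trailing slash)."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return host, parts.path.rstrip("/").lower()


def keep(url):
    """True if the URL is neither on an excluded service nor a known placeholder."""
    host, path = split_host_path(url)
    if host in EXCLUDED_HOSTS:
        return False
    return (host + path) not in PLACEHOLDER_URLS


urls = [
    "http://www.youtube.com/HouseConference",
    "http://picasaweb.google.com/congresswomanpingree/",
    "https://twitter.com/sethmoulton",
]
print([u for u in urls if keep(u)])
```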

gmj2053 commented 9 years ago

Our data definitely needs cleanup. It looks like Sen. Paul (2014) and Alan Grayson (2010) each had a link to Twitter from their congressional sites. One issue is that our data is cumulative, and we do have data from older sources (govtrack/wikipedia), so it would be great to have a reliable source, so that when we make these archives public and expose this scoping, we have good data.

@pardery can make the data look any way you want. It could be that we look at flagging our data for the accounts you care about (youtube, twitter, facebook, ?) and clean up bad/old data.

We really appreciate this work.

arderyp commented 9 years ago

Thanks for the input @konklone. I moved the original list here and the new list excluding Picasa, Google Plus, and Flickr is in place here

konklone commented 9 years ago

Thanks, @pardery! Sorry to leave you hanging on this -- I can review a bit later this week, but if anyone else on the project wants to jump on it before then, by all means.

arderyp commented 9 years ago

Thanks!

konklone commented 9 years ago

OK, finally got around to this. Some updates below, more to come as I keep working. I have a social-media-updates branch with work in progress incorporating these and a new automated sweep.

I removed the LinkedIn, Pinterest, and Tumblr accounts from the provided file. I removed any accounts that pointed to leadership/conference accounts.

Twitter:

konklone commented 9 years ago

Oh, missed some other Twitter submissions from the list:

Facebook:

konklone commented 9 years ago

I filed #299 with my work so far. I posted https://gist.github.com/konklone/d531370dadd55d31eb0c, which has the remaining instagram and youtube accounts you submitted.

konklone commented 9 years ago

Seth Moulton's team resolved the bad link in favor of the TeamMoulton account: https://twitter.com/sethmoulton/status/620603762751733760

gmj2053 commented 9 years ago

I'm curious about how websites are identified for this project.

We typically use the domain and not the webserver redirect since that may change during a congress. I am seeing websites such as https://crenshaw.house.gov/index.cfm/home listed now, vice http://crenshaw.house.gov/

Keep up the good work! We are using the project data and value it highly.

konklone commented 9 years ago

"We typically use the domain and not the webserver redirect since that may change during a congress. I am seeing websites such as https://crenshaw.house.gov/index.cfm/home listed now, vice http://crenshaw.house.gov/"

I share your opinion that we should be using the domain, not the specific homepage URL, as those definitely do change. The House and Senate have both stabilized on a subdomain pattern in recent years, too. I thought we were using the domain now.

I did some checking of our current legislators file (by hand), and it looks like, for each legislator's current term, we mostly are. Crenshaw is an exception, and we should look at our script again to see why that's the case. Can we move it to another ticket?
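
As a rough illustration of "use the domain, not the specific homepage URL", a homepage URL can be reduced to its scheme and host. This is a sketch, not the project's script; the function name `site_root` is hypothetical, and it keeps whatever scheme the input URL has.

```python
from urllib.parse import urlsplit


def site_root(url):
    """Reduce a homepage URL to scheme://host/, dropping any path or query."""
    parts = urlsplit(url)
    if not parts.scheme or not parts.hostname:
        raise ValueError(f"not an absolute URL: {url!r}")
    return f"{parts.scheme}://{parts.hostname}/"


print(site_root("https://crenshaw.house.gov/index.cfm/home"))
```

A value that changes under this function is one that carries a redirect-style path, which makes it a cheap check to run over a whole file.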

gmj2053 commented 9 years ago

Was that a question for me? I can move it. I put it on this ticket because I wasn't sure if it was just one of our peculiarities. There are about three on the list.

konklone commented 9 years ago

Got it, I've opened #300 to reflect that work.

soooh commented 7 years ago

I was working on a project using legislators' social media accounts and pulled some of the data in this repo. I found some new Twitter accounts for both new and returning MoCs, as well as other accounts that needed cleaning up. I looked through the guidelines for what you consider "official," so I've included information on whether these accounts are linked on their websites so you can add/edit handles at your discretion.

Twitter handles to add to repo:

id | twitter account | linked on website
R000608 | repjackyrosen | no
C001110 | reploucorrea | no
B001298 | repdonbacon | no
S001199 | RepSmucker | no
M001198 | RepMarshall | no
S001202 | SenatorStrange | no website
R000609 | RepRutherfordFL | yes
J000299 | repmikejohnson | yes
E000296 | RepDwightEvans | yes
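
One way the "linked on website" column could be checked mechanically is to scan a member's homepage HTML for links to twitter.com. The sketch below works on an HTML string that has already been fetched; the class and function names are hypothetical, and the sample markup is made up. It treats the first path segment of any twitter.com link as a handle, so share or intent links would need extra filtering.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class TwitterLinkFinder(HTMLParser):
    """Collect lowercased Twitter handles from <a href="..."> links."""

    def __init__(self):
        super().__init__()
        self.handles = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        parts = urlsplit(href)
        host = (parts.hostname or "").lower()
        if host in ("twitter.com", "www.twitter.com"):
            segments = [s for s in parts.path.split("/") if s]
            if segments:
                # Handles are compared case-insensitively.
                self.handles.add(segments[0].lower())


def linked_handles(html):
    """Return the set of Twitter handles linked from the given HTML."""
    finder = TwitterLinkFinder()
    finder.feed(html)
    return finder.handles


# Made-up markup standing in for a fetched homepage.
page = '<a href="https://twitter.com/RepExample">Twitter</a> <a href="/contact">Contact</a>'
print("repexample" in linked_handles(page))
```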

Twitter handles to clean up in repo:

id | twitter account | linked on website
B000574 | repblumenauer | no
G000535 | RepGutierrez | yes
T000462 | pattiberi | yes
Y000064 | SenToddYoung | yes
C001088 | ChrisCoons | yes

joelcollinsdc commented 7 years ago

In #434, I added all the linked ones, plus the ones below.

id | twitter account | linked on website
R000608 | repjackyrosen | looks official
C001110 | reploucorrea | looks official
B001298 | repdonbacon | looks official
S001199 | RepSmucker | looks official
M001198 | RepMarshall | looks official
S001202 | SenatorStrange | looks official

Leaving this one out since it looks strange:

id | twitter account | linked on website
B000574 | repblumenauer | looks official... but links to campaign site? holding off on this one