propublica / Capitol-Words

Scraping, parsing and indexing the daily Congressional Record to support phrase search over time, and by legislator and date
BSD 3-Clause "New" or "Revised" License

urls w/ simple bioguide id are broken #77

Open timball opened 10 years ago

timball commented 10 years ago

bioguideIDs used to have urls like this:

http://capitolwords.org/lawmaker/C001057/

but now have urls like this:

http://capitolwords.org/legislator/C001057-norm-coleman/

i do not know of a simple regex-y way to fix this, since the new url includes a name slug that the old one doesn't have. google webmaster tools thinks there are 133 broken urls like this. uh, this might be a thinker. the problem is that there are outside links that point to the old urls.
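To make the difficulty concrete: the new path needs the legislator's name, which cannot be derived from the old path, so some kind of lookup is required. Below is a minimal Python sketch of that idea, not anything from the Capitol-Words codebase. `LEGISLATOR_NAMES`, `slugify`, and `redirect_target` are hypothetical names, the table holds only the one ID/name pair that appears in this thread, and the letter-plus-six-digits ID pattern is an assumption based on that example.

```python
import re

# Hypothetical lookup table (assumption): in practice this would be built from
# the site's legislator data. Only C001057 / norm-coleman appears in the thread.
LEGISLATOR_NAMES = {"C001057": "Norm Coleman"}

# Assumed ID shape: one capital letter followed by six digits, as in C001057.
OLD_PATH = re.compile(r"^/lawmaker/([A-Z]\d{6})/?$")


def slugify(name):
    """Lowercase the name and join runs of non-alphanumerics with hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def redirect_target(old_path):
    """Return the new-style path for an old /lawmaker/<id>/ path, or None."""
    match = OLD_PATH.match(old_path)
    if match is None:
        return None
    bioguide_id = match.group(1)
    name = LEGISLATOR_NAMES.get(bioguide_id)
    if name is None:
        # Unknown ID: no way to build the slug without the name.
        return None
    return "/legislator/{}-{}/".format(bioguide_id, slugify(name))


print(redirect_target("/lawmaker/C001057/"))
```

A table like this could be dumped into a static redirect map for the 133 reported urls, which avoids needing a regex that knows names.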

--timball

timball commented 10 years ago

temporarily i'm gonna add a regex rewrite that just does s/lawmaker/legislator/. the links will still be broken after that.

drinks commented 10 years ago

These are old site references. Does it even matter?

— Dan Drinkard


timball commented 10 years ago

i don't know? i'm just going thru all the bugs and issues webmaster tools has for all of our sites.

the regex is in place:

    # permanently (301) redirect /lawmaker/... to /legislator/..., keeping the rest of the path
    location ~ ^/lawmaker/ {
        rewrite ^/lawmaker(\/.*) /legislator$1 permanent;
    }
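
For anyone checking what this rule does to a given path, here is a rough Python emulation of it. This is only a sketch: `apply_rewrite` is a hypothetical helper, and it assumes Python's `re` behaves the same as nginx's regex engine for this simple pattern.

```python
import re


def apply_rewrite(path):
    """Emulate the nginx rule above; return the redirect target or None."""
    # location ~ ^/lawmaker/  (only paths under /lawmaker/ enter the block)
    if not re.search(r"^/lawmaker/", path):
        return None
    # rewrite ^/lawmaker(\/.*) /legislator$1 permanent;
    match = re.match(r"^/lawmaker(/.*)", path)
    if match is None:
        return None
    # $1 is everything after /lawmaker, including the leading slash.
    return "/legislator" + match.group(1)


print(apply_rewrite("/lawmaker/C001057/"))
```

The old example url is redirected to `/legislator/C001057/`, which still lacks the `-norm-coleman` slug seen in the new-style url.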

--timball