unitedstates / inspectors-general

Collecting reports from Inspectors General across the US federal government.
https://sunlightfoundation.com/blog/2014/11/07/opengov-voices-opening-up-government-reports-through-teamwork-and-open-data/
Creative Commons Zero v1.0 Universal
107 stars 21 forks source link

Hardcode encodings from <meta charset="..."> tags #292

Closed divergentdave closed 8 years ago

divergentdave commented 8 years ago

Following up on previous mojibake issues, this hardcodes character encodings for all remaining sites that set the encoding in a <meta> tag, and not in the Content-Type header. See below for the script I used to gather everything.

#!/bin/sh
set -e
jq -r '.[] | .homepage_url | select(. != null)' ~/oversight.garden/config/inspectors.json | while IFS= read -r line
do
  echo $line
  curl -ikL $line | (grep -i charset= || true)
  echo
done
konklone commented 8 years ago

:+1: That is awesome.

Want to inline or link to that script from the code, too? It'll be more portable that way.

divergentdave commented 8 years ago

:+1:, done in 8ab8903