Open jakobkolb opened 8 years ago
Two things to fix in the notebook:
url_template should be: url_template = "http://climate.weather.gc.ca/climate_data/bulk_data_e.html?stationID=5415&Year={year}&Month={month}&format=csv&timeframe=1&submit=%20Download+Data"
read_csv should be: weather_mar2012 = pd.read_csv("data/eng-hourly-03012012-03312012.csv", skiprows=15, index_col='Date/Time', parse_dates=True, encoding='latin1') (that is, remove header=True)
Sent a PR.
Also, the encoding='latin1' argument should go (at least on Python 3).
Hello. I tried the updated code, but I got the following error: File b'data/eng-hourly-03012012-03312012.csv' does not exist
Please help, thanks!
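That message means pandas could not find the relative path data/eng-hourly-03012012-03312012.csv from the current working directory, which can happen when the notebook is run outside the repository checkout. A minimal sketch of one way around it, assuming you are happy to fall back to the download URL; pick_csv_source is a hypothetical helper, not part of the notebook:

```python
from pathlib import Path

# URL template quoted earlier in this thread.
url_template = ("http://climate.weather.gc.ca/climate_data/bulk_data_e.html"
                "?stationID=5415&Year={year}&Month={month}"
                "&format=csv&timeframe=1&submit=%20Download+Data")


def pick_csv_source(local_path, fallback_url):
    """Hypothetical helper: return the local path if the file exists,
    otherwise the URL (pd.read_csv accepts either)."""
    path = Path(local_path)
    if path.is_file():
        return str(path)
    return fallback_url


source = pick_csv_source("data/eng-hourly-03012012-03312012.csv",
                         url_template.format(year=2012, month=3))
print(source)
```

The result can be passed straight to pd.read_csv(source, ...) with the arguments discussed in this thread.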
At this point (July 2018), the following works in Python3:
In[]: url_template = "http://climate.weather.gc.ca/climate_data/bulk_data_e.html?stationID=5415&Year={year}&Month={month}&format=csv&timeframe=1&submit=%20Download+Data"
and:
In[]: url = url_template.format(month=3, year=2012)
weather_mar2012 = pd.read_csv(url, skiprows=15, index_col='Date/Time', parse_dates=True, encoding='utf-8', header=0)
An important change, apart from the URL itself, is that header accepts an integer (row number) instead of a boolean.
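To see how skiprows and an integer header interact without hitting the network, here is a small sketch on made-up in-memory data (the two metadata lines and the values are invented for illustration). The header row number is counted after the skipped rows:

```python
import io

import pandas as pd

# Made-up sample: two metadata lines, then the header row, then data.
raw = (
    "Station Name,EXAMPLE\n"
    "Province,EXAMPLE\n"
    "Date/Time,Temp\n"
    "2012-03-01 00:00,-5.5\n"
    "2012-03-01 01:00,-5.7\n"
)

# skiprows=2 discards the metadata lines; header=0 then means
# "the first remaining row holds the column names".
df = pd.read_csv(io.StringIO(raw), skiprows=2, header=0,
                 index_col='Date/Time', parse_dates=True)

print(df.columns.tolist())
print(df.index.dtype)
```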
Because of the encoding change, we need to change this, as well:
In[]: weather_mar2012[u"Temp (°C)"].plot(figsize=(15, 5))
Also, the “Data Quality” column disappeared. This requires tweaks while working with columns.
In[]: weather_mar2012.columns = [ u'Year', u'Month', u'Day', u'Time', u'Temp (C)', u'Temp Flag', u'Dew Point Temp (C)', u'Dew Point Temp Flag', u'Rel Hum (%)', u'Rel Hum Flag', u'Wind Dir (10s deg)', u'Wind Dir Flag', u'Wind Spd (km/h)', u'Wind Spd Flag', u'Visibility (km)', u'Visibility Flag', u'Stn Press (kPa)', u'Stn Press Flag', u'Hmdx', u'Hmdx Flag', u'Wind Chill', u'Wind Chill Flag', u'Weather']
In[]: weather_mar2012 = weather_mar2012.drop(['Year', 'Month', 'Day', 'Time'], axis=1)
In[]:
def download_weather_month(year, month):
    if month == 1:
        year += 1
    url = url_template.format(year=year, month=month)
    weather_data = pd.read_csv(url, skiprows=15, index_col='Date/Time', parse_dates=True, header=0)
    weather_data = weather_data.dropna(axis=1)
    weather_data.columns = [col.replace('\xb0', '') for col in weather_data.columns]
    weather_data = weather_data.drop(['Year', 'Day', 'Month', 'Time'], axis=1)
    return weather_data
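Before downloading a whole year, it can help to check what the template expands to. A small offline sketch, where month_urls is a hypothetical helper and nothing is actually fetched:

```python
# URL template quoted earlier in this thread.
url_template = ("http://climate.weather.gc.ca/climate_data/bulk_data_e.html"
                "?stationID=5415&Year={year}&Month={month}"
                "&format=csv&timeframe=1&submit=%20Download+Data")


def month_urls(year):
    """Hypothetical helper: one URL per calendar month of the given year."""
    return [url_template.format(year=year, month=month)
            for month in range(1, 13)]


urls = month_urls(2012)
print(len(urls))
print(urls[2])  # the March URL
```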
When I use the URL template and the weather data to compare temperatures with the bike data, the code does not seem to work. I modified the URL template and made the required changes in the later parts, and everything runs without errors. But when I try to output the first few rows of the data, it shows nothing.
Here's the code:
`
def get_weather_data(year):
    url_template = "http://climate.weather.gc.ca/climate_data/bulk_data_e.html?stationID=5415&Year={year}&Month={month}&format=csv&timeframe=1&submit=%20Download+Data"
    # airport station is 5415, hence that was used
    data_by_month = []
    for month in range(1, 13):
        url = url_template.format(year=year, month=month)
        weather_data = pd.read_csv(url, skiprows=15, index_col='Date/Time', parse_dates=True, encoding='utf-8', header=0)
        # '\xb0' is the degree symbol
        weather_data.columns = [col.replace('\xb0', '') for col in weather_data.columns]
        weather_data = weather_data.drop(['Year', 'Day', 'Month', 'Time'], axis=1)
        data_by_month.append(weather_data.dropna())
    return pd.concat(data_by_month).dropna(axis=1, how='all').dropna()

weather_data = get_weather_data(2012)
weather_data[:5]
`
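One plausible cause (not confirmed in this thread) is the row-wise dropna() inside the loop: by default it removes every row that has at least one NaN, so a single always-empty column is enough to empty the whole month. A sketch on made-up data, with invented values, showing the difference:

```python
import numpy as np
import pandas as pd

# Made-up frame: 'Temp (C)' is fully populated, 'Wind Chill' is only
# sometimes reported, and 'Hmdx' is never reported in this sample.
df = pd.DataFrame({
    'Temp (C)': [-5.5, -5.7, -5.4],
    'Wind Chill': [-13.0, np.nan, -12.0],
    'Hmdx': [np.nan, np.nan, np.nan],
})

# Default dropna() drops rows containing any NaN: nothing survives here.
rows_default = len(df.dropna())

# Dropping all-empty columns first keeps the rows that are otherwise complete.
rows_after_trim = len(df.dropna(axis=1, how='all').dropna())

# Dropping every column that contains a NaN keeps all rows.
rows_column_wise = len(df.dropna(axis=1))

print(rows_default, rows_after_trim, rows_column_wise)
```

If that is what happens with the real data, applying the column-wise dropna before the row-wise one (or skipping the row-wise one) should bring the rows back.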
url_template = "http://climate.weather.gc.ca/climate_data/bulk_data_e.html?stationID=5415&Year={year}&Month={month}&format=csv&timeframe=1&submit=%20Download+Data"
# url_template = 'https://raw.githubusercontent.com/kbridge/weather-data/main/weather_data_{year}_{month}.csv'
url = url_template.format(month=3, year=2012)
weather_mar2012 = pd.read_csv(url, index_col='Date/Time (LST)', parse_dates=True, encoding='utf-8-sig')
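The call above uses encoding='utf-8-sig'. As a rough illustration of what the -sig variant does at the codec level, here is a standard-library sketch on a made-up header line (the bytes are invented, only the column names come from this thread):

```python
import codecs

# Made-up first line of a CSV file saved with a UTF-8 byte order mark.
raw_bytes = codecs.BOM_UTF8 + '"Longitude (x)","Latitude (y)"\n'.encode('utf-8')

plain = raw_bytes.decode('utf-8')          # BOM survives as U+FEFF
stripped = raw_bytes.decode('utf-8-sig')   # BOM is consumed by the codec

print(repr(plain[0]))
print(repr(stripped[0]))
```

With plain 'utf-8' the invisible U+FEFF character ends up glued to the first column name, which is why lookups by that name can fail.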
Summary:
- url_template is the same as @GillesMoyse posted. That URL is too slow to load. You can change it to my mirror at gist.
- header=True is removed.
- skiprows=15 is removed because there is no metadata before the CSV data anymore.
- index_col is changed from 'Date/Time' to 'Date/Time (LST)'.
- encoding is changed from 'latin1' to 'utf-8-sig'. We need to use the -sig variant to skip the UTF-8 BOM; otherwise, the first column will contain weird characters.
Before renaming the columns to eliminate ° characters, drop some unexpected new columns first:
weather_mar2012 = weather_mar2012.drop(['Longitude (x)', 'Latitude (y)', 'Station Name', 'Climate ID', 'Precip. Amount (mm)', 'Precip. Amount Flag'], axis=1)
And the renaming code becomes:
weather_mar2012.columns = [
    u'Year', u'Month', u'Day', u'Time', u'Temp (C)',
    u'Temp Flag', u'Dew Point Temp (C)', u'Dew Point Temp Flag',
    u'Rel Hum (%)', u'Rel Hum Flag', u'Wind Dir (10s deg)', u'Wind Dir Flag',
    u'Wind Spd (km/h)', u'Wind Spd Flag', u'Visibility (km)', u'Visibility Flag',
    u'Stn Press (kPa)', u'Stn Press Flag', u'Hmdx', u'Hmdx Flag', u'Wind Chill',
    u'Wind Chill Flag', u'Weather']
Column Data Quality is removed because the new data doesn't contain the column anymore.
This also renames the column Time (LST) to Time.
No need to drop the column Data Quality anymore:
-weather_mar2012 = weather_mar2012.drop(['Year', 'Month', 'Day', 'Time', 'Data Quality'], axis=1)
+weather_mar2012 = weather_mar2012.drop(['Year', 'Month', 'Day', 'Time'], axis=1)
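Since the set of columns has changed more than once, a more defensive variant is to let drop skip labels that are missing. A sketch on a made-up one-row frame, assuming you would rather ignore absent columns than get a KeyError:

```python
import pandas as pd

# Made-up frame standing in for one month of data after renaming.
df = pd.DataFrame({
    'Year': [2012], 'Month': [3], 'Day': [1], 'Time': ['00:00'],
    'Temp (C)': [-5.5],
})

unwanted = ['Year', 'Month', 'Day', 'Time', 'Data Quality']

# errors='ignore' skips labels that are absent instead of raising KeyError.
cleaned = df.drop(unwanted, axis=1, errors='ignore')
print(cleaned.columns.tolist())
```

The trade-off is that a typo in a column name is silently ignored too, so this suits throwaway notebooks better than pipelines.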
temperatures.head is a method, so you should call it:
-print(temperatures.head)
+print(temperatures.head())
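A quick way to see the difference, sketched on a made-up frame (the values are invented):

```python
import pandas as pd

temperatures = pd.DataFrame({'Temp (C)': [-5.5, -5.7, -5.4]})

# Without parentheses this is the bound method object, not a DataFrame.
is_method = callable(temperatures.head)
print(is_method)

# With parentheses the method runs and returns the first rows
# (5 by default, 2 requested here).
first_rows = temperatures.head(2)
print(first_rows)
```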
Change download_weather_month to this:
# mirror
# url_template = 'https://raw.githubusercontent.com/kbridge/weather-data/main/weather_data_{year}_{month}.csv'
def download_weather_month(year, month):
    url = url_template.format(year=year, month=month)
    weather_data = pd.read_csv(url, index_col='Date/Time (LST)', parse_dates=True, encoding='utf-8-sig')
    weather_data = weather_data.dropna(axis=1)
    weather_data.columns = [col.replace('\xb0', '') for col in weather_data.columns]
    weather_data = weather_data.drop([
        'Year',
        'Day',
        'Month',
        'Time (LST)',
        'Longitude (x)',
        'Latitude (y)',
        'Station Name',
        'Climate ID',
    ], axis=1)
    return weather_data
which was:
def download_weather_month(year, month):
    if month == 1:
        year += 1
    url = url_template.format(year=year, month=month)
    weather_data = pd.read_csv(url, skiprows=15, index_col='Date/Time', parse_dates=True, header=True)
    weather_data = weather_data.dropna(axis=1)
    weather_data.columns = [col.replace('\xb0', '') for col in weather_data.columns]
    weather_data = weather_data.drop(['Year', 'Day', 'Month', 'Time', 'Data Quality'], axis=1)
    return weather_data
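To try the new cleaning steps without depending on the slow site, they can be run on an in-memory sample. This is only a sketch: the two data rows are made up, only some of the real columns are included, and clean_weather_frame is a hypothetical name for the part of download_weather_month that follows read_csv.

```python
import io

import pandas as pd

# Made-up sample using column names mentioned in this thread.
header = ('"Longitude (x)","Latitude (y)","Station Name","Climate ID",'
          '"Date/Time (LST)","Year","Month","Day","Time (LST)",'
          '"Temp (°C)","Hmdx"\n')
rows = ('"-73.75","45.47","EXAMPLE","0000000","2012-03-01 00:00",'
        '"2012","03","01","00:00","-5.5",""\n'
        '"-73.75","45.47","EXAMPLE","0000000","2012-03-01 01:00",'
        '"2012","03","01","01:00","-5.7",""\n')
# Encoding with 'utf-8-sig' prepends a BOM, like the real files reportedly have.
sample_bytes = (header + rows).encode('utf-8-sig')


def clean_weather_frame(weather_data):
    """Hypothetical helper: the post-download steps from this thread."""
    weather_data = weather_data.dropna(axis=1)
    weather_data.columns = [col.replace('\xb0', '') for col in weather_data.columns]
    return weather_data.drop(['Year', 'Day', 'Month', 'Time (LST)',
                              'Longitude (x)', 'Latitude (y)',
                              'Station Name', 'Climate ID'], axis=1)


frame = pd.read_csv(io.BytesIO(sample_bytes), index_col='Date/Time (LST)',
                    parse_dates=True, encoding='utf-8-sig')
cleaned = clean_weather_frame(frame)
print(cleaned.columns.tolist())
```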
Sorry for using this issue as my own memo, but I will be glad if my comments help you.
Apparently, the site for Canadian historical weather data has changed.