jakobkolb commented 8 years ago

Apparently, the Canadian historical weather data site has changed.

GillesMoyse commented 8 years ago

Two things to fix in the notebook:

New version of url_template:
url_template = "http://climate.weather.gc.ca/climate_data/bulk_data_e.html?stationID=5415&Year={year}&Month={month}&format=csv&timeframe=1&submit=%20Download+Data"

In the following call, remove header=True:
weather_mar2012 = pd.read_csv("data/eng-hourly-03012012-03312012.csv", skiprows=15, index_col='Date/Time', parse_dates=True, encoding='latin1')
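
For readers unfamiliar with the template, here is a minimal sketch of how the {year} and {month} placeholders are filled in. The URL is the one quoted above; the year and month values are only illustrative.

```python
# URL template quoted from the comment above; {year} and {month} are
# placeholders that str.format fills in. The "%20" is left untouched
# because "%" has no special meaning for str.format.
url_template = (
    "http://climate.weather.gc.ca/climate_data/bulk_data_e.html"
    "?stationID=5415&Year={year}&Month={month}"
    "&format=csv&timeframe=1&submit=%20Download+Data"
)

# Illustrative values: March 2012, as used elsewhere in the notebook.
url = url_template.format(year=2012, month=3)
print(url)
```

The resulting string can be passed directly to pd.read_csv, which accepts URLs as well as local paths.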

Sent a PR.

andreas-h commented 8 years ago

also, the encoding='latin1' should go (at least on Python3)

hsuanie commented 7 years ago

Hello. I tried the updated code, but I got the following error: File b'data/eng-hourly-03012012-03312012.csv' does not exist

Please help, thanks!

Enkerli commented 6 years ago

At this point (July 2018), the following works in Python3:

In[]:
url_template = "http://climate.weather.gc.ca/climate_data/bulk_data_e.html?stationID=5415&Year={year}&Month={month}&format=csv&timeframe=1&submit=%20Download+Data"

and:

In[]:
url = url_template.format(month=3, year=2012)
weather_mar2012 = pd.read_csv(url, skiprows=15, index_col='Date/Time', parse_dates=True, encoding='utf-8', header=0)

An important change, apart from the URL itself, is that header accepts an integer (row number) instead of a boolean.
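
The header point can be checked without touching the network. This is a small sketch on made-up CSV text (the two data rows are invented for illustration); it assumes a reasonably recent pandas, where a boolean header is rejected outright.

```python
import io

import pandas as pd

# Invented sample data standing in for the downloaded file.
csv_text = (
    "Date/Time,Temp (C)\n"
    "2012-03-01 00:00,-5.5\n"
    "2012-03-01 01:00,-5.7\n"
)

# header=0 means "row 0 of the parsed input holds the column names".
df = pd.read_csv(io.StringIO(csv_text), index_col="Date/Time",
                 parse_dates=True, header=0)
print(df.columns.tolist())

# A boolean header is rejected by recent pandas. Both exception types are
# caught here because the exact type is not stated in this thread.
try:
    pd.read_csv(io.StringIO(csv_text), header=True)
    bool_header_rejected = False
except (TypeError, ValueError) as exc:
    bool_header_rejected = True
    print("header=True rejected:", exc)
```

Since header=0 is what pandas infers by default when no column names are passed, leaving the argument out altogether should behave the same way.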

Because of the encoding change, we need to change this as well:

In[]:
weather_mar2012[u"Temp (°C)"].plot(figsize=(15, 5))

Also, the “Data Quality” column disappeared. This requires tweaks while working with columns.

In[]:
weather_mar2012.columns = [
    u'Year', u'Month', u'Day', u'Time', u'Temp (C)',
    u'Temp Flag', u'Dew Point Temp (C)', u'Dew Point Temp Flag',
    u'Rel Hum (%)', u'Rel Hum Flag', u'Wind Dir (10s deg)', u'Wind Dir Flag',
    u'Wind Spd (km/h)', u'Wind Spd Flag', u'Visibility (km)', u'Visibility Flag',
    u'Stn Press (kPa)', u'Stn Press Flag', u'Hmdx', u'Hmdx Flag', u'Wind Chill',
    u'Wind Chill Flag', u'Weather']

In[]:
weather_mar2012 = weather_mar2012.drop(['Year', 'Month', 'Day', 'Time'], axis=1)

In[]:

def download_weather_month(year, month):
    if month == 1:
        year += 1
    url = url_template.format(year=year, month=month)
    weather_data = pd.read_csv(url, skiprows=15, index_col='Date/Time', parse_dates=True, header=0)
    weather_data = weather_data.dropna(axis=1)
    weather_data.columns = [col.replace('\xb0', '') for col in weather_data.columns]
    weather_data = weather_data.drop(['Year', 'Day', 'Month', 'Time'], axis=1)
    return weather_data

mvresh commented 5 years ago

When I use the url template and the weather data to compare the temperatures with the bikes data, the code does not seem to work. I modified the url template and made the changes required in the later parts, and everything runs without errors. But when I try to output the first few rows of the data, it shows nothing.

mvresh commented 5 years ago

Here's the code :

`

import pandas as pd

# getting weather data to look at temps
def get_weather_data(year):
    # airport station is 5415, hence that was used
    url_template = "http://climate.weather.gc.ca/climate_data/bulk_data_e.html?stationID=5415&Year={year}&Month={month}&format=csv&timeframe=1&submit=%20Download+Data"
    data_by_month = []
    for month in range(1, 13):
        url = url_template.format(year=year, month=month)
        weather_data = pd.read_csv(url, skiprows=15, index_col='Date/Time', parse_dates=True, encoding='utf-8', header=0)
        # \xb0 is the degree symbol
        weather_data.columns = [col.replace('\xb0', '') for col in weather_data.columns]
        weather_data = weather_data.drop(['Year', 'Day', 'Month', 'Time'], axis=1)
        data_by_month.append(weather_data.dropna())
    return pd.concat(data_by_month).dropna(axis=1, how='all').dropna()

weather_data = get_weather_data(2012)

weather_data[:5]

`
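
A possible explanation for the empty output, offered as an assumption rather than a confirmed diagnosis: calling dropna() with no arguments removes every row that has at least one missing value, so if any column (for example a flag column) is empty throughout, no rows survive. The sketch below uses a small synthetic frame, not the real download, to show the difference between dropping rows and dropping all-empty columns.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for one month of data. That the flag column is
# entirely empty is an assumption about the real file.
month = pd.DataFrame({
    "Temp (C)": [-5.5, -5.7, -5.4],
    "Temp Flag": [np.nan, np.nan, np.nan],
    "Weather": ["Snow", "Snow", np.nan],
})

# Default dropna(): drops every row containing at least one NaN.
rows_any = month.dropna()

# dropna(axis=1, how="all"): drops only columns that are NaN throughout.
cols_all = month.dropna(axis=1, how="all")

print(len(rows_any))
print(cols_all.columns.tolist())
```

If this is indeed the cause, dropping the all-empty columns first, or skipping the row-wise dropna() inside the loop, would be the thing to try.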

kbridge commented 2 years ago

url_template = "http://climate.weather.gc.ca/climate_data/bulk_data_e.html?stationID=5415&Year={year}&Month={month}&format=csv&timeframe=1&submit=%20Download+Data"
# url_template = 'https://raw.githubusercontent.com/kbridge/weather-data/main/weather_data_{year}_{month}.csv'
url = url_template.format(month=3, year=2012)
weather_mar2012 = pd.read_csv(url, index_col='Date/Time (LST)', parse_dates=True, encoding='utf-8-sig')

Summary:

url_template is the same as the one @GillesMoyse posted. That URL is very slow to load; you can switch to my mirror instead (the commented-out url_template above).
header=True is removed.
skiprows=15 is removed because there is no metadata before the CSV data anymore.
index_col is changed from 'Date/Time' to 'Date/Time (LST)'.
encoding is changed from 'latin1' to 'utf-8-sig'. We need to use the -sig variant to skip the UTF-8 BOM; otherwise, the first column will contain weird characters ï»¿.
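
The BOM point can be reproduced on a few bytes. This is a sketch on a made-up two-line file (the data row is invented); only the column names follow the ones mentioned above.

```python
import io

import pandas as pd

# Hypothetical file content starting with the UTF-8 BOM (bytes EF BB BF).
raw = (
    b'\xef\xbb\xbf"Date/Time (LST)","Temp (\xc2\xb0C)"\n'
    b'"2012-03-01 00:00","-5.5"\n'
)

# Decoded as latin1, the three BOM bytes become three visible characters.
print(repr(raw.decode("latin1")[:3]))

# Decoded as utf-8-sig, the BOM is consumed and the text starts with the quote.
print(repr(raw.decode("utf-8-sig")[:1]))

# With utf-8-sig the index column can be found under its plain name.
df = pd.read_csv(io.BytesIO(raw), index_col="Date/Time (LST)",
                 parse_dates=True, encoding="utf-8-sig")
print(df.columns.tolist())
```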

kbridge commented 2 years ago

Before renaming the columns to eliminate ° characters, drop some unexpected new columns first:

weather_mar2012 = weather_mar2012.drop(['Longitude (x)', 'Latitude (y)', 'Station Name', 'Climate ID', 'Precip. Amount (mm)', 'Precip. Amount Flag'], axis=1)

And the renaming code becomes

weather_mar2012.columns = [
    u'Year', u'Month', u'Day', u'Time', u'Temp (C)', 
    u'Temp Flag', u'Dew Point Temp (C)', u'Dew Point Temp Flag', 
    u'Rel Hum (%)', u'Rel Hum Flag', u'Wind Dir (10s deg)', u'Wind Dir Flag', 
    u'Wind Spd (km/h)', u'Wind Spd Flag', u'Visibility (km)', u'Visibility Flag',
    u'Stn Press (kPa)', u'Stn Press Flag', u'Hmdx', u'Hmdx Flag', u'Wind Chill', 
    u'Wind Chill Flag', u'Weather']

Column Data Quality is removed because the new data doesn't contain the column anymore.

This also renames the column Time (LST) to Time.

kbridge commented 2 years ago

No need to drop the column Data Quality anymore:

-weather_mar2012 = weather_mar2012.drop(['Year', 'Month', 'Day', 'Time', 'Data Quality'], axis=1)
+weather_mar2012 = weather_mar2012.drop(['Year', 'Month', 'Day', 'Time'], axis=1)
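
Since columns have come and gone several times over the life of this issue, one option (a suggestion, not something the notebook does) is to make the drop tolerant of missing labels with errors="ignore". A minimal sketch on a synthetic frame:

```python
import pandas as pd

# Synthetic frame; the column names mirror those mentioned in this thread.
df = pd.DataFrame({
    "Year": [2012],
    "Month": [3],
    "Day": [1],
    "Time": ["00:00"],
    "Temp (C)": [-5.5],
})

# "Data Quality" is listed even though this frame does not contain it.
unwanted = ["Year", "Month", "Day", "Time", "Data Quality"]

# errors="ignore" skips labels that are absent instead of raising KeyError.
df = df.drop(columns=unwanted, errors="ignore")
print(df.columns.tolist())
```

The trade-off is that a typo in a column name is silently ignored as well, so this fits best where the source schema is known to drift.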

kbridge commented 2 years ago

temperatures.head is a method, so you need to call it with parentheses:

-print(temperatures.head)
+print(temperatures.head())
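
A quick sketch of the difference, using an invented Series in place of the notebook's temperatures:

```python
import pandas as pd

# Invented values standing in for the notebook's temperature column.
temperatures = pd.Series([-5.5, -5.7, -5.4, -4.7, -5.4, -5.3],
                         name="Temperature")

# Without parentheses, head is just the bound method object.
print(callable(temperatures.head))

# With parentheses, head() returns the first five rows by default.
first_rows = temperatures.head()
print(first_rows)
```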

kbridge commented 2 years ago

Change download_weather_month to this:

# mirror
# url_template = 'https://raw.githubusercontent.com/kbridge/weather-data/main/weather_data_{year}_{month}.csv'

def download_weather_month(year, month):
    url = url_template.format(year=year, month=month)
    weather_data = pd.read_csv(url, index_col='Date/Time (LST)', parse_dates=True, encoding='utf-8-sig')
    weather_data = weather_data.dropna(axis=1)
    weather_data.columns = [col.replace('\xb0', '') for col in weather_data.columns]
    weather_data = weather_data.drop([
        'Year',
        'Day',
        'Month',
        'Time (LST)',
        'Longitude (x)',
        'Latitude (y)',
        'Station Name',
        'Climate ID',
    ], axis=1)
    return weather_data

which was

def download_weather_month(year, month):
    if month == 1:
        year += 1
    url = url_template.format(year=year, month=month)
    weather_data = pd.read_csv(url, skiprows=15, index_col='Date/Time', parse_dates=True, header=True)
    weather_data = weather_data.dropna(axis=1)
    weather_data.columns = [col.replace('\xb0', '') for col in weather_data.columns]
    weather_data = weather_data.drop(['Year', 'Day', 'Month', 'Time', 'Data Quality'], axis=1)
    return weather_data

kbridge commented 2 years ago

Sorry for using this issue as if it were my own memo. I will be glad if my comments help you.

jvns / pandas-cookbook

Chapter 5 - the url template is outdated leading to 404: Not Found #50
