isdsucph / isds2021

Introduction to Social Data Science 2021 - a summer school course https://isdsucph.github.io/isds2021/
MIT License

Ex. 2.3.4 (Exercise 2) #29

Closed AkselSB closed 3 years ago

AkselSB commented 3 years ago

Hi, I'm having issues updating the annotated function in 2.1.1 with the functions processing "area" and "temporal data."

The following code runs with no errors, but the new functions are not included:

`# INCLUDED IN ASSIGNMENT 1
# YOUR CODE HERE
import pandas as pd  # importing the tool

def load_weather(year):

    url = f"ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/daily/by_year/{year}.csv.gz"

    df_weather = pd.read_csv(url,
                             header=None)
    df_weather = df_weather.iloc[:,:4]

    column_names = ['station', 'datetime', 'obs_type', 'obs_value']
    df_weather.columns = column_names

    df_weather['obs_value'] = df_weather['obs_value'] / 10

    selection_tmax = df_weather.obs_type == 'TMAX'
    df_select = df_weather.loc[selection_tmax]

    df_weather['Area Code'] = df_weather['station'].str[:3] #<<<Area function

    df_weather['datetime'] = pd.to_datetime(df_weather['datetime'], format='%Y%m%d') #<<<datetime function
    df_weather = df_weather.rename(columns={'datetime':'datetime_dt'})

    df_sorted = df_select.sort_values(by=['station', 'datetime'])

    df_reset = df_sorted.reset_index(drop=True)
    df_out = df_reset.copy()

    return df_out

load_weather(1863)`

MatPiq commented 3 years ago

Hi Aksel! You are using a lot of variable names, and somewhere in your code this probably means that the changes to the df are not assigned to the variable you intended. This is actually a good example of one of the benefits of the method-chaining approach :) /Matias
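To make the pitfall concrete, here is a minimal sketch using a made-up two-row frame (the station IDs and the column name `area` are illustrative assumptions, not taken from the real GHCN file). It shows that a selection taken with `.loc` before a column is added does not pick up that column, because the selection is a separate object:

```python
import pandas as pd

# Toy data standing in for the weather file (values are made up).
df_weather = pd.DataFrame({
    "station": ["ITE00100550", "ASN00066062"],
    "obs_type": ["TMAX", "TMIN"],
})

# The selection is taken first ...
df_select = df_weather.loc[df_weather.obs_type == "TMAX"]

# ... and the new column is added to the original frame afterwards.
df_weather["area"] = df_weather["station"].str[:3]

print("area" in df_weather.columns)  # True
print("area" in df_select.columns)   # False
```

In the posted function the returned frame is built from `df_select`, so anything added to `df_weather` after that selection would not appear in the output.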

AkselSB commented 3 years ago

I've tried to chain the functions previously, but cannot figure out where the new functions would fit in when using this approach:

`def load_weather(year):

    url = f"ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/daily/by_year/{year}.csv.gz"

    df_weather = pd.read_csv(url,
                             header=None)  #the function uses pandas to read the csv.file

    df_weather = df_weather.iloc[:,:4]

    column_names = ['station', 'datetime', 'obs_type', 'obs_value']
    df_weather.columns = column_names

    df_weather['obs_value'] = df_weather['obs_value'] / 10

    df_out = df_weather\
        .loc[df_weather.obs_type == 'TMAX']\
        .sort_values(by=['station', 'datetime'])\
        .reset_index(drop=True)\
        .copy()

    return df_out

load_weather(1863)`

MatPiq commented 3 years ago

You don't necessarily need to use the method-chaining approach if you have now got it working, but if you are curious and want to try doing everything in one chain, I would recommend reading up on the `.assign()` method (https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.assign.html). With it you can create new variables/columns within the chain.
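As a rough sketch of what that could look like (one possible layout, not the official solution): the helper name `process_weather`, the column name `area`, and the toy rows below are assumptions for illustration, and the download step is left out so the example runs offline.

```python
import pandas as pd

def process_weather(df_raw):
    # Hypothetical helper: the same steps as load_weather, minus the download.
    return (
        df_raw
        .iloc[:, :4]
        .set_axis(["station", "datetime", "obs_type", "obs_value"], axis=1)
        .assign(
            # Each lambda receives the frame as it looks at this point in the chain.
            obs_value=lambda df: df["obs_value"] / 10,
            area=lambda df: df["station"].str[:3],
            datetime_dt=lambda df: pd.to_datetime(
                df["datetime"].astype(str), format="%Y%m%d"
            ),
        )
        .loc[lambda df: df["obs_type"] == "TMAX"]
        .sort_values(by=["station", "datetime"])
        .reset_index(drop=True)
    )

# Made-up rows shaped like the raw file: no header, extra trailing column.
df_raw = pd.DataFrame([
    ["ITE00100550", 18630102, "TMAX", 23, None],
    ["ITE00100550", 18630101, "TMAX", 12, None],
    ["ASN00066062", 18630101, "TMIN", 105, None],
])

out = process_weather(df_raw)
print(out)
```

Because the new columns are created inside the chain, there is no intermediate variable that can miss them. This sketch keeps the original `datetime` column next to the parsed `datetime_dt`; dropping or renaming it is a matter of taste.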