Open LaurensNT opened 2 years ago
Your problem is that in the first dataset, 'Ferrous Metals' and 'Non-ferrous Metals' are written in several different ways and you haven't included all of them yet. Check wastestats['waste_type'].unique() and watch out for upper and lower case.
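One way to avoid listing every spelling by hand is to normalize the label before comparing. This is only a minimal sketch: `CANONICAL` and `canonical_material` are hypothetical names, and the variants covered are just the ones mentioned in this thread, so check them against the output of unique() first.

```python
# Hypothetical lookup: lower-cased spelling variant -> canonical material name.
# The canonical names follow the column labels used for energy_saved in this thread.
CANONICAL = {
    "plastic": "Plastic",
    "plastics": "Plastic",
    "glass": "Glass",
    "ferrous metal": "Ferrous Metal",
    "ferrous metals": "Ferrous Metal",
    "non-ferrous metal": "Non-Ferrous Metal",
    "non-ferrous metals": "Non-Ferrous Metal",
}


def canonical_material(label):
    """Return the canonical material name, or None if the label is not one of the four."""
    # Lower-case the label and collapse any extra whitespace before the lookup.
    key = " ".join(str(label).lower().split())
    return CANONICAL.get(key)
```

With pandas, this could be applied as wastestats['waste_type'].map(canonical_material), followed by dropping the rows where the result is missing.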
Thank you!
wastestats_filter = wastestats[wastestats['waste_type'].isin([
    'Ferrous metal', 'Non-ferrous metal', 'Ferrous Metal', 'Non-ferrous Metals',
    'Ferrous Metals', 'Non-ferrous metals', 'Glass', 'Plastics', 'Plastic'])]

energy_saved_data['Ferrous metal'] = energy_saved_data['Ferrous Metal']
energy_saved_data['Ferrous Metals'] = energy_saved_data['Ferrous Metal']
energy_saved_data['Non-ferrous metal'] = energy_saved_data['Non-Ferrous Metal']
energy_saved_data['Non-ferrous metals'] = energy_saved_data['Non-Ferrous Metal']
Thank you very much, I really appreciate it. Do you have the full code of the project? I also noticed the different spellings and tried to adapt my code, but I still had the problem.
Thank you very much :) !
FYI, I'm doing the filtering via the apply function:
import pandas as pd

edf = pd.read_csv('datasets/energy_saved.csv')
# Row 3 holds the energy values as text; keep the part before the first space (non-numeric cells become NaN)
seng_list = pd.to_numeric(edf.loc[3,:].str.split(' ',expand=True)[0],errors='coerce').tolist()
seng_of_mat = dict(zip(edf.loc[2,:].tolist(),seng_list)) # conversion factor from 1 recycled tonne to saved kWh
w01 = pd.read_csv('datasets/wastestats.csv')

def mat_of_waste01(row):
    # Look the values up by column label rather than by position
    waste_type = row['waste_type']
    recycled_t = row['total_waste_recycled_tonne']
    k = None
    if waste_type in ['Plastics','Plastic']:
        k = 'Plastic'
    elif waste_type in ['Glass']:
        k = 'Glass'
    elif waste_type in ['Ferrous metal','Ferrous Metal','Ferrous Metals']:
        k = 'Ferrous Metal'
    elif waste_type in ['Non-ferrous metal','Non-Ferrous Metal','Non-ferrous Metals','Non-ferrous metals']:
        k = 'Non-Ferrous Metal'
    #endif
    if k is not None:
        out = [k,seng_of_mat[k] * recycled_t]
    elif waste_type in ['Total']:
        out = ['Annual Sum',0]
    else:
        out = ['other',0]
    #endif
    return pd.Series(out, index=['material','energy_saved'])
#enddef

w01_m = w01.apply(mat_of_waste01, axis=1, result_type='expand')
w01b = w01.join(w01_m)
#Saved energy for year 2003 to 2017
df_es = pd.DataFrame( w01b.groupby(["year"])["energy_saved"].sum() )
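For comparison, the same idea can be sketched without a row-wise apply, using two map calls and a multiplication. This is only an illustration under stated assumptions: the small frame, the `variants` mapping and the numbers in `factors` are made up, not taken from the real CSV files.

```python
import pandas as pd

# Toy stand-in for wastestats.csv; the values are made up for illustration.
w = pd.DataFrame({
    "waste_type": ["Plastics", "Glass", "Paper", "Plastic"],
    "total_waste_recycled_tonne": [10.0, 5.0, 100.0, 2.0],
    "year": [2015, 2015, 2015, 2016],
})

# Assumed conversion factors (kWh saved per recycled tonne); not the real ones.
factors = {"Plastic": 3.0, "Glass": 2.0}

# Hypothetical mapping from spelling variants to the keys used in `factors`.
variants = {"Plastics": "Plastic", "Plastic": "Plastic", "Glass": "Glass"}

material = w["waste_type"].map(variants)      # NaN for waste types not listed
factor = material.map(factors).fillna(0)      # unlisted types contribute nothing
w["energy_saved"] = factor * w["total_waste_recycled_tonne"]

# Total energy saved per year
annual = w.groupby("year")["energy_saved"].sum()
```

The result should match the apply version as long as both use the same variant list; the main difference is that the mapping lives in a dictionary instead of an if/elif chain.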
Thanks for the effort :)
I am still not passing the project, even though I think my output is exactly the same as yours.
import pandas as pd

df1 = pd.read_csv('datasets/wastestats.csv')
df2 = pd.read_csv('datasets/2018_2019_waste.csv')
df3 = pd.read_csv('datasets/energy_saved.csv')

four_materials = ['Ferrous metal', 'Ferrous Metal', 'Non-ferrous metal', 'Non-ferrous Metals', 'Non-ferrous metals', 'Ferrous Metals', 'Glass', 'Plastics', 'Plastic']
df1_four_materials = df1[df1["waste_type"].isin(four_materials)]
# .copy() so the replacement below modifies an independent frame, not a view of df1
df1_four_materials_year = df1_four_materials[df1_four_materials['year'] > 2014].copy()
df1_four_materials_year["waste_type"] = df1_four_materials_year["waste_type"].replace({"Plastic": "Plastics", "Ferrous Metal": "Ferrous metal", "Ferrous Metals": "Ferrous metal", "Non-ferrous Metals": "Non-ferrous metal", "Non-ferrous metals": "Non-ferrous metal"})

four_materials1 = ['Ferrous Metal', 'Glass', 'Plastics', 'Non-Ferrous Metal']
df2_1 = df2.rename(columns={'Waste Type':'waste_type', 'Year':'year', "Total Recycled ('000 tonnes)":"total_waste_recycled_tonne"})
df2_four_materials = df2_1[df2_1["waste_type"].isin(four_materials1)].copy()
# The 2018-2019 file reports thousands of tonnes, so convert to tonnes
df2_four_materials['total_waste_recycled_tonne'] = df2_four_materials['total_waste_recycled_tonne'] * 1000
df2_four_materials["waste_type"] = df2_four_materials["waste_type"].replace({"Ferrous Metal": "Ferrous metal", "Non-Ferrous Metal": "Non-ferrous metal"})
df3.columns = df3.iloc[2]
df3_correct = df3.iloc[3:4, 1:5]
df3_correct.reset_index(drop=True, inplace=True)
function = lambda x: x.str.replace(' Kwh', '')
df3_correct = df3_correct.apply(function)
df3_correct = df3_correct.rename(columns={'Plastic': "Plastics"})
df3_correct = df3_correct.rename(columns={'Ferrous Metal': "Ferrous metal"})
df3_correct = df3_correct.rename(columns={'Non-Ferrous Metal': "Non-ferrous metal"})

df3_correct['Plastics'] = df3_correct['Plastics'].astype('int')
df3_correct['Glass'] = df3_correct['Glass'].astype('int')
df3_correct['Ferrous metal'] = df3_correct['Ferrous metal'].astype('int')
df3_correct['Non-ferrous metal'] = df3_correct['Non-ferrous metal'].astype('int')

df_t = df3_correct.T.reset_index()
df_3_t = df_t.rename(columns={2: "waste_type", 0: "energy_saved"})

x = df1_four_materials_year.merge(df2_four_materials, how='outer')

y = x.merge(df_3_t, how='outer')
z = y.sort_values('year').reset_index(drop=True)
z.set_index('year', inplace=True)
z['total_energy_saved'] = z.energy_saved * z.total_waste_recycled_tonne

annual_energy_savings = z.groupby(["year"])["total_energy_saved"].sum()
annual_energy_savings = annual_energy_savings.to_frame()
annual_energy_savings
It finally worked. Never mind haha
Did you find the solution to the DataCamp energy savings project? I have the same problem.