kzavrazhnyi / pythonthefirst


Project energy savings #2

Open LaurensNT opened 2 years ago

LaurensNT commented 2 years ago

Did you find the solution to the DataCamp energy savings project? I have the same problem.

alexkolo commented 2 years ago

Your problem is that in the first dataset, 'Ferrous Metals' and 'Non-ferrous Metals' are written in several different ways, and you haven't included all of them yet. Check wastestats['waste_type'].unique() and watch out for upper and lower case.
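A minimal sketch of one way to catch all the spellings without listing each by hand: normalize every label before comparing. The helper `normalize_label` and the sample list are hypothetical, and the plural-stripping rule is only a heuristic that happens to fit the spellings quoted in this thread.

```python
def normalize_label(label: str) -> str:
    """Lower-case, trim, and drop a trailing plural 's' (hypothetical helper)."""
    cleaned = label.strip().lower()
    # Keep words such as 'glass' intact; only strip a single plural 's'.
    if cleaned.endswith('s') and not cleaned.endswith('ss'):
        cleaned = cleaned[:-1]
    return cleaned


# Spellings mentioned in this thread, plus one unrelated category.
seen = [
    'Ferrous metal', 'Ferrous Metal', 'Ferrous Metals',
    'Non-ferrous metal', 'Non-ferrous Metals', 'Non-ferrous metals',
    'Glass', 'Plastics', 'Plastic', 'Paper',
]

# The four materials of interest, in normalized form.
wanted = {'ferrous metal', 'non-ferrous metal', 'glass', 'plastic'}

# Keep only labels whose normalized form is one of the four materials.
kept = [label for label in seen if normalize_label(label) in wanted]
```

On the real data, the same helper could be applied with wastestats['waste_type'].map(normalize_label) followed by .isin(wanted); it is still worth checking the output of .unique() afterwards in case other categories need special handling.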

kzavrazhnyi commented 2 years ago

Thank you!

wastestats_filter = wastestats[wastestats['waste_type'].isin([
    'Ferrous metal', 'Non-ferrous metal', 'Ferrous Metal',
    'Non-ferrous Metals', 'Ferrous Metals', 'Non-ferrous metals',
    'Glass', 'Plastics', 'Plastic',
])]

energy_saved_data['Ferrous metal'] = energy_saved_data['Ferrous Metal']
energy_saved_data['Ferrous Metals'] = energy_saved_data['Ferrous Metal']
energy_saved_data['Non-ferrous metal'] = energy_saved_data['Non-Ferrous Metal']
energy_saved_data['Non-ferrous metals'] = energy_saved_data['Non-Ferrous Metal']

LaurensNT commented 2 years ago

Thank you very much, I really appreciate it. Do you have the full code of the project? I also noticed the different spellings and tried to adapt my code, but I still had the problem.

kzavrazhnyi commented 2 years ago

Updated https://github.com/kzavrazhnyi/pythonthefirst/blob/master/EnergySavings

LaurensNT commented 2 years ago

Thank you very much :) !

alexkolo commented 2 years ago

FYI, I'm doing the filtering via the apply function:

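import pandas as pd

edf = pd.read_csv('datasets/energy_saved.csv')
# Row 3 holds strings with a number followed by a unit; keep the leading number.
seng_list = pd.to_numeric(edf.loc[3, :].str.split(' ', expand=True)[0], errors='coerce').tolist()
# Conversion factor from 1 recycled tonne to saved kWh, keyed by material name (row 2).
seng_of_mat = dict(zip(edf.loc[2, :].tolist(), seng_list))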

w01 = pd.read_csv('datasets/wastestats.csv')
def mat_of_waste01(row):
    # Map one row of wastestats to a canonical material name and the energy saved (kWh).
    waste_type = row.iloc[0]
    recycled_t = row.iloc[2]  # "total_waste_recycled_tonne"

    # Collapse the different spellings to the keys used in seng_of_mat.
    k = None
    if waste_type in ['Plastics', 'Plastic']:
        k = 'Plastic'
    elif waste_type in ['Glass']:
        k = 'Glass'
    elif waste_type in ['Ferrous metal', 'Ferrous Metal', 'Ferrous Metals']:
        k = 'Ferrous Metal'
    elif waste_type in ['Non-ferrous metal', 'Non-Ferrous Metal', 'Non-ferrous Metals', 'Non-ferrous metals']:
        k = 'Non-Ferrous Metal'

    if k is not None:
        out = [k, seng_of_mat[k] * recycled_t]
    elif waste_type in ['Total']:
        out = ['Annual Sum', 0]
    else:
        out = ['other', 0]
    return pd.Series(out, index=['material', 'energy_saved'])

w01_m = w01.apply(mat_of_waste01, axis=1, result_type='expand')
w01b = w01.join(w01_m)

#Saved energy for year 2003 to 2017
df_es = pd.DataFrame( w01b.groupby(["year"])["energy_saved"].sum() )
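
As a possible alternative to the row-wise apply above, the same idea can be sketched with two dictionary lookups via Series.map. The sample frame and the conversion factors below are made up for illustration (the real factors come from energy_saved.csv), and `material_of` is a hypothetical name. Unlike the function above, unmatched rows get NaN as material rather than 'other' or 'Annual Sum'.

```python
import pandas as pd

# Hypothetical stand-in for wastestats.csv; the values are invented.
w01 = pd.DataFrame({
    'waste_type': ['Plastics', 'Glass', 'Ferrous Metals', 'Paper', 'Total'],
    'total_waste_recycled_tonne': [10.0, 20.0, 30.0, 40.0, 100.0],
    'year': [2016, 2016, 2017, 2017, 2017],
})

# Hypothetical conversion factors (kWh saved per recycled tonne).
seng_of_mat = {'Plastic': 2.0, 'Glass': 3.0, 'Ferrous Metal': 4.0, 'Non-Ferrous Metal': 5.0}

# Every spelling seen in the data, mapped to the key used in seng_of_mat.
material_of = {
    'Plastics': 'Plastic', 'Plastic': 'Plastic',
    'Glass': 'Glass',
    'Ferrous metal': 'Ferrous Metal', 'Ferrous Metal': 'Ferrous Metal',
    'Ferrous Metals': 'Ferrous Metal',
    'Non-ferrous metal': 'Non-Ferrous Metal', 'Non-Ferrous Metal': 'Non-Ferrous Metal',
    'Non-ferrous Metals': 'Non-Ferrous Metal', 'Non-ferrous metals': 'Non-Ferrous Metal',
}

# Rows whose waste_type is not in material_of (e.g. 'Paper', 'Total') become NaN.
w01['material'] = w01['waste_type'].map(material_of)

# Look up the factor per material, multiply by tonnes, and count unmatched rows as 0.
w01['energy_saved'] = (
    w01['material'].map(seng_of_mat) * w01['total_waste_recycled_tonne']
).fillna(0)

# Saved energy per year.
df_es = w01.groupby('year')['energy_saved'].sum().to_frame()
```

The design choice here is only about style: the lookup table keeps all spellings in one place, and the arithmetic runs on whole columns instead of once per row.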

kzavrazhnyi commented 2 years ago

Congratulations!

LaurensNT commented 2 years ago

Thanks for the effort :)

LaurensNT commented 2 years ago

I am still not passing the project, even though I think my output is exactly the same as yours.

import pandas as pd

df1 = pd.read_csv('datasets/wastestats.csv')
df2 = pd.read_csv('datasets/2018_2019_waste.csv')
df3 = pd.read_csv('datasets/energy_saved.csv')

four_materials = ['Ferrous metal', 'Ferrous Metal', 'Non-ferrous metal', 'Non-ferrous Metals', 'Non-ferrous metals', 'Ferrous Metals', 'Glass', 'Plastics', 'Plastic']

df1_four_materials = df1[df1["waste_type"].isin(four_materials)]
df1_four_materials_year = df1_four_materials[df1_four_materials['year'] > 2014]
df1_four_materials_year["waste_type"].replace({"Plastic": "Plastics", "Ferrous Metal": "Ferrous metal", "Ferrous Metals": "Ferrous metal", "Non-ferrous Metals": "Non-ferrous metal", "Non-ferrous metals": "Non-ferrous metal"}, inplace=True)

four_materials1 = ['Ferrous Metal', 'Glass', 'Plastics', 'Non-Ferrous Metal']
df2_1 = df2.rename(columns={'Waste Type':'waste_type', 'Year':'year', "Total Recycled ('000 tonnes)":"total_waste_recycled_tonne"})
df2_four_materials = df2_1[df2_1["waste_type"].isin(four_materials1)]
df2_four_materials['total_waste_recycled_tonne'] = df2_1['total_waste_recycled_tonne'] * 1000
df2_four_materials["waste_type"].replace({"Ferrous Metal": "Ferrous metal", "Non-Ferrous Metal": "Non-ferrous metal"}, inplace=True)

df3.columns = df3.iloc[2]
df3_correct = df3.iloc[3:4, 1:5]
df3_correct.reset_index(drop=True, inplace=True)
function = lambda x: x.str.replace(' Kwh', '')
df3_correct = df3_correct.apply(function)
df3_correct = df3_correct.rename(columns={'Plastic': "Plastics"})
df3_correct = df3_correct.rename(columns={'Ferrous Metal': "Ferrous metal"})
df3_correct = df3_correct.rename(columns={'Non-Ferrous Metal': "Non-ferrous metal"})

df3_correct['Plastics'] = df3_correct['Plastics'].astype('int')
df3_correct['Glass'] = df3_correct['Glass'].astype('int')
df3_correct['Ferrous metal'] = df3_correct['Ferrous metal'].astype('int')
df3_correct['Non-ferrous metal'] = df3_correct['Non-ferrous metal'].astype('int')

df_t = df3_correct.T.reset_index()
df_3_t = df_t.rename(columns={2: "waste_type", 0: "energy_saved"})

x = df1_four_materials_year.merge(df2_four_materials, how='outer')

y = x.merge(df_3_t, how='outer')
z = y.sort_values('year').reset_index(drop=True)
z.set_index('year', inplace=True)
z['total_energy_saved'] = z.energy_saved * z.total_waste_recycled_tonne

annual_energy_savings = z.groupby(["year"])["total_energy_saved"].sum()
annual_energy_savings = annual_energy_savings.to_frame()
annual_energy_savings
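
A side note on the pattern used above, not a diagnosis of the failing check: calling replace(..., inplace=True) on a column selected from a filtered DataFrame can trigger pandas' chained-assignment warnings, and in recent pandas versions the change may not reach the frame at all. A minimal sketch of a safer pattern, using made-up sample data and the hypothetical name `subset`, is to take an explicit copy and assign the result back:

```python
import pandas as pd

# Made-up sample standing in for wastestats.csv.
df1 = pd.DataFrame({
    'waste_type': ['Plastic', 'Glass', 'Paper', 'Ferrous Metals'],
    'year': [2015, 2016, 2016, 2017],
})
four_materials = ['Plastic', 'Glass', 'Ferrous Metals']

# .copy() makes the filtered frame independent of df1.
subset = df1[df1['waste_type'].isin(four_materials)].copy()

# Assigning the result back avoids relying on inplace=True on a selected column.
subset['waste_type'] = subset['waste_type'].replace(
    {'Plastic': 'Plastics', 'Ferrous Metals': 'Ferrous metal'}
)
```

After this, `subset` holds the harmonized labels while `df1` is left unchanged.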

LaurensNT commented 2 years ago

It finally worked. Never mind haha