namkyodai / 2022-UrbanComputation-SUTD

Notes, reading materials, data and codes for Spring Semester course named "Urban Data & Methods II: Computational Urban Analysis" at SUTD Singapore.
MIT License
4 stars 7 forks source link

Assignment week 2 #2

Open namkyodai opened 2 years ago

namkyodai commented 2 years ago

Copy from Xiaoyan,

I wonder what is the deliverable that you want us to hand in? Also, do you want us each to hand in a copy of work, or hand in one copy as a group?

namkyodai commented 2 years ago

You dont need to hand in the assignment since the program and works are in each of machine of yours.

The importance is to discuss with the group members and see how other members think and contribute for the overall understanding of everyone and help each other.

then, the group is required to make a 10 minutes presentation of the works in the following class.

Nam

ginger7117 commented 2 years ago

Thank you for the reply Dr. Nam!

Gelateo commented 2 years ago

I want to create a table that displays the different average values based on flat type for each town

I collapsed repetitive categories into unique values

SELECT COUNT(DISTINCT town) AS unique_town from merged_file_csv mfc

Find average resale price for each town

SELECT town, AVG(resale_price) AS avg_resale_price FROM merged_file_csv mfc GROUP BY town ORDER BY 2 desc

create new column showing average resale price of different flat types for each town

SELECT town , flat_type , AVG(resale_price) AS avg_resale_price FROM merged_file_csv mfc GROUP BY town , flat_type

convert flat types into columns with corresponding average resale prices by town

select 1 ROOM, 2 ROOM , 3 ROOM , 4 ROOM , 5 ROOM , EXECUTIVE, MULTI-GENERATION from ( select value, flat_type from merged_file_csv mfc ) d pivot ( max(value) for columnname in (1 ROOM, 2 ROOM , 3 ROOM , 4 ROOM , 5 ROOM , EXECUTIVE, MULTI-GENERATION) ) piv;

namkyodai commented 2 years ago

@Gelateo

I have uploaded the code for tomorrow class. Go go 2022-UrbanComputation-SUTD\notes\sg-houseprice, you will see a R file named sg-house-price-prediction.R

Try to use the reshape library of R, this will for sure solve the problem. For SQL pivoting, i will look for a solution

Gelateo commented 2 years ago

Ok thanks! :-)

ginger7117 commented 2 years ago

Hi Dr. Nam,

I am wondering if we will have the same group members through out the course. Thank you for your reply.

Xiaoyan

namkyodai commented 2 years ago

I want to create a table that displays the different average values based on flat type for each town

I collapsed repetitive categories into unique values

SELECT COUNT(DISTINCT town) AS unique_town from merged_file_csv mfc

Find average resale price for each town

SELECT town, AVG(resale_price) AS avg_resale_price FROM merged_file_csv mfc GROUP BY town ORDER BY 2 desc

create new column showing average resale price of different flat types for each town

SELECT town , flat_type , AVG(resale_price) AS avg_resale_price FROM merged_file_csv mfc GROUP BY town , flat_type

convert flat types into columns with corresponding average resale prices by town

select 1 ROOM, 2 ROOM , 3 ROOM , 4 ROOM , 5 ROOM , EXECUTIVE, MULTI-GENERATION from ( select value, flat_type from merged_file_csv mfc ) d pivot ( max(value) for columnname in (1 ROOM, 2 ROOM , 3 ROOM , 4 ROOM , 5 ROOM , EXECUTIVE, MULTI-GENERATION) ) piv;

@Gelateo Here goes the answer after you use this SQL

SELECT town , flat_type , AVG(resale_price) AS avg_resale_price FROM merged_file_csv GROUP BY town , flat_type;

it will creates a table of summary. However, this table is saved in RAM and not consider as a table. So if you want to pivot it, that table needs to be saved first, and u have 2 options, save it as a table or save it as a view

CREATE VIEW TEST as SELECT town , flat_type , AVG(resale_price) AS avg_resale_price FROM merged_file_csv GROUP BY town , flat_type;

then you use the following SQL to generate the pivot table

select town, avg(case when (flat_type='1 ROOM') then avg_resale_price else NULL end) as "1 ROOM", avg(case when (flat_type='2 ROOM') then avg_resale_price else NULL end) as "2 ROOM", avg(case when (flat_type='3 ROOM') then avg_resale_price else NULL end) as "3 ROOM", avg(case when (flat_type='4 ROOM') then avg_resale_price else NULL end) as "4 ROOM", avg(case when (flat_type='5 ROOM') then avg_resale_price else NULL end) as "5 ROOM", avg(case when (flat_type='EXECUTIVE') then avg_resale_price else NULL end) as "EXECUTIVE", avg(case when (flat_type='MULTI-GENERATION') then avg_resale_price else NULL end) as "MULTI-GENERATION", avg(case when (flat_type='MULTI GENERATION') then avg_resale_price else NULL end) as "MULTI GENERATION" from TEST group by town order by town;

Note that the value MULTI GENERATION and MULTI-GENERATION shall be the same, but in this case it is not, so you might consider to use REPLACE syntax to do this first.

Hope it works from your end.

namkyodai commented 2 years ago

Hi Dr. Nam,

I am wondering if we will have the same group members through out the course. Thank you for your reply.

Xiaoyan

@ginger7117 --> likely YES, but it is also optional as if you wish to do it by yourself, it is also be OK. The main reasons for grouping is that members can help each other.

namkyodai commented 2 years ago

@Gelateo for rounding in postgreSQL, try the following syntax

round(x::numeric,y) where x is the value, and y is the number after comma, e.g. round(3.145::numeric,2) will return 3.14


select town, round(avg(case when (flat_type='1 ROOM') then avg_resale_price else NULL end):: numeric ,2) as "1 ROOM", round(avg(case when (flat_type='2 ROOM') then avg_resale_price else NULL end):: numeric ,2) as "2 ROOM", round(avg(case when (flat_type='3 ROOM') then avg_resale_price else NULL end):: numeric ,2) as "3 ROOM", round(avg(case when (flat_type='4 ROOM') then avg_resale_price else NULL end):: numeric ,2) as "4 ROOM", round(avg(case when (flat_type='5 ROOM') then avg_resale_price else NULL end):: numeric ,2) as "5 ROOM", round(avg(case when (flat_type='EXECUTIVE') then avg_resale_price else NULL end):: numeric ,2) as "EXECUTIVE", round(avg(case when (flat_type='MULTI-GENERATION') then avg_resale_price else NULL end):: numeric ,2) as "MULTI-GENERATION", round(avg(case when (flat_type='MULTI GENERATION') then avg_resale_price else NULL end):: numeric ,2) as "MULTI GENERATION" from test group by town order by town;