zx26867 / Zhihan_Xian_MADA_project

https://zx26867.github.io/ZhihanXian-MADA-portfolio/aboutme.html

data not quite suitable #1

Open andreashandel opened 2 years ago

andreashandel commented 2 years ago

Unfortunately, the dataset you propose to analyze is not quite suitable for the class project. It comes from the UCI Machine Learning Repository and is already fully cleaned, so it is no longer "real world" data. For this class project, the data needs to require cleaning/wrangling. Please change the dataset. If you want to stick with the wine idea, see if you can find some other "real world" datasets on that topic. You can certainly use the UCI dataset in addition, just not as the sole data source. Of course, you are also still allowed to switch the topic. It just needs to be "real world" (messy) data, either one dataset or several.

zx26867 commented 2 years ago

Hi Dr. Handel,

I do not have research data of my own that is suitable for this project, so I had to look online. I found another dataset, which is about jobs and salaries. According to the source, it is a messy, real-world dataset. Here is the link to the dataset source website and the link to the dataset. I think I could use it to predict a person's salary from inputs such as industry, job title, gender, race, diploma, state, and years of experience, using a machine learning approach. Do you think it is a good one? If so, I will go ahead and update my project webpage.

andreashandel commented 2 years ago

That dataset seems suitable.