The NYPD Arrests Dataset project proposal examines crime in NYC from 2006 to 2018. The dataset, called NYPD Arrests Data, is owned by NYC OpenData and is updated every quarter by the NYPD, so it is a reliable, regularly refreshed data source. It contains 18 columns and over 4 million records. The group hopes to find correlations among economic prosperity, demographics, and crime, since the dataset runs from the financial crisis up until a year ago, a period over which they presume New York City has been increasing in prosperity (financial wellness). Overall, they want to analyze the crime records and determine how the types of crime have changed over the past 13 years.
One thing I liked about the proposal is that the dataset is large and has strong features, which leaves plenty of room for exploring patterns and forming hypotheses. I also like the part of the objective that explores crime demographics, which could reveal whether certain areas are getting better or worse, or whether certain racial groups are faring better than others in terms of crime. Lastly, I like the reliability of the data source: it is professionally maintained by the NYPD, which adds to the accuracy of any findings.
One area for improvement is the clarity of the objective itself. There seems to be a general objective of finding out how crime has changed overall during the time period, but the proposal also mentions that the dataset contains data from 2008 (the financial crash), which can illustrate how an increase in prosperity has changed the type of crime. It is not clear which of these two is the main goal.
Another area for improvement is to identify which features can be used to address the objective. Lastly, you should add something about what you are going to try to predict. Besides exploring the data, what specific questions can you answer using an input space and a corresponding output?
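To make the last point concrete, here is a rough sketch of one way the records could be turned into input/output pairs. It is only an illustration under assumptions: I am assuming columns named `ARREST_DATE` (in MM/DD/YYYY form), `ARREST_BORO`, and `LAW_CAT_CD`, the rows below are made up, and `build_examples` is a hypothetical helper, not something from the proposal.

```python
from collections import Counter

# Made-up rows for illustration; column names and codes are assumptions
# about the dataset and should be checked against the real file.
rows = [
    {"ARREST_DATE": "03/14/2008", "ARREST_BORO": "K", "LAW_CAT_CD": "F"},
    {"ARREST_DATE": "07/02/2008", "ARREST_BORO": "K", "LAW_CAT_CD": "F"},
    {"ARREST_DATE": "11/20/2008", "ARREST_BORO": "K", "LAW_CAT_CD": "M"},
    {"ARREST_DATE": "01/05/2018", "ARREST_BORO": "K", "LAW_CAT_CD": "M"},
    {"ARREST_DATE": "02/09/2018", "ARREST_BORO": "K", "LAW_CAT_CD": "M"},
    {"ARREST_DATE": "06/30/2018", "ARREST_BORO": "K", "LAW_CAT_CD": "F"},
    {"ARREST_DATE": "09/12/2018", "ARREST_BORO": "M", "LAW_CAT_CD": "M"},
]


def build_examples(rows):
    """Map each (year, borough) input to its most common offense level."""
    counts = {}
    for row in rows:
        year = int(row["ARREST_DATE"][-4:])  # last four characters are the year
        key = (year, row["ARREST_BORO"])
        counts.setdefault(key, Counter())[row["LAW_CAT_CD"]] += 1
    # The output (label) is the offense level with the highest count.
    return {key: c.most_common(1)[0][0] for key, c in counts.items()}


examples = build_examples(rows)
for inputs, label in sorted(examples.items()):
    print(inputs, "->", label)
```

Here the input space is (year, borough) and the output is the dominant offense level, so a question like "given a year and a borough, which level of offense is most common?" becomes something a model could be trained and evaluated on. Other pairings, for example demographics as inputs, would follow the same pattern.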