matchakura / MY472assignment4


Comment on the final assignment by Tai #1

Status: open. taiwoohoo opened this issue 9 months ago.

taiwoohoo commented 9 months ago

Content of the Report

The report effectively addresses the research question and offers some useful insights. However, some of the arguments and graphs could be improved for accuracy and clarity.

Firstly, in the analysis of age groups, the category "over 34" covers a disproportionately wide age range. According to UK government data (2022), this group makes up about 57% of the population. Its share of stop-and-search cases (21.7%) is therefore low relative to its population share, and the contrast with other age groups would be clearer if groups covering comparable age ranges were compared. Bringing in additional data, such as census figures on age, might reveal more nuanced insights.
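To put this comparison in numbers, one simple measure (introduced here for illustration, not taken from the report) is a representation ratio: a group's share of stop-and-search cases divided by its share of the population, where values below 1 indicate under-representation. Using the two figures cited above, and treating the 57% population share as approximate:

```latex
R_{\text{over 34}} = \frac{\text{share of stop-and-search cases}}{\text{share of population}} = \frac{21.7\%}{57\%} \approx 0.38
```

Computing the same ratio for each police-defined age group would make the comparison between groups of different widths explicit, because each group is measured against its own population share.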

Furthermore, Graph 5 could be clearer. The different measurements on the x-axis may lead to misinterpretation. A grouped bar chart showing the distribution of outcomes across different ethnic groups could present the data more effectively.
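A minimal sketch of such a grouped bar chart in base R, assuming a hypothetical data frame `sns` with one row per stop and columns named `officer_defined_ethnicity` and `outcome` (these column names are assumptions and may need to be changed to match the report's data):

```r
# Hypothetical helper: grouped bar chart of stop outcomes within each ethnic group.
# Column names are assumptions; change them to match the report's data frame.
plot_outcomes_by_ethnicity <- function(sns,
                                       group_col = "officer_defined_ethnicity",
                                       outcome_col = "outcome",
                                       as_share = FALSE) {
  # Cross-tabulate: rows = outcomes, columns = ethnic groups
  counts <- table(sns[[outcome_col]], sns[[group_col]])
  if (as_share) {
    # Convert each ethnic group's column to proportions so groups of different size are comparable
    counts <- prop.table(counts, margin = 2)
  }
  # beside = TRUE places the outcome bars side by side within each ethnic group
  barplot(counts,
          beside = TRUE,
          legend.text = rownames(counts),
          xlab = "Ethnic group",
          ylab = if (as_share) "Share of stops" else "Number of stops")
  invisible(counts)
}
```

Setting `as_share = TRUE` shows the distribution of outcomes within each group rather than raw counts. With ggplot2, `geom_bar(position = "dodge")` gives the same side-by-side layout.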

Lastly, attention to detail is crucial for maintaining the report's authority. Repeated typos, such as "ethic group" instead of "ethnic group," should be corrected.

Code Evaluation

There are problems executing the code, particularly at the third chunk, which might be due to the instability of the API server. For better efficiency and reliability, consider storing the data in an SQL database or a consistently formatted CSV file. This would reduce potential issues in data retrieval and processing. The submission does include a CSV file, but only a small amount of data is stored in it.

Additionally, Table 2 does not display in the HTML output because of incorrect chunk option syntax. The "r" is missing from the chunk options {warning=FALSE, message=FALSE, echo=FALSE}; they should be written as {r warning=FALSE, message=FALSE, echo=FALSE} so that the table renders properly.
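For reference, a corrected chunk for Table 2 could look like the following. The chunk label `table2` and the object `table_2` are hypothetical placeholders, and `knitr::kable()` is just one common way to print a data frame as a table:

````markdown
```{r table2, warning=FALSE, message=FALSE, echo=FALSE}
# The engine name `r` comes first inside the braces; an optional label may follow it
knitr::kable(table_2)
```
````

Without the leading `r`, knitr does not treat the block as an R chunk, so the code is not executed and the table does not appear in the output.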

matchakura commented 9 months ago

Hi Tai,

Thank you so much for your comments! They are really helpful. I will be more careful with the R chunk settings next time.

The age groups are defined by the police data itself, so I did not change them. But you are right: taking the census age data into account could help in interpreting the data. For this report I chose to focus on ethnicity only, because the stop-and-search (sns) data do not vary much between age groups, as mentioned in the report.
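One way to take census data into account is to express stops as a rate per 1,000 residents of each group, which is the same kind of standardization the census CSV is used for across regions. A minimal base-R sketch, where `stops` and `population` are hypothetical vectors of counts for the same groups in the same order:

```r
# Hypothetical helper: stop-and-search rate per 1,000 residents.
# `stops` and `population` must refer to the same groups, in the same order.
rate_per_1000 <- function(stops, population) {
  stopifnot(length(stops) == length(population), all(population > 0))
  stops / population * 1000
}

# Illustrative numbers only, not taken from the report
rate_per_1000(stops = c(50, 20), population = c(10000, 40000))
# [1] 5.0 0.5
```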

For Graph 5, the x-axis contains only one measurement: time, by month. The aim of this graph is to show how sns changed over time and to compare sns among different regions (so the data for five regions were plotted in one graph).
I did try bar charts at the very beginning, but they could only compare the areas at a single time point, which is less meaningful given that there are 36 months in total. Therefore, I switched to this line chart to show the trend over time.

For the code part, because the police data is quite a large dataset, I did not download it locally but built a data frame directly from the API. I just tested it again, and chunk 3 took around 20 minutes to complete. Because the for loop has to make a large number of API requests, it does take a long time to run. This time I used the for loop to make sure the code works. I am still learning how to improve efficiency, and I am sorry for how long the complete Rmd file takes to run.

As for the CSV file, it only contains census data used to standardize the sns data across regions, as mentioned in the introduction and the code comments. That is why it contains only the UK census ethnicity data.
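Following up on the storage suggestion, one low-effort option is to cache the result of the slow API loop on disk so that it runs only once. A minimal sketch, where `fetch_all_months()` is a hypothetical stand-in for the existing for loop in chunk 3:

```r
# Hypothetical helper: load cached data if it exists, otherwise fetch and save it.
load_or_fetch <- function(cache_file, fetch_fun) {
  if (file.exists(cache_file)) {
    return(readRDS(cache_file))   # reuse the saved copy, no API calls
  }
  data <- fetch_fun()             # slow step, e.g. the loop over API requests
  saveRDS(data, cache_file)       # store it for the next knit
  data
}

# Usage (fetch_all_months() is a placeholder for the loop in chunk 3):
# sns <- load_or_fetch("sns_data.rds", fetch_all_months)
```

`saveRDS()` is used here instead of `write.csv()` because it preserves column types and any nested columns in the data frame; deleting the `.rds` file forces a fresh download. Alternatively, knitr's `cache = TRUE` chunk option caches a chunk's results between knits.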

Thanks again for your advice.

All the best, matchakura