Miniessay Q - Githubissues

zxc0707 commented 9 months ago

Personal safety has always been a top concern for governments and people around the world. Open Data Toronto provides the Neighborhood Crime Rates dataset, which records offences against neighbors and residents, organized by community and by type (Assault, Auto Theft, Break and Enter, Robbery, etc.), from 2014 to 2019. In this report, I will search, organize, and plan the data, then make a line chart to compare the number of assaults in 4 selected regions in different years.

JerrZzzz commented 9 months ago

Consistency: I can see that the raw data you provided contains about 150 rows, meaning there are about 150 areas. Why do you want to select only the top five rows from the data set? That makes the data much smaller, and the analysis might be a little biased.

JerrZzzz commented 9 months ago

Every line: I think it would be better to add chunk options such as message = F, echo = F to the R chunks, so that we can get rid of the messages that come from library().

zxc0707 commented 9 months ago

I can see that the raw data you provided contains about 150 rows, meaning there are about 150 areas. Why do you want to select only the top five rows from the data set? That makes the data much smaller, and the analysis might be a little biased.

First of all, there is no loss or omission in the first 5 rows of data, which means that the accuracy and comprehensiveness of the data can be ensured when studying the problem. Secondly, more data means that there will be more text on the x-axis and y-axis when making graphs later, which will make the entire graph cluttered and affect subsequent analysis.

JerrZzzz commented 9 months ago

I can see that the raw data you provided contains about 150 rows, meaning there are about 150 areas. Why do you want to select only the top five rows from the data set? That makes the data much smaller, and the analysis might be a little biased.

First of all, there is no loss or omission in the first 5 rows of data, which means that the accuracy and comprehensiveness of the data can be ensured when studying the problem. Secondly, more data means that there will be more text on the x-axis and y-axis when making graphs later, which will make the entire graph cluttered and affect subsequent analysis.

I think a graph with the year on the x-axis would make it easier to visualize the change in crime over the years within an area. Why do you want to put the area names on the x-axis?

zxc0707 commented 9 months ago

I can see that the raw data you provided contains about 150 rows, meaning there are about 150 areas. Why do you want to select only the top five rows from the data set? That makes the data much smaller, and the analysis might be a little biased.

First of all, there is no loss or omission in the first 5 rows of data, which means that the accuracy and comprehensiveness of the data can be ensured when studying the problem. Secondly, more data means that there will be more text on the x-axis and y-axis when making graphs later, which will make the entire graph cluttered and affect subsequent analysis.

I think a graph with the year on the x-axis would make it easier to visualize the change in crime over the years within an area. Why do you want to put the area names on the x-axis?

Using the year as the x-axis would also present the data well, but showing the number of assaults on the y-axis makes it easier for me to see the fluctuations in each year and to analyze the differences in assaults from 2014 to 2019.
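For comparison, a minimal sketch of the year-on-the-x-axis version asked about above could look like the following. It assumes ggplot2 is installed; the data frame `long`, its column names, and the numbers in it are made up for illustration and are not taken from the real dataset or from the report's code.

```r
library(ggplot2)

# Hypothetical long-format data: one row per area and year (made-up numbers).
long <- data.frame(
  area     = rep(c("Area A", "Area B"), each = 3),
  year     = rep(2014:2016, times = 2),
  assaults = c(10, 12, 9, 20, 18, 25)
)

# Year on the x-axis, assaults on the y-axis, one coloured line per area.
p <- ggplot(long, aes(x = year, y = assaults, colour = area)) +
  geom_line() +
  geom_point() +
  scale_x_continuous(breaks = 2014:2016) +
  labs(x = "Year", y = "Number of assaults", colour = "Area")

# Printing `p` (or calling print(p)) draws the chart.
```

With this layout each area gets its own line, so only a handful of areas can be shown before the chart becomes crowded.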

JerrZzzz commented 9 months ago

Functionality: When you use the gather() function to create a new data set, does it create the new column the way you wanted? Would you perhaps want the data to put the same area name together rather than the same year together?

JerrZzzz commented 9 months ago

Complexity: Do you think the part where you use the head() function on a big database is a bit complex? Maybe choosing only a few columns would be better?

JerrZzzz commented 9 months ago

Tests: I think that your code is pretty good. If we change any code that uses the data set name above, the code should still run without problems. Do you think so?

JerrZzzz commented 9 months ago

Naming: I can see that you use the name ccleaned_crimerate_data. I don't really know what it means. Can you explain it a bit?

JerrZzzz commented 9 months ago

Comments: You added an introduction and an analysis part explaining the whole code content. Do you think it might be better to add some explanation to the gather() part of the code?

JerrZzzz commented 9 months ago

Style: I think it might be better to give a name to the number you use to choose the areas. You used the number 4. If anyone wants to choose more rows, they would just have to replace the number 4 where the name is defined, without changing the rest of the code. Do you think that is better?

JerrZzzz commented 9 months ago

Documentation: I can see that you are missing references. Can I ask for them?

JerrZzzz commented 9 months ago

Good things: I think that using the gather() function to create that specific data set is a fantastic way to do it. I can learn a lot from it. Thank you!!

zxc0707 commented 9 months ago

Consistency: I can see that the raw data you provided contains about 150 rows, meaning there are about 150 areas. Why do you want to select only the top five rows from the data set? That makes the data much smaller, and the analysis might be a little biased.

First of all, there is no loss or omission in the first 5 rows of data, which means that the accuracy and comprehensiveness of the data can be ensured when studying the problem. Secondly, more data means that there will be more text on the x-axis and y-axis when making graphs later, which will make the entire graph cluttered and affect subsequent analysis.

zxc0707 commented 9 months ago

Every line: I think it would be better to add chunk options such as message = F, echo = F to the R chunks, so that we can get rid of the messages that come from library().

I think the issue you raised here is well worth considering and improving. When I set chunk options like message = F, echo = F in the R chunks, I can avoid these messages and keep the whole report from becoming disorganized.
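As a rough sketch of how this could be set up, assuming the report is an R Markdown document rendered with knitr: the options can be set once for all chunks in a setup chunk, or the startup messages of a single library() call can be silenced directly. The `stats` package below is only a stand-in for whichever packages the report actually loads.

```r
# Option 1: set the chunk options once for every chunk (needs knitr).
if (requireNamespace("knitr", quietly = TRUE)) {
  knitr::opts_chunk$set(message = FALSE, echo = FALSE)
}

# Option 2: silence the startup messages of one library() call (base R).
# `stats` is a placeholder for the package the report really uses.
suppressPackageStartupMessages(library(stats))
```

Writing FALSE rather than F is slightly safer, because F is an ordinary variable that can be reassigned, while FALSE is a reserved word.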

zxc0707 commented 9 months ago

Functionality: When you use the gather() function to create a new data set, does it create the new column the way you wanted? Would you perhaps want the data to put the same area name together rather than the same year together?

When I use gather(), I want to split the original column names in the csv file, such as 2014_assaults, 2015_assaults, ..., into "year" and "assaults". This will be more helpful for my subsequent charting and analysis.
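A minimal sketch of that reshaping step, assuming tidyr is installed, might look like this. The data frame, the `area` column, and the numbers are hypothetical; only the 2014_assaults / 2015_assaults column pattern comes from the description above, and this is not the report's actual code.

```r
library(tidyr)

# Hypothetical wide data in the shape described above (made-up numbers).
wide <- data.frame(
  area            = c("Area A", "Area B"),
  `2014_assaults` = c(10, 20),
  `2015_assaults` = c(12, 18),
  check.names     = FALSE  # keep column names that start with a digit
)

# Stack every column except `area` into key/value pairs.
long <- gather(wide, key = "year", value = "assaults", -area)

# Strip the "_assaults" suffix so that `year` is a plain integer.
long$year <- as.integer(sub("_assaults$", "", long$year))
```

The result has one row per area and year, with the columns area, year, and assaults. The tidyr documentation marks gather() as superseded by pivot_longer(), which can do the same reshaping, although gather() still works.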

zxc0707 commented 9 months ago

Complexity: Do you think the part where you use the head() function on a big database is a bit complex? Maybe choosing only a few columns would be better?

When I use head() to select and extract data, my first thought is that the database is too large, which would affect the appearance of the plot and the ability to extract information from it later. I think tail() is also a good method in this case.
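To make the options in this exchange concrete, here is a small base R sketch of head(), tail(), and selecting only a few columns. The data frame `crime` and its columns are made-up stand-ins for the real data set.

```r
# Hypothetical stand-in for the full data set (150 made-up rows).
crime <- data.frame(
  area     = paste("Area", 1:150),
  assaults = seq_len(150),
  robbery  = rev(seq_len(150))
)

first_rows <- head(crime, 5)  # first 5 rows, in file order
last_rows  <- tail(crime, 5)  # last 5 rows, in file order

# Keeping only the columns needed, as suggested in the question.
assault_only <- crime[, c("area", "assaults")]
```

Both head() and tail() pick rows purely by position, so which areas end up in the chart depends on how the file happens to be ordered.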

zxc0707 commented 9 months ago

Tests: I think that your code is pretty good. If we change any code that uses the data set name above, the code should still run without problems. Do you think so?

I believe that if I change any of the code that uses the data set name above, the code below will have issues, because the same database name is used throughout, both when processing the data and when using it for analysis and plotting. If any of these names is changed, the entire code will fail at runtime because the given database (under its original name) cannot be found.
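The failure described here can be reproduced in a few lines of base R. The names `crime_data` and `renamed_crime_data` are hypothetical and only serve to illustrate the point.

```r
# Hypothetical data set name used by the later steps.
crime_data <- data.frame(area = c("Area A", "Area B"), assaults = c(10, 20))

# Simulate renaming the data set without updating the code that follows.
renamed_crime_data <- crime_data
rm(crime_data)

# The later step still refers to the old name, so it fails at runtime.
# tryCatch() captures the error message instead of stopping the script.
result <- tryCatch(
  nrow(crime_data),
  error = function(e) conditionMessage(e)
)
```

Here `result` holds the error message about the missing object rather than a row count, which is the runtime failure mentioned above.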

zxc0707 commented 9 months ago

Naming: I can see that you use the name ccleaned_crimerate_data. I don't really know what it means. Can you explain it a bit?

The name "ccleaned_crimerate_data" is just another name for a dataset, used to distinguish it from the other dataset. In other words, it doesn't have any particular meaning for the report as a whole :)

zxc0707 commented 9 months ago

Comments: You added an introduction and an analysis part explaining the whole code content. Do you think it might be better to add some explanation to the gather() part of the code?

I believe that if I add an explanation of what gather() does and why it is useful to my analysis part, the entire report will look more convincing and rigorous, because gather() plays a key role in my entire report.

zxc0707 commented 9 months ago

Style: I think it might be better to give a name to the number you use to choose the areas. You used the number 4. If anyone wants to choose more rows, they would just have to replace the number 4 where the name is defined, without changing the rest of the code. Do you think that is better?

I think the method you suggested regarding style will save me time when writing code and make the whole report tidier.
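A small sketch of the suggestion, in base R: the number of areas is given a name once, and the later steps use the name. `n_areas` and the `crime` data frame are hypothetical names chosen for this example.

```r
# Hypothetical stand-in data (made-up numbers).
crime <- data.frame(area = paste("Area", 1:10), assaults = 10 * (1:10))

# Name the number once, near the top of the script.
n_areas <- 4

# Every later step refers to the name, not to a literal 4.
selected <- head(crime, n_areas)
```

Choosing more rows then only means changing the one line that defines `n_areas`.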

zxc0707 commented 9 months ago

Documentation: I can see that you are missing references. Can I ask for them?

For references, I put the original csv file of the database into my input folder. This is where all of my data comes from.

zxc0707 commented 9 months ago

Good things: I think that using the gather() function to create that specific data set is a fantastic way to do it. I can learn a lot from it. Thank you!!

You're welcome. I think gather() can be a great help in separating the data column names, which makes the plot much easier to work with. Because the years and the event counts are separated in my case, using gather() makes it clear what goes on the x-axis and what goes on the y-axis.

zxc0707 / miniessay2

Miniessay Q #1