Milestone 2 Feedback - GitHub Issues

asfarlathif commented 3 years ago

Hi Guanyu, great work on milestone 2.

Good job on repo organization: putting each milestone's files in its own subfolder, and a detailed README as well!
You have not mentioned which dataset you are using at the beginning of your analysis.
It would make sense to adjust the x-axis of the minimum flow histogram (or log-transform it), as it is skewed to the left.
How does the 1.2.2 plotting help answer your question?
I'd actually say your data by itself is untidy! For a given station ID and a given year there are two rows, one for each extreme_type (max and min), and this violates the first rule of tidy data.
That said, I see that you demonstrated how to transform the data between the two forms in your next exercise, for which you will get the marks.
2.3: the first dataset you created is the actual tidy format for this data. Removing station_id is not a good choice, as it serves as an identifier for each observation even though you won't be using it in any analysis.
Adding comments to your code explaining what each line does is also highly encouraged!
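A minimal sketch of the log-scale suggestion, assuming ggplot2. The tibble `min_flow` and its values are hypothetical stand-ins, not rows from the dataset used in the assignment. Note that `scale_x_log10()` only works for strictly positive values.

```r
library(ggplot2)
library(tibble)

# Hypothetical minimum-flow values, made up for illustration only
min_flow <- tibble(flow = c(0.5, 1.2, 2.0, 3.1, 4.8, 7.5, 12, 30, 85, 240))

# Most values sit near zero with a long tail, so a log10 x-axis spreads them out.
# Scale transformations are applied before binning, so the bins are
# equal-width on the log scale. All values must be > 0.
p <- ggplot(min_flow, aes(x = flow)) +
  geom_histogram(bins = 10) +
  scale_x_log10() +
  labs(x = "Minimum flow (log10 scale)", y = "Count")

print(p)
```

Transforming the scale (rather than the column itself) keeps the axis labels in the original units, which is usually easier to read.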
Well done overall. All the best for your next milestone! Please feel free to reach out if you have any further questions. The grades will be posted later this week in Canvas.

Guanyu0001 commented 3 years ago

Hi Asfarlathif, thank you for your feedback. However, I would like my assignment to be regraded:

"You have not mentioned which dataset you are using at the beginning of your analysis." The instructions only said "Begin by loading your data and the tidyverse package below", and the code that followed did load the data: library(datateachr) # <- might contain the data you picked! There is not a clear requirement about mentioning the data, as milestone 2 follows milestone 1.
"It would make sense to adjust the x-axis of the minimum flow histogram (or log-transform it), as it is skewed to the left." I wrote "But I don’t want to repeat the task." because I had already done the transformation before, and this assignment encourages us to complete as many tasks as we can.
"How does the 1.2.2 plotting help answer your question?" I wrote: "The histogram is useful to visualize the distribution of extreme flow, which helps to answer this research question." Note that a histogram is by definition a way to visualize a distribution; as Wikipedia puts it, "A histogram is an approximate representation of the distribution of numerical data." And Q1 is: What is the distribution of the extreme flow?
"I'd actually say your data by itself is untidy! For a given station ID and a given year there are two rows, one for each extreme_type (max and min), and this violates the first rule of tidy data." You may want to check the data again more closely. There are two variables, month and day, so each row is clearly a unique observation for a given station on a different day of the same year.
"That said, I see that you demonstrated how to transform the data between the two forms in your next exercise, for which you will get the marks." This follows from the previous point.
"2.3: the first dataset you created is the actual tidy format for this data. Removing station_id is not a good choice, as it serves as an identifier for each observation even though you won't be using it in any analysis." The instructions asked: "Try to choose a version of your data that you think will be appropriate to answer these 2 questions in milestone 3. Use between 4 and 8 functions that we’ve covered so far". First, removing the ID doesn't affect how I answer these 2 questions in milestone 3. Second, I had to use between 4 and 8 functions.
"Adding comments to your code explaining what each line does is also highly encouraged!" I agree. However, I comment each code chunk in the main body for easier reading. This is a matter of taste.
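Whether each row is a single observation comes down to which columns identify a row. A minimal sketch for checking that, assuming dplyr; `flows` is a hypothetical toy tibble using the column names discussed in this thread (its values are made up), and `is_unique_key()` is a helper written for illustration, not part of any package.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data with the columns discussed in the thread; values are made up
flows <- tibble(
  station_id   = "STATION_A",
  year         = c(1910, 1910, 1911, 1911),
  extreme_type = c("maximum", "minimum", "maximum", "minimum"),
  month        = c(6, 2, 6, 3),
  day          = c(12, 27, 14, 1),
  flow         = c(314, 4.5, 230, 4.1)
)

# Hypothetical helper: TRUE when no combination of the given columns repeats
is_unique_key <- function(data, ...) {
  data %>%
    count(...) %>%       # one row per key combination, with its frequency n
    filter(n > 1) %>%    # keep only combinations that appear more than once
    nrow() == 0
}

print(is_unique_key(flows, station_id, year))               # FALSE
print(is_unique_key(flows, station_id, year, extreme_type)) # TRUE
print(is_unique_key(flows, station_id, year, month, day))   # TRUE
```

With these toy values, station_id and year alone repeat, while adding either extreme_type or month and day gives one row per combination, which is why the two readings of the data above disagree.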

I look forward to your reply. I hope you can share the details of the deductions.

asfarlathif commented 3 years ago

Hi @Guanyu0001

Here is the breakdown of your marks for this milestone:

1.1 - 2.25/2.5
1.2 - 9.75/10
1.3 - 2.5/2.5
2.1 - 0/2.5
2.2 - 4/5
2.3 - 5/5
tidy submission - 2.5/2.5

"There is not a clear requirement about mentioning the data, as milestone 2 follows milestone 1." When you write a data analysis report, it makes sense to start with an introduction describing what the reader can expect in the report and which data and information you will use throughout. So it is good practice to start your analysis with an introduction to your data even when there is no specific instruction for it. (Imagine someone reading only this report: if you go straight to the research questions, they won't know what the variables in those questions mean.) Hope this clarifies your question.
Sorry for the confusion: I was actually unclear on how Q1 and Q3 differ. In both, you are trying to find the distribution of the flow and using histograms to visualize it. You didn't lose marks for not answering the question, but for not exploring it from a different angle or changing the question.
You are right regarding tidying the data; it depends on the question you are asking. If you consider month and day in your analysis as well, then the data is tidy. I apologize for overlooking it. My statement that the data is untidy would only apply if there were no day or month in the tibble. I will adjust the marks there (2.5/2.5).
for 2.2, although you demonstrated tidy and untidy data transformation - there was no written explanation on why you would consider the transformed data untidy and then tidy again. Thus a -1 deduction there.
2.3: I agree with you that it won't affect the downstream analysis, but I prefer to keep metadata like that in the data in case it is needed later (e.g., for annotating plots). It's not a big deal here, and there was no deduction in that section either.
Regarding code comments, it is always a good idea to explain what each line is doing, especially when chaining multiple pipes. This would also have helped with the untidy explanation in 2.2.

Hope this clarifies your question. Here are the revised marks:

1.1 - 2.25/2.5
1.2 - 9.75/10
1.3 - 2.5/2.5
2.1 - 2.5/2.5
2.2 - 4/5
2.3 - 5/5
tidy submission - 2.5/2.5

Total now - 28.5
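For reference, the revised component marks sum as follows; the maximum of 30 is the sum of the listed per-section maximums rather than a figure stated in the thread:

$$
2.25 + 9.75 + 2.5 + 2.5 + 4 + 5 + 2.5 = 28.5 \quad\text{out of}\quad 2.5 + 10 + 2.5 + 2.5 + 5 + 5 + 2.5 = 30
$$

With 2.1 at 0/2.5, as in the first breakdown, the same sum is 26.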

Guanyu0001 commented 3 years ago

Hi @asfarlathif, thank you for your quick reply.

for 2.2, although you demonstrated tidy and untidy data transformation - there was no written explanation on why you would consider the transformed data untidy and then tidy again. Thus a -1 deduction there.

The instructions only said: "Be sure to explain your reasoning for this task. Show us the “before” and “after”." I probably didn't write much, but I did explain the reason and used code to show the “before” and “after”. The rationale is: "Now our data are tidy, then untidy them by storing the same variable in multiple columns, i.e., in wide format, according to extreme_type." I also commented the before and after in the code:

    head(flow_sample) #before
    head(flow_sample_untidy) #after

Finally, I put all the related datasets together:

    head(flow_sample) #original
    head(flow_sample_untidy) # untidy
    head(flow_sample_tidy) # tidy
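The tidy-to-untidy-and-back round trip described here can be sketched with tidyr's `pivot_wider()` and `pivot_longer()`. This is a hypothetical reconstruction on made-up toy data, not the code from the submission. The toy tibble leaves out month and day on purpose: if they were kept, `pivot_wider()` would treat them as identifier columns, and because the maximum and minimum fall on different dates, the rows would not merge and each would keep an NA in the other column.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical toy data (values made up): one row per station, year and extreme type
flows <- tibble(
  station_id   = "STATION_A",
  year         = c(1910, 1910, 1911, 1911),
  extreme_type = c("maximum", "minimum", "maximum", "minimum"),
  flow         = c(314, 4.5, 230, 4.1)
)

# Untidy: the single variable `flow` is spread over two columns, and the
# column names now hold values of another variable (extreme_type)
flows_wide <- flows %>%
  pivot_wider(names_from = extreme_type, values_from = flow)

# Tidy again: gather the two columns back into extreme_type / flow pairs
flows_long <- flows_wide %>%
  pivot_longer(cols = c(maximum, minimum),
               names_to = "extreme_type", values_to = "flow")

print(flows_wide)  # 2 rows: station_id, year, maximum, minimum
print(flows_long)  # 4 rows with the same content as `flows`
```

A sentence naming which variable ends up spread across columns (here `flow`) is the kind of written justification the 2.2 feedback asks for.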

If the point was to explain why the transformed data would be considered untidy and then tidy again, this should have been stated in the instructions.

I hope you will kindly consider what I have done. Anyway, thank you for your time and consideration.

asfarlathif commented 2 years ago

Hi @Guanyu0001,

Yes, I noticed the line you mentioned. However, as we discussed, the tidiness of your data can be tricky to decide, since it depends on the variables you take into account and the questions you are trying to answer. In that sense, I was looking for a bit more explanation of why you'd consider your transformed data untidy and why it wouldn't work in your case as it is. Hope this clarifies your question. Please let me know if you need any further clarification. Happy to discuss!

stat545ubc-2021 / Guanyu_Chen

Milestone 2 Feedback #2