hw04 ready for grading - GitHub Issues

mylinhthibodeau commented 7 years ago

Dear @abishekarun and @Tangjiahui26,

Unfortunately, I didn't realize that we only needed to pick two activities in total, which is why my original Homework 4 file (now titled "long-version-stat545-hw04-thibodeau-mylinh") is NOT THE ONE TO REVIEW.

For the MARKING/PEER REVIEW FILES, you can limit yourself to the README file HERE and these two cheatsheets:

General data reshaping and relationship to aggregation - Activity #5 (data manipulation sample): cheatsheet md file HERE comparing the tidyr, reshape and base R ways of manipulating data
Join, merge, look up - Activity #2 (create your own cheatsheet on join functions): minimalist dplyr join function cheatsheet md file HERE - A table is worth a thousand words

The general Homework 4 repository is here.

Thank you for your time, and if you have any questions, please don't hesitate to let me know.

Warm regards, My Linh

Tangjiahui26 commented 7 years ago

Peer Review:

Hi, My Linh! As you explained, I looked mainly at Activity #5 in the data reshaping part and Activity #2 in the join part. I also skimmed your long-version homework. I think you did a really good job exploring your own dataset, and it is great to see you apply what you have learned in class to your own research problems.

As for Activity #5, you matched each task with the corresponding tidyr/dplyr functions, reshape2 functions, and base R operations. You used your own dataset and called read.table() to load the local files, which is very useful.
Indeed, as you concluded, I think tidyr is good for most data preprocessing, such as data cleaning and sorting, and it can handle most of the tasks in your classification. reshape2 is mainly for converting data frames between long and wide formats, using melt() and the *cast() functions. Both tidyr and reshape2 make it easier to prepare data for plotting with ggplot2.
In Activity #2, you created your own cheatsheet on join functions and used different kinds of joins, including mutating joins and filtering joins.
Each step has specific explanations and annotations. My only suggestion is to proofread more, because some parts were not displayed well in your .md file.
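To make the long/wide distinction from the reshaping comment concrete, here is a minimal base R sketch (not taken from the cheatsheet) that converts a small, hypothetical gene-by-cancer-type table from wide to long format and back with stats::reshape(). The table, column names, and values are made up for illustration; tidyr::gather()/spread() and reshape2::melt()/dcast() perform the same kind of round trip.

```r
# Hypothetical wide table: one row per gene, one column per cancer type
wide <- data.frame(
  gene = c("A2M", "TP53"),
  BRCA = c(5.1, 7.3),
  LUAD = c(4.2, 6.8),
  stringsAsFactors = FALSE
)

# Wide -> long: stack the cancer-type columns into (cancer_type, expression) pairs
long <- reshape(
  wide,
  direction = "long",
  varying   = c("BRCA", "LUAD"),
  v.names   = "expression",
  timevar   = "cancer_type",
  times     = c("BRCA", "LUAD"),
  idvar     = "gene"
)
rownames(long) <- NULL
print(long)

# Long -> wide: spread cancer_type back into one column per cancer type
# (base R names the new columns expression.BRCA and expression.LUAD)
back <- reshape(
  long,
  direction = "wide",
  idvar     = "gene",
  timevar   = "cancer_type",
  v.names   = "expression"
)
print(back)
```

The long format (one observation per row) is what ggplot2 expects for mapping cancer_type to an aesthetic, which is why these conversions come up so often before plotting.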

Overall, it looks like you have put significant time and effort into this assignment. Great work!

Regards, Jiahui Tang

mylinhthibodeau commented 7 years ago

Dear @Tangjiahui26,

Thank you so much for your feedback; I greatly appreciate it!

I had trouble with the RMarkdown formatting: I type two spaces at the end of a line so that RMarkdown knows it needs to insert a line break. When I knitted the file on my personal computer, the formatting was perfect, as you can see HERE, but when I pushed the file to GitHub, the spaces were somehow lost, the titles were crammed together, and the formatting was messed up!

Never mind, I found it!! I had to change options(knitr.table.format = "html") to the line below for the RMarkdown formatting to work properly!!

options(knitr.table.format = "markdown")
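For reference, a minimal sketch of where that option usually goes: a setup chunk near the top of the .Rmd, after which knitr::kable() emits pipe-style Markdown tables that GitHub can render. The data frame below is a made-up placeholder, and the sketch assumes the document is knitted to a Markdown output (for example github_document).

```r
# Typically placed in the setup chunk of the .Rmd file
library(knitr)
options(knitr.table.format = "markdown")

# Hypothetical summary table, used only for illustration
df <- data.frame(
  cancer_type = c("BRCA", "LUAD"),
  mean_expression = c(5.1, 4.2)
)

# With the option set, kable() returns a Markdown (pipe) table rather than raw HTML
tab <- kable(df)
print(tab)
```

Passing format = "markdown" directly to a single kable() call is an alternative when only some tables should change.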

Thank you for your time.

Warm regards, My Linh

abishekarun commented 7 years ago

Peer Review:

Hi, @mylinhthibodeau! You did an excellent job on this homework and went above and beyond the requirements of the assignment. You only needed to pick one data reshaping prompt and one join prompt, but you went ahead and did all the activities/prompts, which deserves praise. It was also fitting to see you use datasets from your own research, and this motivates people like me.

Data manipulations in R

It was great to see that you performed each task using functions from tidyr, base R, and the reshape package.
Almost all the tasks were thoroughly explored using appropriate functions, and this cheatsheet is near perfect.
It was also nice to see that you provided learning resources and references.

Data joining

All the join functions were explored properly with the two datasets that you chose.
One small concern: I think the anti_join() output is not rendered, since nothing is shown as output. If possible, you could look into it.
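An empty anti_join() result is not necessarily a rendering problem: an anti-join keeps only the rows of the first table with no match in the second, so it is legitimately empty when every key matches. The sketch below illustrates this together with the mutating/filtering distinction, using two small hypothetical tables. It uses base R so it runs without extra packages: merge() stands in for dplyr's inner_join()/left_join(), and %in% subsetting mimics semi_join()/anti_join() for a single key column. The tables and values are assumptions, not the cheatsheet's data.

```r
# Hypothetical tables sharing a "gene" key
expr <- data.frame(
  gene = c("A2M", "TP53", "BRCA1"),
  expression = c(5.1, 7.3, 2.4),
  stringsAsFactors = FALSE
)
annot <- data.frame(
  gene = c("A2M", "TP53", "EGFR"),
  chromosome = c("12", "17", "7"),
  stringsAsFactors = FALSE
)

# Mutating joins add columns from the second table
inner <- merge(expr, annot, by = "gene")                # like dplyr::inner_join()
left  <- merge(expr, annot, by = "gene", all.x = TRUE)  # like dplyr::left_join()

# Filtering joins keep only the columns of the first table
semi <- expr[expr$gene %in% annot$gene, ]     # like dplyr::semi_join()
anti <- expr[!(expr$gene %in% annot$gene), ]  # like dplyr::anti_join()

# When every key has a match, the anti-join is legitimately empty
annot_all <- rbind(
  annot,
  data.frame(gene = "BRCA1", chromosome = "17", stringsAsFactors = FALSE)
)
anti_empty <- expr[!(expr$gene %in% annot_all$gene), ]
cat("anti rows:", nrow(anti), "| anti_empty rows:", nrow(anti_empty), "\n")
```

Reporting nrow() next to a filtering join, as in the last line, makes an empty result visible in the rendered .md instead of looking like missing output.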

I also checked your long version, and it shows the effort you put into this assignment. You explored a lot more than was asked.

Some suggestions

One possible suggestion is to look into suppressWarnings(suppressMessages()) to remove the warning messages; it makes the markdown file cleaner.
The links to the cheatsheets for Activity #5 (reshaping) and Activity #2 (joining) in your long version don't work. I think the files might have been moved, so you may want to update the underlying links.
Another suggestion would be to proofread and possibly experiment with kable_styling() (from the kableExtra package) and pandoc options for formatting the tables.
One small thing that could be improved in this excellent work is the formatting of the markdown file, which is off in some places. Otherwise, it was great work.
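A minimal sketch of the suggested wrapper, assuming a hypothetical helper noisy_step() that stands in for any call (such as library() or a join) that prints messages and warnings. In R Markdown, the chunk options message = FALSE and warning = FALSE achieve a similar effect for a whole chunk.

```r
# Hypothetical helper that emits a message and a warning before returning a value
noisy_step <- function() {
  message("Joining, by = \"gene\"")
  warning("some rows had no match")
  42
}

# suppressMessages() muffles the message; suppressWarnings() muffles the warning.
# The call still runs and its return value is kept.
result <- suppressWarnings(suppressMessages(noisy_step()))
print(result)
```

Because warnings can flag real problems, wrapping individual calls whose warnings are already understood is usually safer than silencing a whole document.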

It was great to see that you put the links to the files in the issue, as it makes them easy to access. Mentioning the struggles you experienced and the solutions you found is very helpful for other students. It was also nice to see you describe the new things you learned through this assignment in a clear and detailed manner. I hope to follow this example in upcoming assignments. Thanks for that.

Overall, I think your homework was really well done, and I hope you can keep it up!

Regards, Arun Rajendran

pgonzaleze commented 7 years ago

Hi @mylinhthibodeau, here are some comments about your homework:

General data reshaping and relationship to aggregation: Yes
Join, merge, look up: Yes
Progress report: Yes
Extras (optional; merge/match): Yes

Part 1

I appreciate your explanation and brief introduction.
I like your approach to tackling the tasks.
Working with a different dataset is challenging, but it is clear that you know your data perfectly, so you can work with it more easily.
As you mentioned, you spent a lot of time solving specific parts of the task, but I have to tell you that once you get it, you get it! Troubleshooting is an investment, but the results are amazingly satisfying.

Part 2

Same comments: a perfect explanation of what you plan to do or expect with each function.
I appreciate your explanations when you encountered an error: in your case, you found that the function was complaining because of duplicated gene names, so you filtered them and solved the problem.
Simple things like the format of your document can be a pain; I would recommend checking which format is more common in your field and using that.
To round off your perfect work, I would add the name of the variable (in this case, the cancer_type) that you are referring to. For example: "Observation: We can see which cancer type (variable) has the highest or lowest mean.cancer gene expression for individual genes." Here, I assume that refers to the cancer type A2M.
I think the most valuable part of your work is that you are using data from your own research. The coding could be improved, since there are many ways to do the same thing, but I think your work is flawless!
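The two points above (filtering duplicated gene names, and naming the cancer type behind an observation) can be sketched together in base R. The long-format table, sample IDs, and values are hypothetical, the original homework does not say which function raised the error, and keeping the first occurrence of each duplicated sample/gene pair is only one possible rule; averaging or dropping duplicates may suit real data better.

```r
# Hypothetical long table; A2M appears twice in sample S1 (a duplicated gene name)
long <- data.frame(
  sample      = c("S1", "S1", "S2", "S3", "S4"),
  cancer_type = c("BRCA", "BRCA", "BRCA", "LUAD", "LUAD"),
  gene        = c("A2M", "A2M", "A2M", "A2M", "A2M"),
  expression  = c(5.0, 9.9, 5.2, 4.0, 4.4),
  stringsAsFactors = FALSE
)

# Flag rows whose (sample, gene) pair has already appeared
dup <- duplicated(long[, c("sample", "gene")])
print(long[dup, ])

# Assumed rule: keep the first occurrence of each pair
# (otherwise the extra S1 value would be counted in the BRCA mean)
clean <- long[!dup, ]

# Mean expression per cancer type for one gene, then name the extremes explicitly
a2m   <- clean[clean$gene == "A2M", ]
means <- aggregate(expression ~ cancer_type, data = a2m, FUN = mean)
highest <- means$cancer_type[which.max(means$expression)]
lowest  <- means$cancer_type[which.min(means$expression)]
cat("A2M: highest mean in", highest, "| lowest mean in", lowest, "\n")
```

Printing the cancer type names computed from the data keeps the written observation in sync with the table it describes.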

Your marks will be distributed later.

Regards,

Pedro G

mylinhthibodeau / STAT545-HW-thibodeau-mylinh

hw04 ready for grading #4