Open mylinhthibodeau opened 6 years ago
Dear @CassKon, dear @mlawre01,
Welcome to my repository! Here are a few helpful points for you to know about my homework 7:
There is a very ugly "stitched pdf file" of the "summary Rmd file" (see README for explanation), and although it might be a bit challenging to follow along with my homework ...
If you have any questions, please don't hesitate to let me know. Thank you so much for your time and I assure you that your feedback is greatly appreciated and I welcome all suggestions.
Warm regards,
My Linh
Hi My Linh,
Great job on the assignment - wow! I think you could have used a few more figures though … haha, just kidding, you definitely met that quota!
For the summary stats where you talk about using a loop, I ran into the same problem. My solution was to write a function and then use ddply to execute the function on parts of the data frame. I’m sure that there are other options out there though!
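The function-plus-ddply approach described above can be sketched roughly as follows. This is only an illustration under assumptions: it uses the built-in iris data rather than the homework data, and the function name summarise_piece and its output columns are hypothetical.

```r
library(plyr)

# Hypothetical summary function: receives one piece of the data frame
# (all rows for a single group) and returns a one-row data frame.
summarise_piece <- function(piece) {
  data.frame(
    n = nrow(piece),
    mean_sepal_length = mean(piece$Sepal.Length),
    max_sepal_length = max(piece$Sepal.Length)
  )
}

# ddply splits iris by Species, applies summarise_piece to each piece,
# and binds the per-group results back into a single data frame.
species_summary <- ddply(iris, "Species", summarise_piece)
print(species_summary)
```

Writing the per-group logic as a function keeps it testable on a single subset before it is applied to the whole data frame, which is one reason to prefer it over an explicit loop.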
I found the Rmd hard to follow as it didn’t have a lot of the actual coding in it and it was a lot of back and forth between links (downloading the code did help here!). Really great comments throughout the file though!
Really great extension of the homework exercise to another dataset and sounds like you will be able to use some of this code again which is awesome!
I am not sure that I have any ideas to contribute towards your data cleanup issue :(
I enjoyed how detailed the introduction and reflection were for your assignment. It helped give context to your assignment.
Cheers,
Cassandra
Hey,
Overall, very impressive assignment; great job tackling and applying what you’ve learned to a new dataset! In general, it seems like you put a ton of work into your assignment and learned a lot which is great.
Some notes:
Updated comment (I was really tired when I wrote this):
Again, great job on homework assignment 7 & good luck with the rest of the course!
Three or more scripts, an Rmd, and a Makefile: Yes
Starts by downloading data, ends with Rmd: Yes
The output of each is the modified input of the previous step: No (see comments)
Includes some analysis and at least one figure: Yes
Makefile includes all scripts and Rmd with correct dependencies: No (see comments)
Makefile runs: No (see comments)
Bonus: Non-gapminder dataset.
Reflection: Yes
Comments:
Your mark will be distributed later. If you would like more feedback, please feel free to message me on slack.
New info: apparently Windows machines use del instead of rm, so that might be your problem with clean.
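One way to sidestep the del versus rm difference is to do the deletion from R, since unlink behaves the same on Windows, macOS, and Linux. The sketch below is an assumption about how a clean step could look, not the repository's actual rule; the function name clean_outputs and the file pattern are hypothetical, and the demonstration runs in a temporary directory so no real files are touched.

```r
# Hypothetical clean helper: deletes generated files that match a pattern.
# The default pattern (tsv and png outputs) is an assumption for illustration.
clean_outputs <- function(dir, pattern = "\\.(tsv|png)$") {
  targets <- list.files(dir, pattern = pattern, full.names = TRUE)
  unlink(targets)      # portable replacement for rm / del
  invisible(targets)   # return what was targeted, without printing it
}

# Demonstration in a throwaway directory.
demo_dir <- file.path(tempdir(), "clean_demo")
dir.create(demo_dir, showWarnings = FALSE)
file.create(file.path(demo_dir, c("a.tsv", "b.png", "keep.R")))

clean_outputs(demo_dir)
print(list.files(demo_dir))
```

A Makefile clean target could then invoke such a helper through Rscript instead of calling rm directly, so the same rule works regardless of the shell.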
Dear @ksedivyhaley,
Thank you so much for your thorough review. I greatly appreciate it, and I am grateful for the specific feedback you are providing because I really want to improve my ability to manage my research data! I am trying to optimize my course learning in order to directly translate knowledge to immediate applications in my field. I am sorry about the lack of clarity in my homework; the Canadian Cancer Research Conference kept me quite busy until November 7/2017. I understand it must have been a bit dry for you to go through my homework, sorry again for that.
If you wouldn't mind, I would love to have a few additional details on the points you raised, because those are actually problems I struggle with and I don't know how to tackle them.
Comment: Step 4 (mut_sig_tables.R) uses output of step 2 (read_clean_genome_text_files.R ), while Step 6 (mut_sig_plot.R) uses output of Steps 2-4.
The breakdown of my plan was the following, and if you could point me to the specific problematic piece, I would really appreciate it :)
Comment: mut_sig_tables.R does not use ALL_sig.tsv or aml_sig.tsv (instead uses ALL_mut and aml_mut) - looks like a typo.
ALL refers to Acute Lymphoid Leukemia (ALL) while all refers to "all of the elements" and aml refers to acute myeloid leukemia. Therefore, all (in the sense of the totality) of the files are used by mut_sig_tables.R
I remember one of the steps had a problem with ALL_mut.tsv because unlike the others, it only had one dataset, but I thought it ended up still working in the end.
Comment: mut_sig_plot.R lists 4 dependencies, but reads 10 files. Even if it runs fine from a clean slate, if one of the 6 unlisted files is somehow deleted, the pipeline will then crash when you try to run mut_sig_plot.R.
Comment: You write 16 separate tsv files in mut_sig_tables.R. Do you really need to access all of those? You don't have to save every intermediate step, but the significant ones.
Comment: Given that pdfs don't embed in the report well, why not save as png files?
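Saving a figure as a png from R can be sketched as below. This is a minimal illustration with base graphics and the built-in mtcars data, not the homework's plotting code; the file name and dimensions are assumptions, and a temporary directory is used so nothing in the repository is overwritten.

```r
# Hypothetical output path in a temporary directory.
plot_file <- file.path(tempdir(), "example_plot.png")

# Open a png device, draw the figure, then close the device to write the file.
png(plot_file, width = 800, height = 600, res = 120)
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)",
     ylab = "Miles per gallon",
     main = "Example figure")
dev.off()
```

The resulting file can then be embedded in the Rmd, for instance with knitr::include_graphics or ordinary markdown image syntax.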
Comment: Your clean statement works fine on my machine. Might be a directory issue.
Comment: Speaking of directory issues, your directories in read_clean_genome_text_files.R are much too specific – it only functions if STAT545-HW-thibodeau-mylinh is directly on the user's Desktop, not likely to be the case for people who clone your repo and possibly not even the case on your own machine in a year or two. Likewise with the cancer-specific mutational signature files – which your Makefile should include rules to download, rather than assuming that the user already has them.
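The two suggestions in this comment (paths relative to the repository, and downloading inputs instead of assuming they exist) could look roughly like the sketch below. Everything specific here is an assumption: the helper name fetch_if_missing, the data directory, the file name signatures.txt, and the URL are placeholders, not the real locations of the mutational signature files.

```r
# Hypothetical helper: download a file only when it is not already present.
fetch_if_missing <- function(url, dest) {
  # Create the destination folder if needed.
  dir.create(dirname(dest), showWarnings = FALSE, recursive = TRUE)
  if (!file.exists(dest)) {
    download.file(url, destfile = dest, mode = "wb", quiet = TRUE)
  }
  invisible(dest)
}

# Paths are built relative to the repository root, so the scripts work
# wherever the repository is cloned (assuming they are run from that root).
data_dir <- "data"                                        # assumed folder name
signature_file <- file.path(data_dir, "signatures.txt")   # assumed file name

# Placeholder URL, shown commented out because the real source is not known here:
# fetch_if_missing("https://example.org/signatures.txt", signature_file)
```

With the download expressed as its own step, a Makefile rule can list the downloaded file as a target, and later scripts can read it through the same relative path.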
Comment: I'm confused by what read_clean_genome_text_files.R actually does in terms of cleaning – it looks like you just read the files in and re-save them. Are you basically just moving them to the STAT545 directory?
Well, something I can promise you is that I will annotate my code better, which I think I succeeded in doing in homework 8. Looking back at how cryptic homework 7 is to read encouraged me to make a tutorial for homework 8.
Thank you again so much for your help,
Warm regards, My Linh Thibodeau
Dear colleagues, dear STAT545/547 team,
I tried something new, and it didn't go down exactly as planned, but I still hope you will enjoy reading my homework :)
Homework 7 repository HERE
Don't hesitate to let me know if you have any questions !
Thank you for your time! Regards, My Linh Thibodeau