mylinhthibodeau / STAT545-HW-thibodeau-mylinh


hw07 ready for grading #7

Open mylinhthibodeau opened 6 years ago

mylinhthibodeau commented 6 years ago

Dear colleagues, dear STAT545/547 team,

I tried something new, and it didn't go down exactly as planned, but I still hope you will enjoy reading my homework :)

Homework 7 repository HERE

Don't hesitate to let me know if you have any questions!

Thank you for your time! Regards, My Linh Thibodeau

mylinhthibodeau commented 6 years ago

Dear @CassKon, dear @mlawre01,

Welcome to my repository! Here are a few helpful points for you to know about my homework 7:

There is a very ugly "stitched pdf file" of the "summary Rmd file" (see README for explanation), and it might be a bit challenging to follow my homework from it, so ...

I highly recommend you stick with the original and cleaner files (hyperlinks in message above) or that you download the original data to run my code, because the pdf does not do justice to the code at all!!

If you have any questions, please don't hesitate to let me know. Thank you so much for your time and I assure you that your feedback is greatly appreciated and I welcome all suggestions.

Warm regards,

My Linh

CassKon commented 6 years ago

Hi My Linh,

Cheers,

Cassandra

mlawre01 commented 6 years ago

Hey,

Overall, very impressive assignment; great job tackling and applying what you’ve learned to a new dataset! In general, it seems like you put a ton of work into your assignment and learned a lot which is great.

Some notes:

Updated comment (I was really tired when I wrote this):

Again, great job on homework assignment 7 & good luck with the rest of the course!

ksedivyhaley commented 6 years ago

Three or more scripts, an Rmd, and a Makefile: Yes

Starts by downloading data, ends with Rmd: Yes

The output of each is the modified input of the previous step: No (see comments)

Includes some analysis and at least one figure: Yes

Makefile includes all scripts and Rmd with correct dependencies: No (see comments)

Makefile runs: No (see comments)

Bonus: Non-gapminder dataset.

Reflection: Yes

Comments:

Your mark will be distributed later. If you would like more feedback, please feel free to message me on slack.

ksedivyhaley commented 6 years ago

New info: apparently Windows machines use del instead of rm, so that might be your problem with clean.
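One possible way to sidestep the rm/del difference is to let R do the deleting, since unlink() behaves the same on Windows, macOS and Linux. This is only a minimal sketch: the helper name clean_outputs is hypothetical, and the demo uses temporary files rather than the real pipeline outputs.

```r
# Hypothetical helper: delete generated files without relying on rm or del.
clean_outputs <- function(paths) {
  existing <- paths[file.exists(paths)]  # ignore files that are already gone
  unlink(existing)                       # platform-independent file removal
  invisible(existing)                    # return what was actually removed
}

# Demo on temporary files, so nothing real is deleted.
tmp <- tempfile(fileext = ".tsv")
writeLines("demo", tmp)
removed <- clean_outputs(c(tmp, tempfile()))  # second path never existed
```

A clean step could then call a small R script containing this helper instead of a shell command, assuming Rscript is already available because the rest of the pipeline needs it.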

mylinhthibodeau commented 6 years ago

Dear @ksedivyhaley,

Thank you so much for your thorough review. I greatly appreciate it, and I am grateful for the specific feedback you are providing because I really want to improve my ability to manage my research data! I am trying to optimize my course learning in order to directly translate knowledge to immediate applications in my field. I am sorry about the lack of clarity in my homework; the Canadian Cancer Research Conference kept me quite busy until November 7/2017, and I understand it must have been a bit dry for you to go through my homework. Sorry again for that.

If you wouldn't mind, I would love to have a few additional details on the points you raised, because those are actually problems I struggle with and I don't know how to tackle them.

Comment: Step 4 (mut_sig_tables.R) uses output of step 2 (read_clean_genome_text_files.R ), while Step 6 (mut_sig_plot.R) uses output of Steps 2-4.

The breakdown of my plan was the following, and if you could point me to the specific problematic piece, I would really appreciate it :)

Comment: mut_sig_tables.R does not use ALL_sig.tsv or aml_sig.tsv (instead uses ALL_mut and aml_mut) - looks like a typo.

I remember one of the steps had a problem with ALL_mut.tsv because, unlike the others, it only had one dataset, but I thought it still worked in the end.

Comment: mut_sig_plot.R lists 4 dependencies, but reads 10 files. Even if it runs fine from a clean slate, if one of the 6 unlisted files is somehow deleted, the pipeline will then crash when you try to run mut_sig_plot.R.
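Listing every input in the Makefile is the real fix for the point above, but a script can also fail early with a readable message when an input is missing. A minimal sketch, assuming the script keeps all of its input paths in one vector (the same vector would then double as the list of prerequisites to copy into the Makefile); check_inputs is a hypothetical helper and the demo files are temporary placeholders.

```r
# Hypothetical helper: stop with a clear message if any input file is missing.
check_inputs <- function(paths) {
  missing <- paths[!file.exists(paths)]
  if (length(missing) > 0) {
    stop("Missing input file(s): ", paste(missing, collapse = ", "))
  }
  invisible(TRUE)
}

# Demo with one file that exists and one that does not.
present <- tempfile(fileext = ".tsv")
writeLines("demo", present)
absent <- tempfile(fileext = ".tsv")

check_inputs(present)  # passes silently
result <- tryCatch(
  check_inputs(c(present, absent)),
  error = function(e) conditionMessage(e)  # capture the message instead of crashing
)
```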

Comment: You write 16 separate tsv files in mut_sig_tables.R. Do you really need to access all of those? You don't have to save every intermediate step, just the significant ones.

Comment: Given that pdfs don't embed in the report well, why not save as png files?
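For reference, writing a figure to png with base R could look like the sketch below. The file name and the bar heights are placeholders, not the homework's actual plot or results, and the output goes to a temporary directory.

```r
# Write a figure to a png file using the base R graphics device.
out_file <- file.path(tempdir(), "signature_plot.png")  # hypothetical file name

png(out_file, width = 800, height = 600)  # size in pixels
barplot(c(A = 3, B = 5, C = 2),
        main = "Placeholder data, not the homework results")
invisible(dev.off())                      # close the device so the file is written
```

If the plots are made with ggplot2, ggsave() with a file name ending in .png does the same job, and the Rmd can then show the saved file with knitr::include_graphics().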

Comment: Your clean statement works fine on my machine. Might be a directory issue.

Comment: Speaking of directory issues, your directories in read_clean_genome_text_files.R are much too specific – it only functions if STAT545-HW-thibodeau-mylinh is directly on the user's Desktop, not likely to be the case for people who clone your repo and possibly not even the case on your own machine in a year or two. Likewise with the cancer-specific mutational signature files – which your Makefile should include rules to download, rather than assuming that the user already has them.
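A rough sketch of what this could look like on the R side: paths are built relative to the repository root with file.path(), and the data is downloaded only when it is not already there. The folder name, helper names and URL are all hypothetical placeholders, not the real location of the mutational signature files.

```r
# Build paths relative to the project root (the working directory when the
# pipeline is run from the repository), not from the user's Desktop.
data_dir <- "data"  # hypothetical folder name

input_path <- function(file_name, dir = data_dir) {
  file.path(dir, file_name)
}

# Hypothetical helper: download a file only if it is not already present.
fetch_if_missing <- function(url, dest) {
  dir.create(dirname(dest), showWarnings = FALSE, recursive = TRUE)
  if (!file.exists(dest)) {
    download.file(url, destfile = dest, mode = "wb")
  }
  invisible(dest)
}

# Example call, left commented out because the URL is a placeholder:
# fetch_if_missing("https://example.org/signatures.txt",
#                  input_path("signatures.txt"))
```

The download itself could equally live in a Makefile rule; the point of the sketch is only that nothing depends on where the repository was cloned.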

Comment: I'm confused by what read_clean_genome_text_files.R actually does in terms of cleaning – it looks like you just read the files in and re-save them. Are you basically just moving them to the STAT545 directory?

Well, something I can promise you is that I will annotate my code better, which I think I succeeded in doing in homework 8. Looking back at how cryptic homework 7 is to read encouraged me to make a tutorial for homework 8.

Thank you again so much for your help,

Warm regards, My Linh Thibodeau