hackseq / hackseq_projects_2017

Project 5: Developing advanced R tutorials for genomic data analysis #5

Open abaghela opened 7 years ago

abaghela commented 7 years ago

Developing advanced R tutorials for genomic data analysis

If you use R for genomic data analysis, then this project might be for you. How much of your advanced R knowledge was earned through serendipitous discovery of random online tutorials and/or endless trial and error? If you're anything like me, the answer is, frustratingly, a lot. I would like to make it easier for the next generation of bioinformaticians to hit the ground running with R.

Unless the package you want to use is on Bioconductor, it can be difficult to find intermediate or advanced tutorials online, let alone examples focused on genomics. I propose that we pool our collective knowledge of R and build a set of tutorials covering advanced topics related to genomics. Personally, I would like to see more tutorials on how to use the powerful packages within the tidyverse (e.g., ggplot2, dplyr, tidyr and purrr) for genomic data analysis, as well as examples showing how to leverage packages like knitr and rmarkdown for sharing reproducible code.

I would like us to brainstorm additional ideas and reach a consensus on which tutorials we will develop over the course of three days. I envision that these tutorials could easily be combined to form the basis of hands-on workshops. All material will be openly developed and made freely available online under a CC-BY license.

Given the project's goal, you should be moderately experienced with R. If you have more than 1-2 years of continuous experience with R in genomics, you probably have something to contribute to our effort. Other than knowledge of R, I think the second most important criterion is diversity: diverse expertise, diverse backgrounds, and diverse people. If you are excited by the idea and feel like you have something to contribute, I encourage you to apply!

Team Lead: Bruno Grande | bgrande@sfu.ca | @brunogrande | Grad Student | Simon Fraser University

LiNk-NY commented 7 years ago

Bioconductor already has a good number of workflows / tutorials. Please see: http://bioconductor.org/help/workflows/ Is this what you have in mind but using the tidyverse?

BrunoGrandePhD commented 7 years ago

I think the Bioconductor workflows are great! I've used them myself in the past. I think what's missing is, at least for some of them, the use of more modern R tools (e.g. the tidyverse packages) and a consistent goal and dataset such that the tutorials can be combined to form the basis for workshops. For context, I'm a Software Carpentry instructor, so I was imagining something along the lines of a SWC lesson (erring on the side of modularity).

knausb commented 7 years ago

There's an R genetics community outside of Bioconductor as well. I'm an author of vcfR, which may be relevant to #4 too. It includes some tidy functionality. We've placed some documentation here. The NESCent people have some nice material as well, including popgenInfo.

zachary-foster commented 7 years ago

I work with metabarcoding data and am the developer of taxa and metacoder. I mostly deal with taxonomic information related to high-throughput sequencing, so not strictly genomics.

Are you interested in incorporating taxonomic information anywhere in your workflows?

jakelever commented 6 years ago

Hey team lead, we've been gathering GitHub IDs for your team members. As you've likely been notified, we've created a project repo for you, made you its admin, and added the team members to it. We've received almost everyone's GitHub ID and will continue to add members as we get their IDs.

Project repo: https://github.com/hackseq/2017_project_5

Feel free to rename the repo as appropriate. Note that the repo currently has an MIT license; amend this as required. It'd be a great idea to start a discussion on this repo with information to get your team members started (e.g., some short suggested reading, things to look up, etc.). We will also be adding everyone to Slack and creating a specific channel for each project. This may be an easier way to communicate.

Thanks, Jake (on behalf of the Hackseq organising committee)