hackseq / 2017_project_5

Developing advanced R tutorials for genomic data analysis

https://hackseq.github.io/2017_project_5/

MIT License

Picking a dataset #3

Open BrunoGrandePhD opened 7 years ago

BrunoGrandePhD commented 7 years ago

I envision that we pick a genomic dataset (or a set of related datasets) that we can all use, so that the tutorials are consistent with one another. That would make it easier to string the tutorials together as part of a longer workshop.

Who has ideas of good datasets that can be analyzed in different ways for each topic we end up selecting?

privefl commented 7 years ago

I think 1000 Genomes is super standard.

zhenyisong commented 7 years ago

It depends on our storyline. What are we analyzing the data for?

BrunoGrandePhD commented 7 years ago

@privefl: I agree that 1000G is pretty standard. Another option is Genome in a Bottle, but that's a single sample and doesn't allow any cohort studies.

@zhenyisong: I think it would be too hard to maintain a perfectly uniform storyline across all tutorials. The best we can do is pick one shared dataset (e.g. 1000G data) and analyze it in different ways to cover the various topics we want to develop tutorials for.

privefl commented 7 years ago

I just discovered this: http://googlegenomics.readthedocs.io/en/latest/

privefl commented 7 years ago

1000 Cannabis Genomes Project ahah