Download the raw sequence files from the lab's GitHub and verify file integrity.
Read through the original author's documentation and install the packages required for the pipeline they used.
Read additional documentation for each step of the pipeline so I better understand what each step does and why (remedial genetics/bioinformatics?).
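For the file-integrity check in the first step, one option is to compare a checksum of each downloaded file against a published value. This is only a sketch: it assumes the lab provides SHA-256 digests, which is not stated above, and the function names are made up for illustration.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large sequence files never sit fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksum(path, expected_hex):
    """Return True when the file's digest matches the expected one.

    expected_hex is assumed to come from the data provider (hypothetical here).
    """
    return sha256_of_file(path) == expected_hex.strip().lower()
```

If the provider publishes MD5 sums instead, swapping in `hashlib.md5()` follows the same pattern.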
Set up a conceptual framework of what you want to achieve with any given data set. Document what some of the steps might be.
Obtain a data set that you can work with (in my case, from Dryad).
Check the data, make sure it looks reliable, and begin building a workflow that works with this data set. Read about different approaches and try them in order of ease and efficiency, as long as they give you what you want in the end.
Get my dataset (from a labmate) and look through it to understand exactly what it is and what it contains. From there I can figure out what kinds of questions I can ask.
Check the integrity of the data. Filter out low-quality reads?
Figure out how to mine SNPs and whether the dataset needs any preprocessing first.
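On the question of filtering out low-quality reads, a minimal sketch follows. It assumes the reads are four-line FASTQ records with Phred+33 quality encoding, and the cutoff of 20 is an arbitrary placeholder; none of that is confirmed above. In practice a dedicated trimming or filtering tool would probably do this job, so treat the code as a way to see what the step means, not as the pipeline itself.

```python
from itertools import islice


def read_fastq(lines):
    """Yield (header, sequence, quality) tuples from an iterable of FASTQ lines."""
    lines = iter(lines)
    while True:
        # Each record is assumed to span exactly four lines.
        record = [line.rstrip("\n") for line in islice(lines, 4)]
        if len(record) < 4:
            return
        header, sequence, _separator, quality = record
        yield header, sequence, quality


def mean_phred(quality, offset=33):
    """Mean Phred score of a quality string (Phred+33 encoding assumed)."""
    if not quality:
        return 0.0
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def filter_by_mean_quality(records, min_mean=20.0):
    """Keep only records whose mean quality is at least min_mean (placeholder cutoff)."""
    for header, sequence, quality in records:
        if mean_phred(quality) >= min_mean:
            yield header, sequence, quality
```

Because both functions are generators, they can be chained over an open file handle without loading every read at once.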
Obtain and store my data. I have yet to receive my initial dataset and am not sure whether I will get a hard copy or a download link. I also need to decide where to store it: a server (if my lab has one), a hard drive, or additional backup drives.
Make sure the data is OK to use, mainly that it is readable and of good quality.
Learn the necessary commands to explore the dataset before moving on to more complicated analyses.
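As a first look at an unfamiliar file, it can help to read just the first few lines before trying anything heavier. The sketch below assumes the file is plain text or gzip-compressed text; `peek` and the example path are hypothetical names, not part of any existing pipeline.

```python
import gzip
from itertools import islice


def peek(path, n_lines=8):
    """Return the first n_lines of a text file, transparently handling .gz files."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        # islice stops after n_lines, so the rest of the file is never read.
        return [line.rstrip("\n") for line in islice(handle, n_lines)]


# Example usage (hypothetical path):
# for line in peek("reads.fastq.gz"):
#     print(line)
```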
What do you imagine are the first three steps in your class project? Feel free to ask clarifying questions in class or using issues.