sr320 / course-fish546-2018


First steps in your project #11

Closed sr320 closed 6 years ago

sr320 commented 6 years ago

What do you imagine are the first three steps in your class project? Feel free to ask clarifying questions in class or using issues.

yaaminiv commented 6 years ago

In my project, I want to identify differentially methylated loci and regions between Eastern oyster samples exposed to different pCO2 conditions. First steps involve aligning bisulfite sequencing data to the Eastern oyster genome, then identifying DMLs and DMRs.

  1. Get a functional account on Mox
  2. Run bismark on Mox: This will align my samples to the Eastern oyster genome.
  3. Test methylKit parameters using subset data while full samples are running on Mox: I previously ran bismark on a subset of my data on a different machine. While my full samples are running on Mox, I can use the subset data to test different parameters when identifying DMLs and DMRs. I can use this test to inform the settings used for my full samples on Mox.
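
To make the parameter testing in step 3 concrete, here is a minimal plain-Python sketch of how a coverage cutoff and a percent-methylation difference cutoff decide which loci get flagged. It is not methylKit and runs no statistical test; the function names, the locus labels, and the thresholds (`min_coverage=10`, `min_diff=25.0`) are assumptions chosen only for illustration.

```python
def percent_methylation(meth, unmeth, min_coverage=10):
    """Return percent methylation at a locus, or None if coverage is too low."""
    coverage = meth + unmeth
    if coverage < min_coverage:
        return None
    return 100.0 * meth / coverage


def candidate_loci(control, treatment, min_coverage=10, min_diff=25.0):
    """Compare two {locus: (meth_count, unmeth_count)} dicts.

    Returns {locus: difference} for loci covered in both conditions whose
    percent methylation differs by at least min_diff percentage points.
    """
    hits = {}
    for locus in control.keys() & treatment.keys():
        pct_control = percent_methylation(*control[locus], min_coverage)
        pct_treatment = percent_methylation(*treatment[locus], min_coverage)
        if pct_control is None or pct_treatment is None:
            continue  # skip loci that fail the coverage filter
        diff = pct_treatment - pct_control
        if abs(diff) >= min_diff:
            hits[locus] = diff
    return hits


# Toy counts (hypothetical): (methylated reads, unmethylated reads) per locus.
control = {"chr1:100": (2, 18), "chr1:250": (10, 10), "chr1:400": (1, 3)}
treatment = {"chr1:100": (12, 8), "chr1:250": (11, 9), "chr1:400": (4, 0)}
print(candidate_loci(control, treatment))
```

Rerunning `candidate_loci` on the subset data with different `min_coverage` and `min_diff` values shows how sensitive the list of candidate loci is to each setting before committing to values for the full samples.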
kcribari commented 6 years ago

With a MiSeq run that is already accessible I must:

  1. Run quality control
  2. Run a paired-end merging step that takes 6 file pairs and combines the forward and reverse reads
  3. De-multiplex
  4. De-replicate
  5. Cluster based on sample similarities

My main goal right now is to get to the point of distinguishing each individual sample and identifying the number of individual sequences within each sample. This would be presented in a (very long) table.
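
As a rough illustration of the dereplication step and the per-sample table described above, here is a small Python sketch. It assumes the reads are already merged and demultiplexed into `(sample_id, sequence)` pairs; the function names and the toy reads are hypothetical.

```python
from collections import Counter, defaultdict


def dereplicate(reads):
    """Collapse identical sequences within each sample.

    reads: iterable of (sample_id, sequence) pairs.
    Returns {sample_id: Counter({sequence: count})}.
    """
    counts = defaultdict(Counter)
    for sample, seq in reads:
        counts[sample][seq.upper()] += 1
    return counts


def to_table(counts):
    """Flatten to (sample, sequence, count) rows, most abundant first per sample."""
    rows = []
    for sample in sorted(counts):
        for seq, n in counts[sample].most_common():
            rows.append((sample, seq, n))
    return rows


# Toy input standing in for demultiplexed reads.
reads = [
    ("S1", "ACGT"), ("S1", "ACGT"), ("S1", "TTGA"),
    ("S2", "acgt"), ("S2", "GGCC"), ("S2", "GGCC"), ("S2", "GGCC"),
]
for row in to_table(dereplicate(reads)):
    print(*row, sep="\t")
```

Each row is one unique sequence in one sample, so the length of the table grows with the number of distinct sequences rather than the number of reads.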

wsano16 commented 6 years ago
  1. I must go through a spreadsheet of elephant scat samples and identify samples for my analysis. I will be looking for samples from repeatedly sampled areas that have been genotyped as the same individual. I will also draw data from individuals sampled at the same time as my target samples.
  2. After sequencing these samples on a MiSeq, I will process my samples with a DADA2 workflow that I write up in Jupyter.
  3. I will visualize weighted and unweighted UniFrac via Emperor PCoA plots.
melodysyue commented 6 years ago

For my metabarcoding project, I will

  1. make sure the order of forward reads and reverse reads is the same;
  2. merge forward and reverse reads;
  3. filter by alignment score;
  4. demultiplex;
  5. dereplicate;
  6. denoise: remove chimeras, tag jumps, bad PCR products, etc.;
  7. supervised taxonomy classification with reference database;
  8. generate summary file recording information such as the number of sequence reads after each step of filtering.
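
Step 1 in the list above can be sketched as a comparison of read IDs record by record. This is a minimal example, assuming uncompressed four-line FASTQ records and read IDs that match between files once any trailing `/1` or `/2` is removed; the function names are hypothetical.

```python
import io
from itertools import zip_longest


def read_ids(handle):
    """Yield the read ID from each four-line FASTQ record."""
    for i, line in enumerate(handle):
        if i % 4 != 0:
            continue  # only every fourth line is a header
        header = line.strip()
        if not header.startswith("@"):
            raise ValueError(f"line {i + 1} is not a FASTQ header: {header!r}")
        read_id = header[1:].split()[0]  # drop the description after the first space
        if read_id.endswith(("/1", "/2")):
            read_id = read_id[:-2]  # drop old-style mate suffix
        yield read_id


def first_mismatch(forward, reverse):
    """Return the index of the first record whose IDs differ, or None if in sync.

    A file that ends early also counts as a mismatch at that record.
    """
    pairs = zip_longest(read_ids(forward), read_ids(reverse))
    for n, (fwd_id, rev_id) in enumerate(pairs):
        if fwd_id != rev_id:
            return n
    return None


# Toy files held in memory; real use would pass open file handles.
fwd = "@r1 1:N:0:1\nACGT\n+\nIIII\n@r2 1:N:0:1\nGGCC\n+\nIIII\n"
rev_ok = "@r1 2:N:0:1\nTTTT\n+\nIIII\n@r2 2:N:0:1\nAAAA\n+\nIIII\n"
rev_swapped = "@r2 2:N:0:1\nAAAA\n+\nIIII\n@r1 2:N:0:1\nTTTT\n+\nIIII\n"
print(first_mismatch(io.StringIO(fwd), io.StringIO(rev_ok)))
print(first_mismatch(io.StringIO(fwd), io.StringIO(rev_swapped)))
```

Running this check before merging catches files that were sorted or filtered independently, which would otherwise pair unrelated reads.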
laurahspencer commented 6 years ago

Big picture:

  1. Understand the QuantSeq data - format, how it differs from RNASeq, how to do QA/QC on it
  2. Develop & test pipeline for aligning QuantSeq data to Oly transcriptome. I have both gonad and larval QuantSeq data - which tissue type was the transcriptome developed from? I know that Katherine's assembling one from gonad tissue, and I can possibly do one myself.
  3. Annotate genes that align with the transcriptome.
  4. Analyze for differential gene expression
    Optional: assemble transcriptome myself, but since Katherine is working on it using tissue from my Oly gonad samples, should I bother?

I have not identified appropriate programs for the above, just getting started thinking about these steps. Katherine pointed me to some resources from the QuantSeq people, so I'll start there, but please let me know if you have any advice.

zscooper commented 6 years ago

I'm well into my project currently with 16S amplicon data. The steps so far went like this:

  1. Create a local directory for data
  2. Download data via ftp
  3. Run FastQC to check sequencing quality
  4. Use mothur for read joining, quality filtering (homopolymer removal, read length checks, dereplication, chimera removal), database alignment
  5. Use VSEARCH for clustering
  6. Assign taxonomy using the SILVA database.

Currently, I'm using the R package phyloseq for statistical analyses, including alpha-diversity calculations and ordination, and for producing publication quality graphics that allow me to make meaningful extrapolations from my data.
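
For readers unfamiliar with the alpha-diversity calculations mentioned above, here is a minimal Python sketch of two common measures, observed richness and the Shannon index. It is not phyloseq; the function names and the toy counts are assumptions for illustration only.

```python
import math


def observed_richness(counts):
    """Number of taxa with at least one read in the sample."""
    return sum(1 for c in counts if c > 0)


def shannon_index(counts):
    """Shannon diversity H' = -sum(p_i * ln(p_i)) over taxa with nonzero counts."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no reads")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


# Toy per-taxon read counts for two samples (hypothetical).
even_sample = [25, 25, 25, 25]
skewed_sample = [97, 1, 1, 1]
for name, counts in [("even", even_sample), ("skewed", skewed_sample)]:
    print(name, observed_richness(counts), round(shannon_index(counts), 3))
```

Both toy samples have the same richness, but the even one has the higher Shannon index, which is why the two measures are usually reported together.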

calderatta commented 6 years ago

For my exon capture project the end goal is to get as close as possible to a phylogeny. My source data are demultiplexed and in FASTQ format. Here are the first steps for this project (based on methods from Kuang et al. 2018):

  1. Trim adapter and index sequences using Cutadapt (Martin 2011)
  2. Remove sequences with low Phred quality score (< 20)
  3. Remove duplicate sequences created by PCR.
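
Step 2 above can be sketched in a few lines of Python. This assumes the threshold applies to the mean quality of each read and that qualities are Phred+33 encoded; both are assumptions, since the cutoff could also be applied per base or with a different encoding. The function names are hypothetical.

```python
def mean_phred(quality, offset=33):
    """Mean Phred score of a FASTQ quality string (Phred+33 assumed)."""
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def filter_by_quality(records, min_mean_q=20):
    """Keep (header, sequence, quality) records whose mean Phred is >= min_mean_q."""
    return [rec for rec in records if rec[2] and mean_phred(rec[2]) >= min_mean_q]


# Toy records: 'I' encodes Q40, '5' encodes Q20, '#' encodes Q2 in Phred+33.
records = [
    ("read1", "ACGT", "IIII"),
    ("read2", "ACGT", "5555"),
    ("read3", "ACGT", "####"),
]
kept = filter_by_quality(records)
print([header for header, _, _ in kept])
```

Averaging over the whole read is the simplest rule; trimming low-quality ends before filtering usually discards fewer reads.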
magobu commented 6 years ago

For my project, I’ll be analyzing a set of eDNA data from a recently published article by Dr. Ryan Kelly’s SMEA team (they are an awesome group!!). This will be practice for when I get my own set of data this December.

The end goal of my project is to organize the original metadata file into either a table or FASTA format file with the OTU identifier, corresponding sequence, and the selected taxonomic level.
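
One possible shape for that FASTA output, as a minimal Python sketch. The header layout (`>OTU_id taxon`), the function name, and the example records are all assumptions for illustration, not the format used in the published data set.

```python
def to_fasta(otus, line_width=60):
    """Format (otu_id, sequence, taxon) tuples as FASTA text.

    The header carries the OTU identifier followed by the chosen taxonomic
    label; sequences are wrapped at line_width characters.
    """
    lines = []
    for otu_id, sequence, taxon in otus:
        lines.append(f">{otu_id} {taxon}")
        for start in range(0, len(sequence), line_width):
            lines.append(sequence[start:start + line_width])
    return "\n".join(lines) + "\n"


# Hypothetical OTU records.
otus = [
    ("OTU_1", "ACGTACGTAC", "Salmonidae"),
    ("OTU_2", "GGCCTTAA", "Clupeidae"),
]
print(to_fasta(otus, line_width=8), end="")
```

Keeping the taxon in the header means the same file works as input for tools that only read FASTA, while a tab-separated table is easier to join back to the metadata.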

The first three steps will be:

  1. Merge paired-end reads with PEAR
  2. Quality filter with usearch
  3. Remove primers with cutadapt
Jeremyfishb commented 6 years ago

For my project, I will compare the relative protein expression level, and types of proteins expressed, between coral tissue and coral skeletal material in bleached and non-bleached corals. My collaborators and I have run mass spectrometry on the samples, converted the output to .mzXML files, and run PeptideProphet and ProteinProphet on them. However, my collaborators have moved the files to a directory that I cannot access. So...

My first step is to gain access to those files, so I am setting up a meeting with my collaborators. I am unclear on the next steps, but I imagine I will quality check and clean up the data, then start comparing protein expression between the treatments, identifying each treatment's unique proteins and what they mean.

jgardn92 commented 6 years ago

My project will be using a data set of snailfish RADSeq data. My goal is to create a phylogeny based on the data. The first three steps will be:

  1. Get organized with a directory and determine the best way to get the file off Google Drive (curl doesn't seem to work). Create a checksum in case I ever have to download it again.
  2. Use FastQC to visualize the data before doing anything to it.
  3. Trim low quality reads and visualize the results.
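
The checksum in step 1 can be sketched with Python's standard `hashlib` module. This is a minimal example, assuming SHA-256 is acceptable; the function name is hypothetical and the temporary file only stands in for the real download.

```python
import hashlib
import os
import tempfile


def file_checksum(path, algorithm="sha256", chunk_size=1 << 20):
    """Hash a file in chunks so large sequence files never load fully into memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Demo on a small temporary file standing in for the downloaded data.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"@read1\nACGT\n+\nIIII\n")
    demo_path = tmp.name

recorded = file_checksum(demo_path)              # save this value next to the data
verified = file_checksum(demo_path) == recorded  # repeat after any new download
os.remove(demo_path)
print(recorded, verified)
```

Storing the recorded value in the project directory makes it possible to confirm that a later download is byte-for-byte identical.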
kimh11 commented 6 years ago

The first three steps of my pangolin genomics project are:

  1. Figure out if FastQC, BWA, and FreeBayes are available on Hyak
  2. Figure out how to install tools I need on Hyak
  3. Get ssh access to our lab's NAS server & figure out how to move files between Mox and the server
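
A quick way to approach step 1 is to ask whether each executable is on `PATH`. The sketch below uses Python's standard `shutil.which`; the executable names `fastqc`, `bwa`, and `freebayes` are assumed, and on a cluster they may only become visible after loading a module or activating an environment, so "not found" here does not prove a tool is unavailable.

```python
import shutil


def find_tools(names):
    """Map each executable name to its full path on PATH, or None if not found."""
    return {name: shutil.which(name) for name in names}


# Executable names are assumptions; adjust them to match the actual installs.
status = find_tools(["fastqc", "bwa", "freebayes"])
for name, path in status.items():
    print(f"{name}: {path if path else 'not found on PATH'}")
```

Running the same check inside a job script confirms that compute nodes see the same tools as the login node.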
hgloiselle commented 6 years ago

For my project I will first download the data, then:

  1. Check the quality of the data
  2. Align the data to the genome
  3. Look for SNPs
grace-ac commented 6 years ago

For my project, I'm going to:

  1. Run the .fastq files through FastQC
  2. Create a login for Mox
  3. Use Trinity on Mox to assemble the transcriptome
  4. Use Jupyter notebook on my local computer to annotate transcriptome with BLAST