sr320 / course-fish546-2016


First steps in your project #26

Closed sr320 closed 7 years ago

sr320 commented 7 years ago

What do you imagine are the first three steps in your class project? Feel free to ask clarifying questions in class or using issues.

yaaminiv commented 7 years ago

Reminder: I have transcriptomes of male and female O. lurida gonads from an ocean acidification experiment.

My first three (baby) steps for my class project:

  1. Understand my data! I haven't worked with these kinds of files before so I need to understand what the data means, why it's formatted the way it is, and how to open and manipulate it using various tools.
  2. Learn how to compare different transcriptomes side-by-side to analyze differences between male and female gonads, and those exposed to different conditions.
  3. Learn how to connect parts of a transcriptome to genes and proteins.

jldimond commented 7 years ago

  1. Decide on the method I will use to separate out the host and symbiont sequences in my data. Initial attempts using a Porites astreoides transcriptome returned a very small number of loci. Then I thought filtering out the symbionts using the Symbiodinium minutum genome would be a good way to go. However, this is only a partial genome, and moreover I was surprised to read that the more recent Symbiodinium kawagutii genome matches only 5% of the S. minutum genome (!). These symbiotic dinoflagellates are therefore highly divergent from each other, and neither of these symbionts is even within the same clade as the symbionts I have detected in my samples. So, I think the best course of action will probably be to do a de novo assembly of the Porites lutea genome and use that...which leads me to:
  2. Assemble the Porites lutea raw reads on CyVerse.
  3. Run ipyrad using the "reference" assembly method with the newly assembled P. lutea genome.

hputnam commented 7 years ago

You might also be interested in another Porites transcriptome while the genome assembly is running. It is a holobiont transcriptome, so it may not be optimal either: Shinzato et al. 2014, PLoS ONE, Porites australiensis, http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0085182. The data are in the DDBJ Sequence Read Archive (DRA) under accession number DRA000906 (BioProject ID: PRJDB731).

jldimond commented 7 years ago

Thanks @hputnam

There is also another newly available transcriptome for P. astreoides but it is also a holobiont transcriptome. I suppose I could try filtering symbiont sequences out of it with the various Symbiodinium resources available, and combine it with the other P. astreoides transcriptome I used (from the Matz lab), as well as perhaps the P. australiensis one for good measure.

mmiddleton commented 7 years ago

Probably? first project steps:

  1. Learn how to deal with a fastq file. I've never had sequencing data before so I need to learn how to best prepare the data for analysis.
  2. I'm not sure that this is really step 2, but at some point I'll need to learn how to trim off the adapter sequences.
  3. Again, this probably isn't really step 3, but at some point I'll need to figure out how to get a hold of the O. mykiss reference genome (which I think has an update coming out soon?) so that I can compare my sequences to the reference genome.
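
For anyone else new to FASTQ, here is a minimal sketch of what "dealing with a fastq file" and adapter trimming involve, assuming the common four-lines-per-record layout. The function names, the example read, and the adapter string are illustrative assumptions, not part of anyone's pipeline. It only cuts at an exact adapter match; dedicated trimmers such as Trimmomatic also handle mismatches and partial adapters, so treat this as a way to understand the format rather than a replacement.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_id, sequence, quality) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of file
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # the '+' separator line, ignored here
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        yield header[1:], seq, qual


def trim_adapter(seq: str, qual: str, adapter: str) -> Tuple[str, str]:
    """Cut the read at the first exact occurrence of the adapter, if any."""
    pos = seq.find(adapter)
    if pos == -1:
        return seq, qual
    return seq[:pos], qual[:pos]


# Hypothetical example: one made-up read with an adapter-like string appended.
ADAPTER = "AGATCGGAAGAGC"  # assumption: substitute the adapter from your library prep
raw_seq = "ACGTACGT" + ADAPTER + "TT"
example = io.StringIO(f"@read1\n{raw_seq}\n+\n{'I' * len(raw_seq)}\n")

trimmed = [(rid, *trim_adapter(s, q, ADAPTER)) for rid, s, q in read_fastq(example)]
print(trimmed)
```

The sequence and quality strings are always trimmed together so they stay the same length, which downstream tools expect.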

mfisher5 commented 7 years ago

Working with RADseq data to find population structure...

  1. trim and quality-filter the raw data
  2. use de novo assembly to identify all of the loci in the data, and then map all of the reads to these loci
  3. identify the SNPs from the aligned reads at each locus

I'm a little familiar with the first few steps in stacks for single-read RADseq data, but since I have paired end data I was thinking it might be interesting to take the class project in a different direction than just population structure - while there isn't a Pacific cod genome, there is an Atlantic cod genome, so once I've done the de novo assembly and identified SNPs, it might be interesting to align this to the Atlantic cod genome and see how different / similar the species are.
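
As a rough illustration of step 1 (trim and quality-filter), here is a small sketch that assumes Phred+33 quality encoding. The thresholds and function names are placeholders chosen for the example, not recommended settings or the behavior of any particular tool.

```python
from typing import List, Tuple


def phred_scores(qual: str, offset: int = 33) -> List[int]:
    """Convert a FASTQ quality string to integer Phred scores (Phred+33 assumed)."""
    return [ord(c) - offset for c in qual]


def trim_low_quality_tail(seq: str, qual: str, min_q: int = 20) -> Tuple[str, str]:
    """Drop bases from the 3' end while their quality is below min_q."""
    scores = phred_scores(qual)
    end = len(seq)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], qual[:end]


def passes_filter(qual: str, min_mean_q: float = 20.0, min_len: int = 5) -> bool:
    """Keep a read only if it is long enough and its mean quality is high enough."""
    scores = phred_scores(qual)
    return len(scores) >= min_len and sum(scores) / len(scores) >= min_mean_q


# Hypothetical read: 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
seq, qual = trim_low_quality_tail("ACGTACGT", "IIIIII##")
print(seq, qual, passes_filter(qual))
```

Trimming before filtering means a read with a poor tail but a good body can still be kept.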

aspanjer commented 7 years ago

Using RNAseq data from coho to conduct a differential expression analysis

  1. QC my sequencing data (Trimmomatic and FastQC)
  2. de novo assembly of the transcriptome using Trinity (might try a different assembler to compare results)
  3. Calculate gene expression for each of the 24 individuals (I used Sailfish before, but might explore other options) by comparison to the de novo assembled transcriptome.

Although I've run through this once, I think I might get better results if I go back and spend some time cleaning up the sequencing data and try working with the PE data instead of merging it as I did the first time around.
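
To make "calculate gene expression" a bit more concrete, here is a sketch of the standard transcripts-per-million (TPM) normalization applied to read counts. Quantifiers estimate these values with their own models, so this is only the arithmetic behind the unit, and the transcript names, counts, and lengths below are made up.

```python
from typing import Dict


def tpm(counts: Dict[str, float], lengths: Dict[str, float]) -> Dict[str, float]:
    """Compute TPM: length-normalize each count, then scale so values sum to 1e6."""
    rates = {t: counts[t] / lengths[t] for t in counts}
    total = sum(rates.values())
    if total == 0:
        return {t: 0.0 for t in counts}
    return {t: rate / total * 1e6 for t, rate in rates.items()}


# Hypothetical counts and transcript lengths (in bases) for one individual.
example_counts = {"transcript_a": 100, "transcript_b": 300}
example_lengths = {"transcript_a": 1000, "transcript_b": 3000}
example_tpm = tpm(example_counts, example_lengths)
print(example_tpm)
```

Here the longer transcript has three times the reads but the same TPM, because the extra reads are explained by its length.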

Ellior2 commented 7 years ago

My goal is to characterize a Pacific oyster (Crassostrea gigas) proteome for this class.

Potential first steps: 1) Gain a full understanding of what my dataset represents. 2) Use BLAST to identify matching proteins between a database and our query sequences. 3) Further research the functions of these identified proteins.

I eventually will be analyzing differences in proteomes between P. oysters reared under different conditions so I think this will be important in getting "baseline" data or just familiar with different types of proteins that are typically expressed in these oysters.
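
For the BLAST step, one common follow-up is to keep the single best hit per query. The sketch below assumes BLAST+ was run with `-outfmt 6` and its default twelve columns; the sequence IDs, the example rows, and the e-value cutoff are made up for illustration.

```python
import csv
import io
from typing import Dict, TextIO

# Default column order for BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(handle: TextIO, max_evalue: float = 1e-5) -> Dict[str, Dict[str, str]]:
    """Return the highest-bitscore hit per query among hits passing the e-value cutoff."""
    best: Dict[str, Dict[str, str]] = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        hit = dict(zip(FIELDS, row))
        if float(hit["evalue"]) > max_evalue:
            continue
        current = best.get(hit["qseqid"])
        if current is None or float(hit["bitscore"]) > float(current["bitscore"]):
            best[hit["qseqid"]] = hit
    return best


# Hypothetical results: two hits for q1, one weak hit for q2.
rows = [
    ["q1", "protA", "98.5", "200", "3", "0", "1", "200", "1", "200", "1e-100", "390"],
    ["q1", "protB", "70.0", "180", "54", "2", "5", "184", "10", "189", "1e-40", "150"],
    ["q2", "protC", "35.0", "60", "39", "1", "1", "60", "3", "62", "0.5", "28.1"],
]
table = io.StringIO("\n".join("\t".join(r) for r in rows) + "\n")
hits = best_hits(table)
print({q: h["sseqid"] for q, h in hits.items()})
```

Queries with no hit under the cutoff simply do not appear in the result, which makes it easy to count how many proteins went unmatched.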

laurahspencer commented 7 years ago

My project is to annotate a small portion of the geoduck genome sequenced by BGI. My first steps might be:

MeganEDuffy commented 7 years ago

My goal is to compare microbial metaproteomic depth profiles from both a traditional database searching strategy and de novo peptide sequencing. Ultimately I'll be comparing the numbers of peptide and proteins matched or sequenced (some quality control will have to come in here), and the resulting taxonomic and functional characterizations that each output leads to. My first steps are:

0) Obtain MS/MS spectra (I ran some samples last week and was running more today but just decided to shut down the instrument in case we lose power...) Should be all done by next week. Once I have raw data (.RAW Waters file directories), I need to peak pick in Progenesis (Waters software) and convert to .mgf and .mzXML with msconvert.

1) Download search database (from assembled metagenome) from Rocap group server

2) Search .mzXML files with X!Tandem, maybe via the TPP, and run .mgf files with Novor CLI.

3) Figure out how to quality control the output - be it with PeptideProphet or something else.

4) Run peptide results through Unipept and MEGAN6 for taxonomic and functional characterizations.
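
The comparison of peptides found by each strategy could start with something as simple as the sketch below. It assumes each method's output has already been reduced to a list of peptide sequences; the peptides shown are invented. Isoleucine is mapped to leucine before comparing because the two residues have the same mass, so de novo sequencing generally cannot tell them apart.

```python
from typing import Dict, Iterable, Set


def normalize(peptide: str) -> str:
    """Uppercase the sequence and collapse I to L (isobaric residues)."""
    return peptide.upper().replace("I", "L")


def compare_peptides(database: Iterable[str], de_novo: Iterable[str]) -> Dict[str, Set[str]]:
    """Split normalized peptides into shared, database-only, and de novo-only sets."""
    db = {normalize(p) for p in database}
    dn = {normalize(p) for p in de_novo}
    return {"shared": db & dn, "database_only": db - dn, "de_novo_only": dn - db}


# Hypothetical peptide lists from the two search strategies.
database_peptides = ["PEPTIDEK", "LVNELTEFAK"]
de_novo_peptides = ["PEPTLDEK", "AAAGK"]
overlap = compare_peptides(database_peptides, de_novo_peptides)
print({name: sorted(peps) for name, peps in overlap.items()})
```

The sizes of the three sets give the "numbers of peptides matched or sequenced" for each method, and each set can then be sent separately to the taxonomic and functional tools.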

nclowell commented 7 years ago

Comparing population genetics of different cohorts of Pacific cod in Puget Sound using RAD data -- first few steps:

1) clean/filter raw reads 2) build/catalogue/align loci 3) call SNPs
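
To show what "call SNPs" means at its simplest, here is a toy sketch that assumes the reads at one locus are already aligned and of equal length. It calls a site variable when a second base is seen at least a minimum number of times. Real RADseq pipelines use statistical genotype models rather than fixed counts, and the reads and thresholds below are invented.

```python
from collections import Counter
from typing import List, Tuple


def call_snps(reads: List[str], min_depth: int = 4, min_minor: int = 2) -> List[Tuple[int, str, str]]:
    """Return (position, major_base, minor_base) for columns with a supported second allele."""
    snps = []
    for pos, column in enumerate(zip(*reads)):
        counts = Counter(base for base in column if base in "ACGT")  # ignore N and gaps
        if sum(counts.values()) < min_depth or len(counts) < 2:
            continue
        (major, _), (minor, minor_n) = counts.most_common(2)
        if minor_n >= min_minor:
            snps.append((pos, major, minor))
    return snps


# Hypothetical aligned reads at one locus.
locus_reads = ["ACGTA", "ACGTA", "ACTTA", "ACTTA", "ACGTA", "ACGTC"]
snps = call_snps(locus_reads)
print(snps)
```

Position 2 is called because T appears twice, while the single C at position 4 is treated as a likely sequencing error.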