sr320 / course-fish546-2016


Final Filetypes #94

Closed sr320 closed 7 years ago

sr320 commented 7 years ago

For this week's project progress, list the final files you will publish as part of your class product in a bulleted list. Include a brief description of each file and indicate its filetype.

yaaminiv commented 7 years ago

Based on the goals listed in my repository, I will publish the following files as part of my class project:

jldimond commented 7 years ago

@sr320 what do you envision to be the sorts of files that should be published? Just raw data (or links to them) and the workflows used to generate subsequent files for analysis, or do you think the final project should include some of these analysis files?

Ellior2 commented 7 years ago

For my final project of characterizing a Pacific Oyster proteome I plan to create the final output files:

My original goals were to:

1) Identify proteins and their functions in the C. gigas proteome

2) Compare an oyster proteome to that of another bivalve, the geoduck

3) Draw conclusions about differential protein expression in oysters reared at 23 °C and 29 °C from 2015 MS/MS data.

jldimond commented 7 years ago

My final files will mostly be iPyrad output files for the "data3" assembly that I ran. This is basically the third iteration of the iPyrad run that I ended up using as my final dataset.

  1. .vcf is a large file with variant and read depth information for each base. I used it to derive the file data3-2.txt, which I used for the EpiRAD analysis.
  2. .geno is a matrix of alleles that I used for making MDS plots.
  3. .loci is a big file that basically shows all the stacks and the actual bases.
  4. .phy is a file type I did not use, but it is basically a supermatrix of the .loci file.
  5. .snps.map provides indexing information for all the loci and SNPs.
  6. .str is a matrix of alleles in a format that is ideal for the program STRUCTURE, but I used it in an R package called adegenet to do discriminant analysis of principal components, which is similar to what STRUCTURE does.

Several of these files have unlinked SNP counterpart files denoted .u., e.g. .u.str. I focused my analysis on just the unlinked SNPs, so the .u.geno and .u.str files are the ones I used for the ddRAD analysis.
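As a rough illustration of the naming convention described above, the following Python sketch separates unlinked-SNP outputs from the rest by looking for the `.u.` marker. The file names are hypothetical, built only from the assembly name "data3" and the extensions listed in this comment, and `split_unlinked` is a made-up helper rather than part of iPyrad.

```python
def split_unlinked(filenames):
    """Split output file names into unlinked-SNP files and all others.

    Assumes unlinked-SNP counterparts carry a '.u.' marker in the name,
    as described in the comment above (e.g. '.u.str').
    """
    unlinked = [name for name in filenames if ".u." in name]
    other = [name for name in filenames if ".u." not in name]
    return unlinked, other


# Hypothetical file names for the "data3" assembly (illustrative only).
outputs = [
    "data3.vcf",
    "data3.geno",
    "data3.u.geno",
    "data3.loci",
    "data3.phy",
    "data3.snps.map",
    "data3.str",
    "data3.u.str",
]

unlinked, other = split_unlinked(outputs)
print("Unlinked SNP files:", unlinked)
print("Other files:", other)
```

With these example names, only the `.u.geno` and `.u.str` files end up in the unlinked group, matching the files used for the ddRAD analysis.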

Further information can be found here: http://ipyrad.readthedocs.io/output_formats.html

mmiddleton commented 7 years ago

For my final product I plan to publish four files:

laurahspencer commented 7 years ago

If all goes well, my final product should include:

FYI @sr320 this weekend/week I'm focusing on the RNASeq step, and am a little fuzzy on how to do this to completion, but am using your Oly project repo as guidance/template.

mfisher5 commented 7 years ago

My final product will include:

  1. .sam file that contains my reference genome, built from a combination of Stacks, Bowtie, and BLAST.
  2. .genepop file, a matrix of every individual's genotype at every locus that can later be used in GENEPOP to calculate linkage disequilibrium and estimate Nm, among other metrics.
  3. .sumstats.summary.tsv file, which contains a summary of all the summary statistics for each population. This includes mean observed / expected heterozygosity across variable and all loci, as well as a measure of Fis.
  4. fst.tsv file, which contains Fst calculations for each pair of populations.
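For readers unfamiliar with how the heterozygosity values and Fis in item 3 relate, here is a minimal Python sketch of the textbook definition, Fis = 1 - Ho/He. This is an assumption about the general relationship only; the tool that writes the summary file may compute Fis per locus and average it differently. The function name and the numbers are illustrative and are not taken from the project data.

```python
def inbreeding_coefficient(h_obs, h_exp):
    """Return Fis = 1 - Ho/He from observed and expected heterozygosity.

    Textbook definition only; not necessarily how any particular
    program derives the value it reports.
    """
    if h_exp <= 0:
        raise ValueError("expected heterozygosity must be positive")
    return 1.0 - h_obs / h_exp


# Made-up heterozygosity values for a single population.
fis_value = inbreeding_coefficient(h_obs=0.20, h_exp=0.25)
print(f"Fis = {fis_value:.3f}")
```

A positive value indicates fewer heterozygotes than expected under random mating, and a negative value indicates an excess.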

aspanjer commented 7 years ago

For my final product:

  1. .fasta file that contains the final assembled transcriptome using all of the sequencing files for all individuals.

  2. .tabular file that contains the final annotation for the transcriptome and the corresponding gene ontology (GO) terms

  3. .tabular file containing differentially expressed genes for different comparisons

  4. .jpg file that visualizes the overall differential expression

  5. .jpg file that visualizes gene ontology enrichment results (might be a table)

  6. .md file that has the methods and results section for publication

nclowell commented 7 years ago

For my final product, I will produce:

  1. a .sam file for my cleaned catalog/de novo reference genome
  2. a .genepop file which has genotypes for all loci of all individuals
  3. .sumstats.summary.tsv and fst.tsv files, which will provide statistical estimates of Fst, Fis, etc., between cohorts (or "populations" here)
  4. plots in R or GENEPOP using the .genepop file to visualize population differences

MeganEDuffy commented 7 years ago

My final filetypes will be: