Final Filetypes - Githubissues

sr320 commented 7 years ago

For this week's project-progress, list the final files you will publish as part of your class product in a bulleted list. Include a brief description of each file and indicate its filetype.

yaaminiv commented 7 years ago

Based on the goals listed in my repository, I will publish the following files as part of my class project:

.tab file with genes differentially expressed in control vs. ocean acidification conditions and accompanying gene ontology information. Ideally, this file will also highlight variations in differential expression based on gonad sex. The .tab file will be based off of the following information that I will also publish:
- .tab file with genes differentially expressed in control vs. ocean acidification conditions
- .tab file with genes differentially expressed in male vs. female gonads
- .txt file with best matches and gene ontology information for all accessions
.png that visually displays information in the .tab file
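A minimal sketch of how a differential expression table could be combined with gene ontology annotations, assuming both files are tab-delimited and share an identifier column. The column names (`gene_id`, `log2fc`, `go_terms`) and the sample rows are hypothetical, not taken from the actual project files.

```python
import csv
import io


def read_tab(handle, key):
    """Read a tab-delimited table into a dict keyed by the given column."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row[key]: row for row in reader}


def annotate(de_rows, go_rows):
    """Attach GO terms to each differentially expressed gene (left join)."""
    merged = []
    for gene_id, row in de_rows.items():
        # Genes without an annotation keep an empty GO field.
        go = go_rows.get(gene_id, {}).get("go_terms", "")
        merged.append({**row, "go_terms": go})
    return merged


# Made-up inputs standing in for the real .tab and .txt files.
de_tab = io.StringIO("gene_id\tlog2fc\nG1\t2.1\nG2\t-1.4\n")
go_txt = io.StringIO("gene_id\tgo_terms\nG1\tGO:0006950\n")

result = annotate(read_tab(de_tab, "gene_id"), read_tab(go_txt, "gene_id"))
for row in result:
    print(row["gene_id"], row["log2fc"], row["go_terms"] or "NA")
```

A left join keeps every differentially expressed gene in the output, so unannotated genes stay visible instead of silently dropping out.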

jldimond commented 7 years ago

@sr320 what do you envision to be the sorts of files that should be published? Just raw data (or links to them) and the workflows used to generate subsequent files for analysis, or do you think the final project should include some of these analysis files?

Ellior2 commented 7 years ago

For my final project, characterizing a Pacific oyster proteome, I plan to create the final output files listed under each goal below.

My original goals were to:

1) Identify proteins and their functions in C. gigas proteome

TAB file with protein names, GO terms, e-values, etc.
JPG with visualization from Revigo based on GO terms.

2) Compare an oyster proteome to that of another bivalve, the geoduck

CSV file with table of proteins and GO terms that are shared between these two bivalve species
TAB files with unique proteins specific to each organism
JPG that visualizes the data

3) Draw conclusions about differential protein expression in oysters reared at 23 °C and 29 °C from 2015 MS/MS data.

TAB file listing unique proteins and their function between the two treatments.
TAB file with protein expression levels (in this case peak area).
JPG that visualizes differences in protein expression between these two treatments
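The shared and unique protein lists described under goal 2 can be sketched with set operations. This is only an illustration under the assumption that each proteome has been reduced to a collection of comparable protein identifiers; the identifiers below are invented.

```python
def compare_proteomes(oyster, geoduck):
    """Split two collections of protein IDs into shared and species-specific sets."""
    oyster, geoduck = set(oyster), set(geoduck)
    return {
        "shared": sorted(oyster & geoduck),
        "oyster_only": sorted(oyster - geoduck),
        "geoduck_only": sorted(geoduck - oyster),
    }


# Hypothetical identifiers, for illustration only.
oyster_ids = ["P001", "P002", "P003"]
geoduck_ids = ["P002", "P003", "P004"]

groups = compare_proteomes(oyster_ids, geoduck_ids)
for name, ids in groups.items():
    print(name, ",".join(ids))
```

The `shared` group corresponds to the CSV of proteins common to both bivalves, and the two `*_only` groups to the per-organism TAB files.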

jldimond commented 7 years ago

My final files will mostly be iPyrad output files for the "data3" assembly that I ran. This is basically the third iteration of the iPyrad run that I ended up using as my final dataset.

.vcf is a large file with variant and read depth information for each base. I used it to derive the file data3-2.txt, which I used for the EpiRAD analysis.
.geno is a matrix of alleles that I used for making MDS plots.
.loci is a big file that basically shows all the stacks and the actual bases.
.phy is a file type I did not use, but it is basically a supermatrix of the .loci file.
.snps.map provides indexing information for all the loci and SNPs.
.str is a matrix of alleles in a format that is ideal for the program STRUCTURE, but I used it in an R package called adegenet to do discriminant analysis of principal components, which is similar to what STRUCTURE does.

Several of these files have unlinked SNP counterpart files denoted .u., e.g. .u.str. I focused my analysis on just the unlinked SNPs, so the .u.geno and .u.str files are the ones I used for the ddRAD analysis.

Further information can be found here: http://ipyrad.readthedocs.io/output_formats.html

mmiddleton commented 7 years ago

For my final product I plan to publish four files:

a .bam file which is the output from the alignment step of Bismark, mapping my sequence data to the reference genome. However, this particular file is more or less useless without the methylation extraction step of Bismark.
a .cov file which is basically a text file created during the methylation extraction step of Bismark that has information about whether or not cytosines are methylated, location on the chromosome/scaffold, and what percentage of the cytosines are methylated/unmethylated.
a .bed file which is another output from the methylation extraction step of Bismark that I can open using a viewer (IGV) so that I can actually see my methylation data.
a .tab or .txt file that I will make which will (hopefully) contain some interesting results from searching through the methylation data and using BLAST to find some heavily methylated/unmethylated genes.
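As a rough sketch of how the .cov file could be searched for heavily methylated sites, the snippet below assumes the six-column, tab-delimited, header-free layout of a Bismark coverage report (chromosome, start, end, percent methylation, methylated count, unmethylated count). The `min_coverage` and `min_percent` thresholds and the sample rows are hypothetical.

```python
import csv
import io


def read_cov(handle, min_coverage=5):
    """Yield (chrom, position, percent_methylated, coverage) for covered cytosines."""
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) != 6:
            continue  # skip blank or malformed lines
        chrom, start, _end, pct, n_meth, n_unmeth = row
        coverage = int(n_meth) + int(n_unmeth)
        if coverage >= min_coverage:
            yield chrom, int(start), float(pct), coverage


def heavily_methylated(handle, min_percent=75.0, min_coverage=5):
    """Return sites at or above min_percent methylation with enough read support."""
    return [site for site in read_cov(handle, min_coverage) if site[2] >= min_percent]


# Made-up rows standing in for a real coverage file.
sample = io.StringIO(
    "scaffold1\t100\t100\t80.0\t8\t2\n"
    "scaffold1\t150\t150\t0.0\t0\t3\n"
    "scaffold2\t42\t42\t50.0\t5\t5\n"
)
sites = heavily_methylated(sample)
print(sites)
```

Filtering on coverage first avoids treating a cytosine covered by only a few reads as confidently methylated or unmethylated.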

laurahspencer commented 7 years ago

If all goes well, my final product should include:

.scafSeq file (fasta format) with a subset of DNA sequences on scaffolds >70k bp
.tabular file with results from blasting transcriptome against >70k scaffolds & merged with Uniprot data, indexed
.gff file with candidate transposable elements, identified via RepeatMasker, indexed
.gff file with candidate miRNA locations, identified using the miRBase hairpin sequences, indexed
.gff file with candidate CpG sites, located via Galaxy's EMBOSS fuzznuc online tool, indexed
.gff file with RNASeq expression reads, indexed
.xml file for IGV visualization
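The first item, a subset of scaffolds longer than 70k bp, could be produced along these lines. This is a sketch that assumes a plain FASTA file read with a hand-written parser; the tiny sequences and the lowered threshold in the demo are for illustration only.

```python
import io


def read_fasta(handle):
    """Yield (name, sequence) pairs from a FASTA-formatted text handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def long_scaffolds(handle, min_length=70_000):
    """Keep only the scaffolds whose sequence is longer than min_length bases."""
    return [(name, seq) for name, seq in read_fasta(handle) if len(seq) > min_length]


# Demo with short made-up scaffolds and a lowered threshold.
demo = io.StringIO(">scaffold1\nACGT\nACGT\n>scaffold2\nAC\n")
kept = long_scaffolds(demo, min_length=4)
print([(name, len(seq)) for name, seq in kept])
```

For a real genome assembly, streaming the records as done here avoids holding every scaffold in memory before filtering.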

FYI @sr320 this weekend/week I'm focusing on the RNASeq step, and am a little fuzzy on how to do this to completion, but am using your Oly project repo as guidance/template.

mfisher5 commented 7 years ago

My final product will include:

.sam file that contains my reference genome, built from a combination of Stacks, Bowtie, and BLAST.
.genepop file, a matrix of every individual's genotype at every locus that can later be used in GENEPOP to calculate linkage disequilibrium and estimate Nm, among other metrics.
.sumstats.summary.tsv file, which contains a summary of all the summary statistics for each population. This includes mean observed / expected heterozygosity across variable and all loci, as well as a measure of Fis.
fst.tsv file, which contains Fst calculations for each pair of populations.
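For readers unfamiliar with the summary statistics mentioned above, here is a sketch of the textbook definitions at a single locus: observed heterozygosity (Ho), expected heterozygosity (He = 1 - Σp²), and Fis = 1 - Ho/He. This is an assumption-laden illustration; the software that writes the .sumstats files may apply sample-size corrections that this version omits, and the genotypes are invented.

```python
from collections import Counter


def fis(genotypes):
    """Return (Ho, He, Fis) for one locus from a list of diploid genotypes."""
    n = len(genotypes)
    # Observed heterozygosity: fraction of individuals carrying two different alleles.
    ho = sum(1 for a, b in genotypes if a != b) / n
    # Expected heterozygosity from allele frequencies: 1 - sum(p_i^2).
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    he = 1 - sum((c / total) ** 2 for c in counts.values())
    if he == 0:
        return ho, he, float("nan")  # monomorphic locus: Fis is undefined
    return ho, he, 1 - ho / he


# Made-up genotypes for one population at one biallelic locus.
population = [("A", "A"), ("A", "B"), ("B", "B"), ("A", "B")]
ho, he, f = fis(population)
print(ho, he, f)
```

A positive Fis indicates fewer heterozygotes than expected under random mating, and a negative value indicates an excess.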

aspanjer commented 7 years ago

For my final product:

.fasta file that contains the final assembled transcriptome using all of the sequencing files for all individuals.
.tabular file that contains the final annotation for the transcriptome and corresponding gene ontology (GO) terms
.tabular file containing differentially expressed genes for different comparisons
.jpg file that visualizes the overall differential expression
.jpg file that visualizes gene ontology enrichment results (might be a table)
.md file that has the methods and results section for publication

nclowell commented 7 years ago

For my final product, I will produce:

a .sam file for my cleaned catalog/de novo reference genome
a .genepop file which has genotypes for all loci of all individuals
.sumstats.summary.tsv file and fst.tsv which will provide statistical estimates of Fst, Fis, etc, between cohorts (or "populations" here)
plots in R or GENEPOP using the .genepop file to visualize population differences

MeganEDuffy commented 7 years ago

My final filetypes will be:

.csv files with lowest common ancestor metapeptide results
.jpg files with plots showing LCA results
.fasta files with predicted protein results.
hopefully (I should know once I get my protein analyses running!) a .tab file with GO terms for the predicted proteins.
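To illustrate what a lowest common ancestor result means for a peptide, here is a minimal sketch assuming each matching organism is represented as a root-to-leaf list of taxon names. The function name and the lineages are hypothetical and are not the output of any particular metapeptide tool.

```python
def lowest_common_ancestor(lineages):
    """Return the deepest taxon shared by every lineage, or None if none is shared."""
    shared = []
    # zip stops at the shortest lineage, so only comparable ranks are checked.
    for ranks in zip(*lineages):
        if len(set(ranks)) != 1:
            break
        shared.append(ranks[0])
    return shared[-1] if shared else None


# Made-up lineages for organisms that all contain the same peptide.
matches = [
    ["Bacteria", "Proteobacteria", "Gammaproteobacteria"],
    ["Bacteria", "Proteobacteria", "Alphaproteobacteria"],
]
print(lowest_common_ancestor(matches))
```

A peptide found in both classes can only be assigned as specifically as their shared phylum, which is why LCA results are often reported at coarse taxonomic ranks.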

sr320 / course-fish546-2016

Final Filetypes #94