Post release improvements

sr320 commented 7 years ago

Last week you took the bold move to make your first release. For this week's project-progress, simply provide a bulleted explanation of all the improvements to your project since the release.

mfisher5 commented 7 years ago

Project Goals for this Week: (1) Repeat Week 5 comparisons made with ustacks -m parameter for the ustacks -M and cstacks # individs in catalog

(2) Determine which quantitative measurements should matter the most when comparing parameter outputs.

Progress (1) Repeated Week 5 comparisons with the ustacks -M parameter (# of loci retained AND graphing heterozygosity with R code) as well as the cstacks # individs for catalog (# of loci retained ONLY)

Jupyter notebook for step-by-step guide and code
Evernote notebook for graphical comparison

(2) Decided that I will most likely use a stack depth (-m) of 5 and a -M of 3. Have not thoroughly looked over the cstacks parameter, # individs for catalog.

(3) Decided that I might also explore the upper / lower bound error rate parameters in ustacks (--bound_low and --bound_high)

(4) Tackling the question of whether the stacks program populations undercalls heterozygotes by running independent scripts on the output. So far have run into a lot of trouble here, as these scripts were written for a different version of stacks.

Jupyter notebooks I and II

yaaminiv commented 7 years ago

This week, I was mainly troubleshooting all of the problems I ran into last week.

Successfully got my blastx to run (and it's still running :sweat_smile: )
Got kallisto quant to work with my data
Setting up DESeq2 analysis for next week

Ellior2 commented 7 years ago

Progress from this week:

1) Double-checked that my blast output was complete by checking whether the tail end of the output file contained contigs from my query. Here is my notebook

2) Turned my geoduck blast output that I merged with a Uniprot database into a large table

3) I also created a table with the proteins found in the Taylor oyster seed experiments after merging with the uniprot database.
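The completeness check in item 1 could be sketched roughly as follows. This is a minimal sketch, not the notebook's actual code: it assumes the query is a FASTA file and the blast output is tab-delimited with the query ID in the first column, and the function names are hypothetical. Note that contigs with no hits never appear in the output, so checking the last few query contigs is more forgiving than checking only the very last one.

```python
def last_fasta_ids(fasta_path, n=5):
    """Return the IDs of the last n sequences in a FASTA file."""
    ids = []
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                # The ID is the first whitespace-delimited token after '>'.
                parts = line[1:].split()
                if parts:
                    ids.append(parts[0])
    return ids[-n:]


def blast_query_ids(blast_path):
    """Return the set of query IDs (first column) in a tabular blast file."""
    queries = set()
    with open(blast_path) as handle:
        for line in handle:
            # Skip blank lines and comment lines.
            if line.strip() and not line.startswith("#"):
                queries.add(line.rstrip("\n").split("\t")[0])
    return queries


def tail_is_covered(fasta_path, blast_path, n=5):
    """True if any of the last n query contigs has a hit in the output."""
    hits = blast_query_ids(blast_path)
    return any(seq_id in hits for seq_id in last_fasta_ids(fasta_path, n))
```

A `False` result suggests the run may have stopped early, although it can also mean the final contigs simply had no hits.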

nclowell commented 7 years ago

I figured out how to run sstacks, the penultimate program in the Stacks pipeline, and wrote a python script so that I can rerun it from the command line simply. The script writes and runs a shell script to accomplish this. Here's the markdown file.
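For readers who have not seen this pattern, here is a minimal sketch of a Python script that writes a shell script and optionally runs it. It is not the script linked above. The function names, file names, and default values are hypothetical, and the sstacks flags shown (-b, -c, -s, -o, -p) are the ones I recall from Stacks 1.x, so check `sstacks -h` for your version before relying on them.

```python
import subprocess
from pathlib import Path


def build_sstacks_script(catalog, samples, out_dir, batch_id=1, threads=4):
    """Return the text of a shell script with one sstacks call per sample."""
    lines = ["#!/bin/bash", "set -e"]  # stop at the first failing command
    for sample in samples:
        lines.append(
            f"sstacks -b {batch_id} -c {catalog} -s {sample} "
            f"-o {out_dir} -p {threads}"
        )
    return "\n".join(lines) + "\n"


def write_and_run(script_text, script_path="run_sstacks.sh", run=False):
    """Write the script to disk, make it executable, and optionally run it."""
    path = Path(script_path)
    path.write_text(script_text)
    path.chmod(0o755)
    if run:
        # Requires bash and sstacks to be available on the PATH.
        subprocess.run(["bash", str(path)], check=True)
    return path
```

Keeping the generated shell script on disk also leaves a record of exactly which commands were run.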

In addition, I've made some progress on figuring out how to call filepaths so that the scripts work no matter the directory structure. I've also made progress on sorting out what additional analyses and filtering my lab group does with RAD data after running populations, the final program in the Stacks pipeline.
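One common way to make file paths independent of the directory a script is launched from is to locate the project root and build every path from it. The sketch below assumes the project is a Git repository, so it uses the `.git` folder as the marker. The function name and the marker are illustrative choices, not the approach described above.

```python
from pathlib import Path


def find_project_root(start, marker=".git"):
    """Walk upward from start until a directory containing marker is found."""
    current = Path(start).resolve()
    for candidate in [current, *current.parents]:
        if (candidate / marker).exists():
            return candidate
    raise FileNotFoundError(f"no '{marker}' found above {current}")
```

A script could then call `find_project_root(Path.cwd())` and join sub-folder names onto the result, instead of hard-coding absolute paths.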

aspanjer commented 7 years ago

For week 6:

My full transcriptome assembly finally ran in Galaxy, resulting in 350k contigs (up from the 170k contigs in the smaller assembly)
Using transrate, I ran an analysis on this new assembly
I used this new transcriptome to rerun the previously established analyses (Kallisto and DESeq2)
Additionally, I blasted the new transcriptome against a database of salmon proteins and against the Swiss-Prot database (the latter being used for GO enrichment analysis)
I began using DAVID for GO enrichment, but realized that the salmon Uniprot IDs aren't in the DAVID database, so had to go back and blast against the Uniprot reviewed database
I also worked on visualizing results, exploring heat maps in different Bioconductor packages; I hope to have something usable shortly.

mmiddleton commented 7 years ago

I spent this week trying to get to a point where I could run my Bismark analysis. I was able to:

Get the software and all its dependencies downloaded and working
Read through the Bismark user manual (which is pretty extensive)
Start the bisulfite conversion of my genome (this step was started yesterday and will likely run through the weekend, so I am considering different options for running the mapping step like using CoGe or using a computer with more RAM/better processor than mine)
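For context, the genome preparation step in Bismark writes two in-silico converted copies of the reference, one with every C changed to T and one with every G changed to A. The toy function below only illustrates that idea on a short string. It is not Bismark's code, and the function name is hypothetical.

```python
def bisulfite_convert(sequence):
    """Return (C-to-T, G-to-A) fully converted copies of a DNA sequence."""
    seq = sequence.upper()
    c_to_t = seq.replace("C", "T")  # forward-strand conversion
    g_to_a = seq.replace("G", "A")  # reverse-strand conversion, written on the forward strand
    return c_to_t, g_to_a
```

On a whole genome the conversion itself is just a linear pass over the sequence; much of the preparation time goes to building the aligner indexes for both converted copies.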

This week's accomplishments and goals for next week have been updated on my README

laurahspencer commented 7 years ago

Since my v01 "pre-release" I have done the following:

Progress Made

Learned how to use the command line to communicate with Git, since GitHub Desktop was severely malfunctioning. I now commit/push/pull using the command line!
Located the appropriate Uniprot-annotated geoduck transcriptome, used Galaxy to merge it with my data file to create one large and data-rich file.
Researched & sketched out the meaning of each column in the blast results file, Uniprot-transcriptome file, and the GFF format. Struggled trying to convert my large data file to a format (GFF or BED) that is viewable on the IGV tool, but after chatting w/ Steven I will now leave that effort until later.
Got a better handle on the end-goal/product
Prepped for transposable element analysis: registered for and downloaded the RepeatMasker app and the required dependencies (I think I got them all). Will use Sean's blogpost as a guide.
Was thoroughly distracted and mortified by the election results (not progress, but still had to mention it...)

Next Steps

ID the transposable elements using RepeatMasker
Use CoGe to ID the CpG loci for potential methylation frequency/locations
Use online tools such as PITA or mirbase to try to ID likely miRNA sequences in the transcriptome sequences found on the >70k genome scaffolds
Identify the tool to use to ID lncRNA in my data
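As a rough illustration of what identifying CpG loci involves, the sketch below lists the 0-based positions of every CG dinucleotide in a sequence. It is a plain-Python stand-in for intuition only. It does not reflect how CoGe works, and the function name is hypothetical.

```python
def cpg_positions(sequence):
    """Return the 0-based start position of every CG dinucleotide."""
    seq = sequence.upper()
    positions = []
    start = seq.find("CG")
    while start != -1:
        positions.append(start)
        start = seq.find("CG", start + 1)
    return positions
```

Applied scaffold by scaffold, the resulting positions could be compared against methylation calls later on.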

jldimond commented 7 years ago

I made a branch called Macbook_local for work done on a repository on my personal laptop.
I played around with some analyses in R.
Played with DESeq2 for the EpiRAD analysis and decided it is not appropriate for this data.
I keep coming back to using residuals for the EpiRAD analysis.
I also played with the prcomp function for principal components analysis.

MeganEDuffy commented 7 years ago

Progress post release:

Finally got Homebrew and installed gdrive so I can now interact with my Google Drive directories to access files there.
Created a notebook for documenting protein database searching using Waters' ProteinLynx Global Server (PLGS) and used a previously downloaded marine sediment metagenome to identify proteins in my samples (they were all hypothetical...)
Searched around quite a bit on iMicrobe Global Ocean Sampling Project to find new data that I can use to build a better database to search against.
Started using blastp to identify de novo sequenced peptides.

sr320 / course-fish546-2016

Post release improvements #82