sr320 / course-fish546-2016


Post release improvements #82

Closed sr320 closed 7 years ago

sr320 commented 7 years ago

Last week you took the bold move of making your first release. For this week's project-progress, simply provide a bulleted explanation of all the improvements to your project since the release.

mfisher5 commented 7 years ago

Project Goals for this Week: (1) Repeat the Week 5 comparisons made for the ustacks -m parameter, this time for the ustacks -M parameter and the cstacks # of individuals in the catalog.

(2) Determine which quantitative measurements should matter the most when comparing parameter outputs.

Progress: (1) Repeated the Week 5 comparisons with the ustacks -M parameter (# of loci retained AND graphing heterozygosity with R code) as well as the cstacks # of individuals for the catalog (# of loci retained ONLY).

(2) Decided that I will most likely use a stack depth (-m) of 5 and an -M of 3. I have not yet thoroughly looked over the cstacks parameter, # of individuals for the catalog.

(3) Decided that I might also explore the upper / lower bound error rate parameters in ustacks (--bound_low and --bound_high)

(4) Tackling the question of whether populations (a program in the Stacks pipeline) undercalls heterozygotes by running independent scripts on its output. So far I have run into a lot of trouble here, as these scripts were written for a different version of Stacks.

yaaminiv commented 7 years ago

This week, I was mainly troubleshooting all of the problems I ran into last week.

Ellior2 commented 7 years ago

Progress from this week:

1) Double-checked that my blast output was complete by checking whether the tail end of the output file contained contigs from my query. Here is my notebook

2) Turned my geoduck blast output that I merged with a Uniprot database into a large table

3) I also created a table with the proteins found in the Taylor oyster seed experiments after merging with the uniprot database.
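
The completeness check in step 1 could be sketched roughly as below: compare the last query ID in the FASTA file with the query IDs present in the blast output. This is only a minimal sketch, assuming tabular blast output (`-outfmt 6`) in which the first column is the query ID. The function names and the toy records are hypothetical, not taken from the notebook. Queries with no hits never appear in tabular output, so a mismatch is a hint rather than proof that the run was cut short.

```python
import io


def last_fasta_id(fasta_handle):
    """Return the ID of the last record in a FASTA stream (None if empty)."""
    last = None
    for line in fasta_handle:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                # The ID is the first whitespace-delimited token after '>'.
                last = fields[0]
    return last


def blast_query_ids(blast_handle):
    """Return the set of query IDs (column 1) in tabular blast output."""
    ids = set()
    for line in blast_handle:
        line = line.strip()
        if line and not line.startswith("#"):
            ids.add(line.split("\t")[0])
    return ids


def last_query_has_hit(fasta_handle, blast_handle):
    """True if the final FASTA record shows up as a query in the blast output."""
    return last_fasta_id(fasta_handle) in blast_query_ids(blast_handle)


if __name__ == "__main__":
    # Toy, made-up records purely for illustration.
    fasta = io.StringIO(">contig_1 len=4\nACGT\n>contig_2\nGGCC\n")
    blast = io.StringIO(
        "contig_1\tsubject_A\t99.0\t4\t0\t0\t1\t4\t1\t4\t1e-5\t50\n"
        "contig_2\tsubject_B\t98.0\t4\t0\t0\t1\t4\t1\t4\t1e-4\t48\n"
    )
    print(last_query_has_hit(fasta, blast))
```

With real data, the two `io.StringIO` objects would be replaced by open file handles for the query FASTA and the blast output file.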

nclowell commented 7 years ago

I figured out how to run sstacks, the penultimate program in the Stacks pipeline, and wrote a python script so that I can rerun it from the command line simply. The script writes and runs a shell script to accomplish this. Here's the markdown file.
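
The write-then-run pattern described above might look something like the following. This is a generic sketch and not the actual script from the linked markdown file: the function names are hypothetical, and an `echo` line stands in for the real sstacks command, whose arguments depend on the Stacks version and the project's file layout.

```python
import os
import stat
import subprocess
import tempfile


def write_shell_script(path, commands):
    """Write the given command lines to a shell script and mark it executable."""
    with open(path, "w") as fh:
        fh.write("#!/bin/sh\n")
        fh.write("set -e\n")  # stop at the first failing command
        for cmd in commands:
            fh.write(cmd + "\n")
    os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR)
    return path


def run_shell_script(path):
    """Run the script with sh and return its standard output as text."""
    result = subprocess.run(
        ["sh", path], capture_output=True, text=True, check=True
    )
    return result.stdout


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        script = os.path.join(tmp, "run_step.sh")
        # Placeholder command; the real sstacks call would go here.
        write_shell_script(script, ['echo "placeholder for the sstacks command"'])
        print(run_shell_script(script), end="")
```

Keeping the generated script on disk (rather than a temporary directory) leaves a record of exactly which command was run, which is handy for a lab notebook.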

In addition, I've made some progress on figuring out how to call filepaths so that the scripts work no matter the directory structure. I've also made progress on sorting out what additional analyses and filtering my lab group does with RAD data after running populations, the final program in the Stacks pipeline.
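
One possible way to make filepaths independent of the directory structure is to locate the project root at run time and build every path relative to it. The sketch below assumes the project lives in a Git repository, so a `.git` entry marks the root. The function name and the directory names in the demo are hypothetical.

```python
import tempfile
from pathlib import Path


def project_root(start, marker=".git"):
    """Walk upward from `start` until a directory containing `marker` is found."""
    start = Path(start).resolve()
    for candidate in [start, *start.parents]:
        if (candidate / marker).exists():
            return candidate
    raise FileNotFoundError(f"no {marker!r} found above {start}")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Build a toy layout: <tmp>/.git and <tmp>/scripts/stacks
        root = Path(tmp).resolve()
        (root / ".git").mkdir()
        nested = root / "scripts" / "stacks"
        nested.mkdir(parents=True)
        # From anywhere inside the project, paths resolve against the same root.
        print(project_root(nested) / "data" / "samples")
```

In a real script, `start` would typically be `Path(__file__).parent`, so the script finds its data no matter which directory it is launched from.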

aspanjer commented 7 years ago

For week 6:

mmiddleton commented 7 years ago

I spent this week trying to get to a point where I could run my Bismark analysis. I was able to:

This week's accomplishments and goals for next week have been updated on my README

laurahspencer commented 7 years ago

Since my v01 "pre-release" I have done the following:

Progress Made

  1. Learned how to use the command line to communicate with Git, since GitHub Desktop was severely malfunctioning. I now commit/push/pull using the command line!
  2. Located the appropriate Uniprot-annotated geoduck transcriptome, used Galaxy to merge it with my data file to create one large and data-rich file.
  3. Researched & sketched out the meaning of each column in the blast results file, Uniprot-transcriptome file, and the GFF format. Struggled trying to convert my large data file to a format (GFF or BED) that is viewable on the IGV tool, but after chatting w/ Steven I will now leave that effort until later.
  4. Got a better handle on the end-goal/product
  5. Prepped for transposable element analysis: registered for and downloaded the RepeatMasker app and the required dependencies (I think I got them all). Will use Sean's blogpost as a guide.
  6. Was thoroughly distracted and mortified by the election results (not progress, but still had to mention it...)
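
The merge in item 2 was done in Galaxy. For comparison, a similar join could be sketched in plain Python as below. It assumes the blast results are in the standard 12-column tabular layout (`-outfmt 6`) and that the annotation table is tab-delimited with a header row. The `accession` column name, the function name, and the toy rows are hypothetical.

```python
import csv
import io

# Standard column order of blast tabular output (-outfmt 6).
BLAST_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def merge_blast_with_annotations(blast_handle, annot_handle,
                                 key="sseqid", annot_key="accession"):
    """Inner-join blast hits to annotation rows on the subject ID."""
    annotations = {
        row[annot_key]: row
        for row in csv.DictReader(annot_handle, delimiter="\t")
    }
    merged = []
    for fields in csv.reader(blast_handle, delimiter="\t"):
        if not fields:
            continue
        hit = dict(zip(BLAST_COLUMNS, fields))
        annot = annotations.get(hit[key])
        if annot is not None:
            # Hits without a matching annotation row are dropped.
            merged.append({**hit, **annot})
    return merged


if __name__ == "__main__":
    # Toy, made-up rows purely for illustration.
    blast = io.StringIO(
        "contig_1\tACC_1\t99.0\t100\t1\t0\t1\t100\t1\t100\t1e-50\t200\n"
        "contig_2\tACC_9\t90.0\t80\t8\t0\t1\t80\t1\t80\t1e-20\t100\n"
    )
    annot = io.StringIO("accession\tprotein_name\nACC_1\ttoy protein\n")
    for row in merge_blast_with_annotations(blast, annot):
        print(row["qseqid"], row["sseqid"], row["protein_name"])
```

This is an inner join, so only hits with a matching annotation survive. Keeping unmatched hits (a left join) would mean appending `hit` even when `annot` is `None`.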

Next Steps

  1. ID the transposable elements using RepeatMasker
  2. Use CoGe to ID the CpG loci for potential methylation frequency/locations
  3. Use online tools such as PITA or mirbase to try to ID likely miRNA sequences in the transcriptome sequences found on the >70k genome scaffolds
  4. Identify the tool to use to ID lncRNA in my data
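
As a rough illustration of what locating CpG loci (step 2) involves, the sketch below lists the 0-based positions of every CG dinucleotide on the forward strand of a sequence. This is not how CoGe works internally; it is just a minimal stand-alone example, and the function name and the toy sequence are hypothetical.

```python
def cpg_positions(seq):
    """Return 0-based start positions of every CG dinucleotide in `seq`."""
    seq = seq.upper()
    positions = []
    i = seq.find("CG")
    while i != -1:
        positions.append(i)
        i = seq.find("CG", i + 1)
    return positions


if __name__ == "__main__":
    toy = "ACGTCGCG"  # made-up sequence
    print(cpg_positions(toy))  # [1, 4, 6]
```

For scaffold-sized sequences, the same function could be applied per FASTA record, and the positions written out in BED-like form for viewing alongside other annotations.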

jldimond commented 7 years ago

MeganEDuffy commented 7 years ago

Progress post release: