sr320 / LabDocs

Roberts Lab Documents
http://sr320.github.io/LabDocs/

Olympia oyster genome gap-filling w/PacBio data #554

Closed kubu4 closed 7 years ago

kubu4 commented 7 years ago

We have new PacBio data for Olympia oyster: http://owl.fish.washington.edu/nightingales/O_lurida/20170323_pacbio/

We sequenced 10 SMRT cells, so there is one subdirectory for each SMRT cell's data.

Here's the PacBio software recommendations page: https://github.com/PacificBiosciences/Bioinformatics-Training/wiki/Large-Genome-Assembly-with-PacBio-Long-Reads

I think we might as well try the Gap Filling suggestion (using PBJelly). There isn't too much documentation, but I think you can figure it out. After reading the "JellyReadme.txt" file, look at the two .xml files (one in the jellyExample folder and the TemplateProtocol.xml) that are provided to get an idea of what you'll need.

For the PacBio input data, I think you'll use the filtered_subreads.fasta files, which are found in the top level of each of the SMRT cell directories (Note: The files we have are gzipped and I don't think PBJelly will accept gzipped input files, so they'll probably need to be decompressed first).
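If the gzipped files do need to be decompressed, something like the following might work. This is only a rough sketch: it assumes each SMRT cell subdirectory holds a file named exactly `filtered_subreads.fasta.gz` (the `.gz` name is a guess), and the function name `decompress_subreads` and the output naming scheme are made up for illustration. Since every cell uses the same file name, the output is prefixed with the cell directory name to avoid overwriting.

```python
import gzip
import shutil
from pathlib import Path


def decompress_subreads(root, out_dir):
    """Decompress <root>/<cell>/filtered_subreads.fasta.gz for every cell.

    Assumption: the gzipped file name is filtered_subreads.fasta.gz.
    Each output is written as <out_dir>/<cell>_filtered_subreads.fasta.
    Returns the list of written paths, sorted by cell directory name.
    """
    root = Path(root)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    written = []
    for gz_path in sorted(root.glob("*/filtered_subreads.fasta.gz")):
        cell = gz_path.parent.name
        out_path = out_dir / f"{cell}_filtered_subreads.fasta"
        # Stream the data so large files are never held fully in memory.
        with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
        written.append(out_path)
    return written
```

Calling `decompress_subreads("20170323_pacbio", "subreads_fasta")` on a local copy of the data directory would then leave one uncompressed FASTA per SMRT cell in `subreads_fasta/` (both directory names here are placeholders).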

For the existing Olympia oyster reference, we have two options (Note: whichever of the following files you use will need to be renamed - see the PBJelly readme for explanation):

  1. Scaffold assembly: http://owl.fish.washington.edu/O_lurida_genome_assemblies_BGI/20160314/scaffold.fa.fill

  2. Contig assembly: http://owl.fish.washington.edu/O_lurida_genome_assemblies_BGI/20161201/cdts-hk.genomics.cn/Ostrea_lurida/Ostrea_lurida.fa

I'll let you figure out which to try it with (or, heck, try 'em both)!

seanb80 commented 7 years ago

Working on this on Hyak. It's nominally functional in its current state, but I'm having difficulties with the walltime argument in sbatch, so the job gets killed after an hour. Still in progress.

seanb80 commented 7 years ago

Currently running with a 10 day time request. Will update when finished.
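For reference, Slurm's `--time` option accepts a `days-hours:minutes:seconds` value, so a 10 day request can be written as `10-00:00:00`. Below is a small sketch of a helper that builds that string; the function name `slurm_walltime` is hypothetical, and this is not necessarily how the job above was actually submitted.

```python
def slurm_walltime(days=0, hours=0, minutes=0, seconds=0):
    """Format a duration as Slurm's days-hours:minutes:seconds string."""
    total = ((days * 24 + hours) * 60 + minutes) * 60 + seconds
    if total <= 0:
        raise ValueError("walltime must be positive")
    # Normalize so that e.g. 36 hours becomes 1 day and 12 hours.
    d, rem = divmod(total, 86400)
    h, rem = divmod(rem, 3600)
    m, s = divmod(rem, 60)
    return f"{d}-{h:02d}:{m:02d}:{s:02d}"


# Directive line that could go at the top of an sbatch script.
print(f"#SBATCH --time={slurm_walltime(days=10)}")
# prints "#SBATCH --time=10-00:00:00"
```

The same value can also be passed on the command line as `sbatch --time=10-00:00:00`, subject to whatever maximum the cluster's partition allows.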