mellamosummer opened 2 years ago
This week I inspected the output files from GetOrganelle & discussed the results with my advisor. The output file was incomplete, and I tried troubleshooting by changing some parameters based on the FAQ (https://github.com/Kinggerm/GetOrganelle/wiki/FAQ#how-to-assemble-a-target-organelle-genome-using-my-own-reference) but got stuck. A lab mate recommended I try NOVOPlasty (https://github.com/ndierckx/NOVOPlasty) in place of GetOrganelle. I've been troubleshooting that portion of the script this week and finally got it to run.
I ran into an issue with NOVOPlasty: even when I specified the output directory in the configuration file, it only saved files to the directory from which the job submission script was submitted. I had to manually move into /scratch to get it to run. I also ran into the same issue with Meraculous.
Both scripts are running currently, so I hope to have results to check next week!
After trying to assemble the chloroplast from WGSS data for G. maculatum using NOVOPlasty, I was still unable to get a complete, circularized assembly. This week, I wrote a script to map the reads to the top BLAST hit for the assembler's contigs (the complete G. incanum plastome). I visualized the coverage in IGV, which I plan to discuss with my advisor.
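A minimal sketch of how the mapped-read coverage could also be summarized outside IGV. It assumes a tab-separated, per-base depth table (reference name, 1-based position, depth) for a single reference sequence, such as the one `samtools depth -a` writes. The function name `summarize_depth`, the `min_depth` threshold, and the toy values are illustrative assumptions, not taken from the linked script.

```python
from statistics import mean

def summarize_depth(lines, min_depth=1):
    """Return (mean depth, gaps) from per-base depth lines.

    A gap is a run of consecutive positions whose depth is below
    min_depth, reported as a 1-based inclusive (start, end) tuple.
    Assumes all lines belong to one reference sequence.
    """
    depths = []
    gaps = []
    gap_start = None
    prev_pos = None
    for line in lines:
        if not line.strip():
            continue
        _ref, pos, depth = line.rstrip("\n").split("\t")
        pos, depth = int(pos), int(depth)
        depths.append(depth)
        if depth < min_depth:
            if gap_start is None:
                gap_start = pos          # a new gap opens here
        elif gap_start is not None:
            gaps.append((gap_start, prev_pos))  # gap closed at previous position
            gap_start = None
        prev_pos = pos
    if gap_start is not None:            # gap runs to the end of the table
        gaps.append((gap_start, prev_pos))
    return (mean(depths) if depths else 0.0), gaps

# Toy input standing in for a real depth file (hypothetical values)
toy = ["ref\t1\t40", "ref\t2\t0", "ref\t3\t0", "ref\t4\t38", "ref\t5\t0"]
mean_depth, gaps = summarize_depth(toy)
print(mean_depth, gaps)
```

With a real table, the lines could come from an open file handle instead of the toy list; raising `min_depth` would flag low-coverage regions rather than only zero-coverage ones.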
I was able to troubleshoot the scripting issues with Meraculous, but I've run into another issue with the software now (not my configuration file). It seems like there is a dependency issue that stops the script during the "mercount" stage. I've asked the GACRC to upload the software that supersedes Meraculous, HipMer, so that I can try that software instead next week.
Below is the Bandage visualization of the GetOrganelle assembly. I could actually use some help understanding this visualization better, other than just seeing that the assembly is not completely circularized.
I visualized read coverage in IGV & there was pretty even ~40x coverage, which puzzled my lab & me, considering our largest contig from the assembly was ~13,000 bp (which is only ~10% of the plastome based on the reference sequence). There were a few gaps that may have affected the assembly, but it seems unlikely that this is the main reason, since the coverage shows regions of ~30,000 bp without gaps. I pulled a GFF from NCBI to see which genes the coverage gaps fall in & plan to discuss with my lab & PI next week (see images below).
I also re-ran NOVOPlasty with a reference AND seed to see if this improves the assembly. I am also testing one more chloroplast assembler, Fast-Plast.
I reached out to the GACRC to troubleshoot Meraculous, and it turns out that the version that is loaded is missing dependencies. They told me to use a different version. I still had an error after fixing the dependency issue, so I also changed the "min depth" parameter, and the script is running fine now!
I've been spending this week writing up my report & trying to make sense of our plastome results! I have also restarted a Meraculous run with more threads after seeing the slow progress this week. The job is currently running (much faster than before!). I'm hoping that it will be finished by Tuesday!
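A rough sketch of how coverage gaps could be matched against an annotation without clicking through IGV. It assumes a standard nine-column, tab-separated GFF3 file and gaps given as 1-based inclusive (start, end) tuples on the same reference. The function name `genes_in_gaps`, the coordinates, and the toy records are hypothetical and only illustrate the overlap test.

```python
def genes_in_gaps(gff_lines, gaps, feature_type="gene"):
    """Map each gap (start, end) to the names of overlapping features.

    Only features whose type column equals feature_type are considered.
    The name is taken from the Name attribute, falling back to ID.
    """
    hits = {}
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue                      # skip comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != feature_type:
            continue
        start, end = int(cols[3]), int(cols[4])
        attrs = dict(f.split("=", 1) for f in cols[8].split(";") if "=" in f)
        name = attrs.get("Name") or attrs.get("ID") or "unnamed"
        for gap_start, gap_end in gaps:
            # Two closed intervals overlap when each starts before the other ends
            if start <= gap_end and end >= gap_start:
                hits.setdefault((gap_start, gap_end), []).append(name)
    return hits

# Toy annotation with made-up coordinates (not the real plastome annotation)
toy_gff = [
    "##gff-version 3",
    "ref\tRefSeq\tgene\t100\t500\t.\t+\t.\tID=gene0;Name=psbA",
    "ref\tRefSeq\tCDS\t100\t500\t.\t+\t0\tID=cds0;Parent=gene0",
    "ref\tRefSeq\tgene\t900\t1200\t.\t-\t.\tID=gene1;Name=matK",
]
toy_gaps = [(450, 600), (2000, 2100)]
overlaps = genes_in_gaps(toy_gff, toy_gaps)
print(overlaps)
```

Gaps that overlap no gene are simply absent from the returned dictionary, which makes it easy to separate genic from intergenic gaps.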
Plan for your class project: We recently generated Illumina shotgun sequence data for a non-model plant (my study system), Geranium maculatum. I plan to evaluate genome structure with a k-mer analysis and assemble the plastome & nuclear genome.
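To illustrate what a k-mer analysis counts, here is a toy sketch of a k-mer abundance histogram: how many distinct k-mers occur once, twice, and so on. This is not how Jellyfish is implemented and would be far too slow for real Illumina data; it only counts k-mers on the forward strand, and the function name `kmer_histogram` and the example reads are assumptions for illustration.

```python
from collections import Counter

def kmer_histogram(reads, k):
    """Return {abundance: number of distinct k-mers with that abundance}."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if "N" not in kmer:           # skip k-mers containing ambiguous bases
                counts[kmer] += 1
    histogram = Counter(counts.values())  # abundance -> number of distinct k-mers
    return dict(sorted(histogram.items()))

# Toy reads (hypothetical); real input would be the trimmed FASTQ sequences
toy_reads = ["ACGTAC", "CGTACG"]
hist = kmer_histogram(toy_reads, 3)
print(hist)
```

A histogram of this shape, abundance against number of distinct k-mers, is the kind of input that genome-profiling tools such as GenomeScope fit a model to.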
Scripts developed: https://github.com/mellamosummer/G_maculatum_novogene/blob/b1b33c844cd09fc139e22747883da07875612990/scripts/G_maculatum.sh
1) Trimmomatic: complete
2) FastQC & MultiQC pre & post trim: complete
3) GetOrganelle plastome assembly: complete
4) Jellyfish & GenomeScope k-mer analysis: complete
5) Assemble nuclear genome with Meraculous: testing script now