lkov0 / bladderwort-analysis

Documentation for work on the bladderwort grant. This repository serves as a workflow for the whole analysis.

Bladderwort Project #1

Open lkov0 opened 5 years ago

lkov0 commented 5 years ago

These are all of my notes for the Bladderwort project from 08-28-2019 forward.

Summary:

In this study, we aim to identify insulator and terminator elements in the _Utricularia gibba_ (bladderwort) genome. This organism is an exceptional model for cis-regulatory element (CRE) detection due to its extremely small genome size. The _U. gibba_ genome will be PacBio sequenced and assembled. RNA-seq data will then be used to detect pairs of independently expressed genes in the same genotype. Intergenic regions between independently expressed gene pairs will subsequently be used for CRE detection. After CREs are detected, they will be validated by a collaborator. A phylogenetic analysis will then follow, in which putative insulator and terminator elements will be gauged for conservation across angiosperms.
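The gene-pair step in the summary can be sketched roughly as follows. This is only an illustration, not the method used in this project: it assumes "independently expressed" can be approximated by a low absolute Pearson correlation between neighboring genes across RNA-seq samples. The function names, the `max_abs_r` threshold, and the input layout are all hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length lists; None if either is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return None
    return cov / sqrt(var_x * var_y)


def independent_neighbor_pairs(genes, expression, max_abs_r=0.2):
    """Return candidate intergenic regions between weakly correlated neighbors.

    genes: list of (gene_id, contig, start, end) tuples (hypothetical layout).
    expression: dict mapping gene_id -> list of expression values, one per sample.
    max_abs_r: assumed cutoff on |r|; not a value taken from the project.
    """
    ordered = sorted(genes, key=lambda g: (g[1], g[2]))
    pairs = []
    for left, right in zip(ordered, ordered[1:]):
        if left[1] != right[1]:
            continue  # neighbors must sit on the same contig
        if right[2] <= left[3]:
            continue  # overlapping genes leave no intergenic region
        r = pearson(expression[left[0]], expression[right[0]])
        if r is not None and abs(r) <= max_abs_r:
            # Record the pair and the intergenic interval between them.
            pairs.append((left[0], right[0], left[3], right[2]))
    return pairs
```

Each returned tuple holds the two gene IDs plus the end of the left gene and the start of the right gene, which together bound the intergenic region that would be passed on to CRE detection.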

Goals:

lkov0 commented 5 years ago

Genome assembly

QUAST results:

- Since this assembly is not the main point of this project, and keeping small contigs in the assembly will not affect my final results, I will leave them in.

Summary of QUAST results (SPAdes assembly filtered for only _U. gibba_ scaffolds):

![Screenshot from 2019-09-06 13-45-56](https://user-images.githubusercontent.com/46690580/64448912-bddf6f80-d0ac-11e9-94c9-5e495dc454b4.png)

Summary of QUAST results (published Illumina genome compared to published PacBio genome):

![Screenshot from 2019-09-06 13-47-26](https://user-images.githubusercontent.com/46690580/64448996-f2ebc200-d0ac-11e9-9305-b2eb77cb1f3f.png)

#### Summary:

- There are fewer complete genes in our assembly than in the original assembly. This is to be expected, since the original "Illumina" assembly also included 454 reads, which would allow for a more contiguous assembly.
- When both are compared to the PacBio assembly, there are also a lot of rearrangements (or misassemblies) in our genome relative to the Illumina reference. This isn't surprising given that the PacBio assembly would include sequence missed by assembling with only Illumina data, so most of these rearrangements were probably introduced in silico and take the form of repeat expansions or contractions (the Assemblytics result confirms this):

![strVariant_counts](https://user-images.githubusercontent.com/46690580/64449740-9db0b000-d0ae-11e9-81f5-1502b4e371a2.png)

- There are probably enough genes and intergenic regions to get CRE candidates. Will proceed with annotation.

lkov0 commented 5 years ago

Genome annotation

First round of MAKER annotation: