ksahlin / BESST_RNA

Scaffolding of genomic assemblies with RNA seq data

Mapper #4

Open colindaven opened 8 years ago

colindaven commented 8 years ago

Hi, maybe I missed this, but which aligner is recommended? STAR, TopHat, BWA, Bowtie 2, etc.?

Thanks, Colin

ksahlin commented 8 years ago

My assumption is that RNA-seq-specific aligners such as STAR and TopHat have an advantage over genomic sequence aligners because they additionally model split reads, etc. However, BESST_RNA does not use such additional information, e.g., split reads aligning to different contigs. BESST_RNA considers only uniquely mapped read pairs (both reads within a pair must be uniquely mapped) with some minimum mapping quality (which can be set with the --mapq parameter).

So, whichever mapper provides the most uniquely and correctly mapped read pairs will be the best mapper for BESST_RNA. I remember getting a request to parse split-aligned reads (provided by, e.g., the STAR aligner), but I never implemented it because it would have required significant restructuring of the code to make use of that information within the scaffolding.
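To make the criterion above concrete, here is a minimal sketch of how one could count the read pairs that pass such a filter in a mapper's SAM output. This is not BESST_RNA's actual code: the function names `parse_sam_line` and `confident_pairs` and the default threshold of 20 are hypothetical, and treating a MAPQ threshold as a proxy for "uniquely mapped" is an assumption, since aligners report multi-mapping differently.

```python
from collections import defaultdict


def parse_sam_line(line):
    """Return (qname, flag, rname, mapq) from one SAM alignment line."""
    fields = line.rstrip("\n").split("\t")
    return fields[0], int(fields[1]), fields[2], int(fields[4])


def confident_pairs(sam_lines, min_mapq=20):
    """Map read name -> (contig of one mate, contig of the other mate).

    A pair is kept only if both mates have a primary alignment and both
    alignments have MAPQ >= min_mapq (assumed proxy for unique mapping).
    """
    by_name = defaultdict(list)
    for line in sam_lines:
        if line.startswith("@"):  # SAM header line
            continue
        qname, flag, rname, mapq = parse_sam_line(line)
        # Skip unmapped (0x4), secondary (0x100), supplementary (0x800).
        if flag & 0x4 or flag & 0x100 or flag & 0x800:
            continue
        by_name[qname].append((rname, mapq))

    kept = {}
    for qname, alns in by_name.items():
        # Require exactly two primary alignments, both above the threshold.
        if len(alns) == 2 and all(mapq >= min_mapq for _, mapq in alns):
            kept[qname] = tuple(rname for rname, _ in alns)
    return kept
```

Running this on the SAM output of each candidate mapper (e.g. by passing an open file handle as `sam_lines`) and comparing `len(confident_pairs(...))` gives a rough idea of which mapper yields more usable pairs. It says nothing about whether those pairs are mapped correctly.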

colindaven commented 8 years ago

Ok, thanks, I will give both STAR and BWA-MEM a try.

colindaven commented 8 years ago

BWA-MEM worked very well. BESST ran efficiently, with the whole process taking about 10-20 minutes for a mid-size (~2 Gb) plant genome assembly.

From ~500k contigs (a very early assembly, with only Illumina data at present), about 7000 could be scaffolded using 500k of the ~10 million reads in the BAM.

ksahlin commented 8 years ago

Ok, great!

Do you have Illumina mate pairs or "long insert size" paired-end reads for the genomic assembly? If so, I encourage you to try the genomic data version of BESST (also found under my repositories).

Best, K