usegalaxy-eu / usegalaxy-eu-tools

List of tools included in https://usegalaxy.eu
MIT License

STAR indexes contain alternative haplotypes #331

Open mblue9 opened 4 years ago

mblue9 commented 4 years ago

Hello,

I'm helping some researchers analyse zebrafish data and just discovered that the zebrafish STAR index (danRer11) on usegalaxy.eu contains the alternative haplotypes. Afaik the alts shouldn't be included; see the STAR manual here: https://github.com/alexdobin/STAR/blob/master/doc/STARmanual.pdf

Generally, patches and alternative haplotypes should not be included in the genome.

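This matters because reads from those regions become multimappers (the same sequence is present on both the primary assembly and the alt), so they get falsely low mapping scores and are discarded. In the samples I'm looking at, ~10% of reads seem to map to the alts.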

Is it possible to remove the alts from the indexes?

The human hg38 STAR index looks like it also contains the alts.
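One rough way to check a figure like the ~10% above is to sum the per-contig mapped counts that `samtools idxstats` reports for a STAR BAM and see what share falls on alt contigs. A minimal Python sketch, assuming UCSC-style names in which alternative haplotypes end in `_alt`; the function name `alt_fraction` is hypothetical. Because the counts come from the BAM index, multimapping reads may be counted once per reported alignment, so treat the result as an estimate rather than an exact read fraction.

```python
def alt_fraction(idxstats_text: str) -> float:
    """Fraction of mapped read segments that land on *_alt contigs.

    Expects `samtools idxstats` output: one tab-separated line per reference
    (name, length, mapped segments, unmapped segments). The final "*" line
    holds unplaced unmapped reads and is skipped.
    """
    total = 0
    on_alt = 0
    for line in idxstats_text.strip().splitlines():
        if not line:
            continue
        name, _length, mapped, _unmapped = line.split("\t")
        if name == "*":
            continue
        count = int(mapped)
        total += count
        # Assumption: UCSC alt haplotype names end in "_alt",
        # e.g. chr1_KI270762v1_alt in hg38.
        if name.endswith("_alt"):
            on_alt += count
    return on_alt / total if total else 0.0


# Usage with a tiny made-up idxstats table: 10 of 100 mapped segments are on an alt.
example = "chr1\t1000\t90\t0\nchr1_KI270762v1_alt\t500\t10\t0\n*\t0\t0\t5\n"
print(alt_fraction(example))  # 0.1
```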

bgruening commented 4 years ago

:( not good. We have been using the regular UCSC genomes as we do always :(

mblue9 commented 4 years ago


Thanks for the reply! I know it's extra hassle :( but would it be possible to make a version that excludes the alts from the UCSC genomes before indexing?

My understanding is that most people would want the version without alts for STAR (and most other aligners), as Devon Ryan says in this post:

https://www.biostars.org/p/330596/

The STAR manual recommends excluding haplotypes and patches from the reference genome while keeping unplaced scaffolds when aligning RNA-seq reads. Is the same recommended for aligning ChIP-seq reads with Bowtie2? Does the Bowtie2 index take that into account?

The recommendations for STAR apply to all aligners except BWA mem and novoalign. This is also true for all types of sequencing experiments.
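Following the STAR manual's advice quoted above (leave out alt haplotypes and patches, keep unplaced scaffolds), a UCSC FASTA could be filtered before indexing. A minimal Python sketch, assuming UCSC naming in which alt haplotypes end in `_alt` and fix patches in `_fix`, while unplaced (`chrUn_*`) and unlocalized (`*_random`) scaffolds use other names; `keep_sequence`, `drop_alts` and `filter_fasta` are hypothetical helpers, not part of any Galaxy tooling. The filtered file would then be passed to STAR's `--genomeFastaFiles` (or to `bowtie2-build`) as usual.

```python
import gzip
from typing import Iterable, Iterator

# Assumption: UCSC suffixes for sequences the STAR manual says to leave out.
EXCLUDED_SUFFIXES = ("_alt", "_fix")


def keep_sequence(name: str) -> bool:
    """Keep primary chromosomes plus unplaced/unlocalized scaffolds."""
    return not name.endswith(EXCLUDED_SUFFIXES)


def drop_alts(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTA lines, skipping every record whose name is excluded."""
    keep = True
    for line in lines:
        if line.startswith(">"):
            # The sequence name is the first whitespace-delimited token after '>'.
            name = (line[1:].split() or [""])[0]
            keep = keep_sequence(name)
        if keep:
            yield line


def filter_fasta(in_path: str, out_path: str) -> None:
    """Write a copy of in_path (plain or .gz FASTA) without excluded records."""
    opener = gzip.open if in_path.endswith(".gz") else open
    with opener(in_path, "rt") as src, open(out_path, "w") as dst:
        dst.writelines(drop_alts(src))
```

For example, `filter_fasta("danRer11.fa.gz", "danRer11.noalt.fa")` would produce an alt-free FASTA to index. Filtering on name suffixes keeps the sketch dependency-free; a stricter approach would match against the assembly's published sequence report instead of relying on naming conventions.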

wm75 commented 4 years ago

Related: Heng Li's blog post on this issue for the human genome: https://lh3.github.io/2017/11/13/which-human-reference-genome-to-use. So maybe we could even optimize hg19 a bit?