pachterlab / kb_python

A wrapper for the kallisto | bustools workflow for single-cell RNA-seq pre-processing
https://www.kallistobus.tools/
BSD 2-Clause "Simplified" License

unclear which genomic fasta to use #205

Closed yfarjoun closed 1 year ago

yfarjoun commented 1 year ago

In Ensembl there are three different "top-level" genomic FASTAs (for human at least): unmasked, hard-masked, and soft-masked.

I presume these will produce different indexes, and therefore different pseudoalignments and results. Could you clarify which "flavor" of genomic FASTA is most appropriate for building an index?
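For readers unsure which file is which: Ensembl encodes the masking flavor in the file name (`dna` is unmasked, `dna_sm` is soft-masked with repeats in lowercase, `dna_rm` is hard-masked with repeats replaced by `N`). A minimal sketch of telling them apart follows; the function name `masking_flavor` is made up for illustration and is not part of kb_python.

```python
import re

# Ensembl sequence-type tokens and the masking they indicate.
_FLAVORS = {
    "dna": "unmasked",
    "dna_sm": "soft-masked",   # repeats in lowercase
    "dna_rm": "hard-masked",   # repeats replaced by N
}

def masking_flavor(filename):
    """Return the masking flavor implied by an Ensembl FASTA file name,
    or None if the name carries no genomic 'dna' token (e.g. cDNA files)."""
    match = re.search(r"\.(dna(?:_sm|_rm)?)\.", filename)
    return _FLAVORS[match.group(1)] if match else None

for name in (
    "Homo_sapiens.GRCh38.dna.toplevel.fa.gz",
    "Homo_sapiens.GRCh38.dna_sm.toplevel.fa.gz",
    "Homo_sapiens.GRCh38.dna_rm.toplevel.fa.gz",
):
    print(name, "->", masking_flavor(name))
```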

Yenaled commented 1 year ago

I don't use masked genomes. I use the unmasked Ensembl genome or the GENCODE genome.

Also see https://kb.10xgenomics.com/hc/en-us/articles/360060307872-Does-my-genome-sequence-needs-be-masked-or-unmasked-for-custom-reference-generation-
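As a rough illustration of the advice above, the sketch below assembles a `kb ref` invocation that points at an unmasked genome FASTA. The helper `kb_ref_command` and all file names are placeholders chosen for this example, not something prescribed in this thread; the GTF must match the genome release you actually downloaded.

```python
import shlex

def kb_ref_command(genome_fasta, gtf, index="index.idx",
                   t2g="t2g.txt", cdna_out="cdna.fa"):
    """Build (but do not run) a `kb ref` command line.

    -i  : output path for the kallisto index
    -g  : output path for the transcript-to-gene map
    -f1 : output path for the cDNA FASTA that kb generates
    """
    return ["kb", "ref", "-i", index, "-g", t2g, "-f1", cdna_out,
            genome_fasta, gtf]

# Placeholder inputs: an unmasked Ensembl genome ('dna', not 'dna_sm'/'dna_rm').
cmd = kb_ref_command("Homo_sapiens.GRCh38.dna.toplevel.fa.gz",
                     "annotation.gtf.gz")
print(shlex.join(cmd))
```

To actually build the index, the list could be passed to `subprocess.run(cmd, check=True)` on a machine where kb_python is installed.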

yfarjoun commented 1 year ago

thanks.