sridnona / cb_sniffer

mutation (barcode) caller for 10x single-cell data
GNU General Public License v3.0

Variant mapping to tens of thousands of barcodes #4

Closed s2hui closed 5 years ago

s2hui commented 5 years ago

Hello,

I have been running cb_sniffer successfully; however, I have run into a few variants that seem to map to tens of thousands of reads (barcodes). It looks as if the program is hanging, but it is simply taking hours (3+ hours) to map such variants.

For example: ('12', 14882062, '3_prime_UTR_variant', 'MGP', 0)

This resulted in over 95K UBs:

grep 14882062 AVM_sample_26092018_vep93_counts_UB.tsv | wc -l
95645

(base) nfbiomac-01:cbsniffer hshirley$ grep 14882062 AVM_sample_26092018_vep93_counts_UB.tsv
12 14882062 14882062 T A 3_prime_UTR_variant MGP AACTGGTTCCACGTTC-1:TTCTACTGAC 0 1 1
12 14882062 14882062 T A 3_prime_UTR_variant MGP CGGGTCACACAGACTT-1:GGGCCTTGCC 0 6 6
12 14882062 14882062 T A 3_prime_UTR_variant MGP GGGAATGAGGCAGGTT-1:AGTTCACCGC 0 11 11
12 14882062 14882062 T A 3_prime_UTR_variant MGP GGCAATTTCCCATTTA-1:GCGCCACGGG 0 4 4
12 14882062 14882062 T A 3_prime_UTR_variant MGP TCAGATGGTAGTACCT-1:CGCAGCAATG 0 4 4
12 14882062 14882062 T A 3_prime_UTR_variant MGP CACCACTGTCATCCCT-1:AATGTTCCCT 0 6 6
12 14882062 14882062 T A 3_prime_UTR_variant MGP GATTCAGCAGAGCCAA-1:ACTATCTATA 0 4 4
12 14882062 14882062 T A 3_prime_UTR_variant MGP CTCGTACAGCTTATCG-1:GCATTACAAG 0 11 11
12 14882062 14882062 T A 3_prime_UTR_variant MGP ACTGATGTCGATGAGG-1:GCAAGGCGGG 0 7 7
12 14882062 14882062 T A 3_prime_UTR_variant MGP CCATGTCAGTTATCGC-1:TCTTTAGACC 0 4 4
12 14882062 14882062 T A 3_prime_UTR_variant MGP GCGCCAAGTCTGGAGA-1:CCTTTTTGCG 0 8 8
12 14882062 14882062 T A 3_prime_UTR_variant MGP TGCACCTCAAAGCAAT-1:TCGTCACAAT 0 10 10
12 14882062 14882062 T A 3_prime_UTR_variant MGP GTCATTTCAGCTGTAT-1:TTTTGTGAAC 0 17 17
12 14882062 14882062 T A 3_prime_UTR_variant MGP AAAGATGGTTAAGAAC-1:TCAATACTTA 0 9 9
12 14882062 14882062 T A 3_prime_UTR_variant MGP GGGATGAAGTGAACGC-1:TCGGGGTCTG 0 8 8
12 14882062 14882062 T A 3_prime_UTR_variant MGP TGACGGCTCTGGCGAC-1:AGGTTTGATT 0 30 30
12 14882062 14882062 T A 3_prime_UTR_variant MGP ATCGAGTTCTGCCCTA-1:AAGTTCTAAT 0 5 5
12 14882062 14882062 T A 3_prime_UTR_variant MGP GGATTACAGAATAGGG-1:CCATCTCGGC 0 1 1
12 14882062 14882062 T A 3_prime_UTR_variant MGP GCCTCTAAGACAGAGA-1:GAGATAACAT 0 5 5
12 14882062 14882062 T A 3_prime_UTR_variant MGP GGGAATGAGCTCCTCT-1:GATTCATCAA 0 13 13
12 14882062 14882062 T A 3_prime_UTR_variant MGP AGCGTATGTAGTGAAT-1:CTTCCGCCGA 0 20 20
12 14882062 14882062 T A 3_prime_UTR_variant MGP TGCGCAGAGCGATGAC-1:TCATGAGGAT 0 6 6
12 14882062 14882062 T A 3_prime_UTR_variant MGP ATTTCTGGTGACCAAG-1:GTAGCAGGCA 0 1 1
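A rough Python sketch of the same count, generalised to every variant in the file, is below. It assumes the *_counts_UB.tsv file is tab-separated with the column order seen in the rows above (chrom, start, end, ref, alt, consequence, gene, barcode:UB, then three counts). That layout is inferred from this output, not taken from cb_sniffer's documentation, and `rows_per_variant` is a hypothetical helper.

```python
import csv
import io
from collections import Counter


def rows_per_variant(lines):
    """Count barcode:UB rows per variant in a *_counts_UB.tsv file.

    Assumed column order (inferred from pasted output, not documented):
    chrom, start, end, ref, alt, consequence, gene, barcode:UB, 3 counts.
    """
    counts = Counter()
    for fields in csv.reader(lines, delimiter="\t"):
        if len(fields) < 8:
            continue  # blank or truncated line
        try:
            # Key on chrom, start position, ref and alt allele.
            key = (fields[0], int(fields[1]), fields[3], fields[4])
        except ValueError:
            continue  # header line or non-numeric position
        counts[key] += 1
    return counts


if __name__ == "__main__":
    # Two rows copied from the output above, re-joined with tabs.
    demo = io.StringIO(
        "12\t14882062\t14882062\tT\tA\t3_prime_UTR_variant\tMGP\t"
        "AACTGGTTCCACGTTC-1:TTCTACTGAC\t0\t1\t1\n"
        "12\t14882062\t14882062\tT\tA\t3_prime_UTR_variant\tMGP\t"
        "CGGGTCACACAGACTT-1:GGGCCTTGCC\t0\t6\t6\n"
    )
    for variant, n_rows in rows_per_variant(demo).most_common(10):
        print(variant, n_rows)  # prints: ('12', 14882062, 'T', 'A') 2
```

Unlike `grep 14882062 | wc -l`, which counts any line containing that string anywhere, this keys on the start column only, and `most_common()` ranks variants by number of barcode:UB rows so other unusually deep sites stand out. For the real file, pass `open("AVM_sample_26092018_vep93_counts_UB.tsv", newline="")` instead of the demo buffer.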

Most, if not all, of these are alt-allele variants.
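One way to check that observation across all rows, rather than by eye, is sketched below. It assumes (hypothetically, from patterns such as "0 6 6" and "0 30 30" in the output) that the last three columns are ref count, alt count and total; if cb_sniffer orders them differently, adjust the indices. `allele_support` is an illustrative helper, not part of cb_sniffer.

```python
import csv
import io


def allele_support(lines):
    """Classify each barcode:UB row by which allele its reads support.

    Hypothetical assumption: columns 9-11 are ref count, alt count, total,
    so "0 6 6" means 0 ref reads, 6 alt reads, 6 reads in total.
    """
    summary = {}
    for fields in csv.reader(lines, delimiter="\t"):
        if len(fields) < 11:
            continue  # blank or truncated line
        try:
            key = (fields[0], int(fields[1]), fields[3], fields[4])
            ref_n, alt_n = int(fields[8]), int(fields[9])
        except ValueError:
            continue  # header line
        tally = summary.setdefault(
            key, {"rows": 0, "alt_only": 0, "ref_only": 0, "both": 0}
        )
        tally["rows"] += 1
        # Rows with zero ref and zero alt reads count toward "rows" only.
        if alt_n > 0 and ref_n == 0:
            tally["alt_only"] += 1
        elif ref_n > 0 and alt_n == 0:
            tally["ref_only"] += 1
        elif ref_n > 0 and alt_n > 0:
            tally["both"] += 1
    return summary


if __name__ == "__main__":
    demo = io.StringIO(
        "12\t14882062\t14882062\tT\tA\t3_prime_UTR_variant\tMGP\t"
        "TGACGGCTCTGGCGAC-1:AGGTTTGATT\t0\t30\t30\n"
        "12\t14882062\t14882062\tT\tA\t3_prime_UTR_variant\tMGP\t"
        "AGCGTATGTAGTGAAT-1:CTTCCGCCGA\t0\t20\t20\n"
    )
    for variant, tally in allele_support(demo).items():
        # Fraction of rows supported only by alt reads.
        print(variant, tally["alt_only"] / tally["rows"])
```

A variant where nearly every one of ~95K rows is alt-only could be either a genuinely common variant or a systematic artifact, which is what the rest of this thread tries to tell apart.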

I was just wondering: is this normal?

Thanks for your help, shui

sridnona commented 5 years ago

Hello @s2hui

We have occasionally seen this; for some reason, the depth at such a location is really high. Can you load that variant position in IGV and check what the depth is?
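The same depth can also be read from the command line, for example with `samtools depth -r 12:14882062-14882062 sample.bam` (the BAM name is a placeholder). The sketch below parses that output and flags positions above a cut-off, which could be used to decide ahead of time which variants to run separately. The `high_depth_sites` helper and the 10,000 threshold are hypothetical choices, and samtools' default read filters may make its numbers differ somewhat from IGV's coverage track.

```python
def high_depth_sites(depth_lines, max_depth):
    """Flag positions whose read depth exceeds max_depth.

    Expects `samtools depth` text output: chrom, pos, then one depth column
    per input BAM, tab-separated. Depths from several BAMs are summed.
    """
    flagged = []
    for line in depth_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # blank or malformed line
        chrom, pos = fields[0], int(fields[1])
        depth = sum(int(d) for d in fields[2:])
        if depth > max_depth:
            flagged.append((chrom, pos, depth))
    return flagged


if __name__ == "__main__":
    # Made-up lines in samtools depth format, for illustration only.
    demo = ["12\t14882062\t20000\n", "12\t14882100\t35\n"]
    print(high_depth_sites(demo, max_depth=10_000))  # [('12', 14882062, 20000)]
```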

Thanks. Sid

s2hui commented 5 years ago

According to the coverage track, the total count at this variant is over 82K, which I believe is quite high. Would this explain the large number of mappings?

Thanks for your insight, shirley

sridnona commented 5 years ago

My very first guess would be a mapping artifact. One way to check this is to see whether MGP is expressed in all cells, which you can do with FeaturePlot from Seurat. If it is highly expressed in all the cells, then there is nothing wrong with the analysis and I would wait for the process to complete; but if there is no expression, I would call it a mapping artifact and ignore the variant.
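FeaturePlot is a visual check. A numeric analogue is sketched below, assuming you have exported per-cell MGP UMI counts (for example from the count matrix behind your Seurat object) into a barcode-to-count mapping. `expression_check` and its `min_fraction` cut-off are hypothetical; Sid's rule only covers the two extremes (expressed in all cells versus no expression), so the in-between verdict is an added assumption.

```python
def expression_check(umi_counts, min_fraction=0.5):
    """Numeric stand-in for eyeballing a FeaturePlot of one gene.

    umi_counts: mapping of cell barcode -> UMI count for the gene.
    Returns (fraction of cells with at least one UMI, verdict string).
    """
    if not umi_counts:
        raise ValueError("no cells supplied")
    expressing = sum(1 for n in umi_counts.values() if n > 0)
    fraction = expressing / len(umi_counts)
    if fraction == 0:
        verdict = "no expression: likely a mapping artifact"
    elif fraction >= min_fraction:
        verdict = "broadly expressed: high depth is plausible"
    else:
        # Hypothetical middle ground not covered by the rule in the thread.
        verdict = "patchy expression: inspect further"
    return fraction, verdict


if __name__ == "__main__":
    # Made-up counts for four cells, for illustration only.
    demo = {"AAAC-1": 3, "AAAG-1": 0, "AACA-1": 5, "AACC-1": 1}
    print(expression_check(demo))
```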

My second guess: did you use the 5' kit or the 3' kit? If the latter, you may well get more coverage at the 3' end, which is where this 3_prime_UTR_variant lies.

Thanks Sid

s2hui commented 5 years ago

Hi @sridnona,

I have used FeaturePlot to plot MGP, and it is expressed everywhere. So I guess it is not an artifact, and I should just wait for the process to complete.

Thanks for your help! shui

sridnona commented 5 years ago

Closing this; feel free to reopen if you have additional questions.