TEcount and 10X Genomics single-cell data support?

This is a wish list item.

Have you given any thought to TEcount support for 10X Genomics single cell data?

As single-cell work becomes more commonplace, it would be nice to incorporate TEcount analysis.

As you may know, the 10X 3' chemistry generates read sequences (in the Illumina R2 fastq) that are associated with cell ID and UMI (unique molecular identifier) barcode data (in the Illumina R1 fastq).
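To make the R1 layout concrete, here is a minimal sketch of splitting an R1 read into its cell barcode and UMI. It assumes the 3' v3 layout (16 bp cell barcode followed by a 12 bp UMI; v2 uses a 10 bp UMI), and `split_r1` is just an illustrative name, not part of cellranger or TEcount.

```python
# Assumed layout (10X 3' v3): 16 bp cell barcode followed by a 12 bp UMI.
# For v2 chemistry the UMI would be 10 bp instead.
CELL_BARCODE_LEN = 16
UMI_LEN = 12


def split_r1(r1_seq, cb_len=CELL_BARCODE_LEN, umi_len=UMI_LEN):
    """Return (cell_barcode, umi) taken from the start of an R1 sequence."""
    if len(r1_seq) < cb_len + umi_len:
        raise ValueError("R1 read is shorter than barcode + UMI")
    return r1_seq[:cb_len], r1_seq[cb_len:cb_len + umi_len]


# Made-up example read: barcode + UMI + a few trailing bases.
cell_barcode, umi = split_r1("AAACCCAAGAAACACT" + "TTGCAGTCCGAT" + "TTTT")
print(cell_barcode, umi)
```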

The standard 10X cellranger analysis generates per-cell counts for the genes in a gene model, with duplicates removed. Along the way a STAR-generated, position-sorted BAM is produced (I believe with tags indicating the cell ID).
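For reference, a rough sketch of pulling those tags out of an alignment record. It assumes the cellranger BAM has been converted to SAM text (for example with `samtools view`) and that the corrected cell barcode and UMI are stored in the `CB` and `UB` tags; `cell_and_umi_tags` is a hypothetical helper, and the example record is made up.

```python
def cell_and_umi_tags(sam_line):
    """Return (cell_barcode, umi) from the CB and UB tags of one SAM record.

    Either value is None when the corresponding tag is absent.
    """
    fields = sam_line.rstrip("\n").split("\t")
    tags = {}
    # The first 11 SAM columns are mandatory; optional TAG:TYPE:VALUE fields follow.
    for field in fields[11:]:
        name, _type, value = field.split(":", 2)
        tags[name] = value
    return tags.get("CB"), tags.get("UB")


# Made-up record for illustration only.
example = "\t".join([
    "read1", "0", "chr1", "100", "255", "50M", "*", "0", "0",
    "A" * 50, "I" * 50,
    "NH:i:1", "CB:Z:AAACCCAAGAAACACT-1", "UB:Z:TTGCAGTCCGAT",
])
print(cell_and_umi_tags(example))
```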

At a minimum, the STAR parameters would probably have to be modified so that highly multi-mapped reads are output. But beyond that, some clever coding would be necessary to assign UMI-aware, duplicate-removed counts of TE_rmsk elements to the cell IDs. For each TE, a row vector would have to be added to the standard (gene-model based) matrix of counts.
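A minimal sketch of what the UMI-aware counting might look like, under a big simplifying assumption: each read has already been assigned to exactly one TE, so the multi-mapper handling that TEcount would need is skipped entirely. The function name `te_by_cell_counts`, the record layout, and the cell and UMI labels are all hypothetical.

```python
from collections import defaultdict


def te_by_cell_counts(records):
    """Build a TE-by-cell matrix of distinct-UMI counts.

    records: iterable of (cell_id, umi, te_name) tuples, one per aligned read.
    Reads sharing the same cell, UMI and TE are counted once (duplicate removal).
    """
    umis = defaultdict(set)
    for cell_id, umi, te_name in records:
        umis[(te_name, cell_id)].add(umi)

    tes = sorted({te for te, _ in umis})
    cells = sorted({cell for _, cell in umis})
    # One row per TE, one column per cell, matching the gene-by-cell layout.
    matrix = [[len(umis.get((te, cell), ())) for cell in cells] for te in tes]
    return tes, cells, matrix


reads = [
    ("cellA", "UMI1", "L1HS"),
    ("cellA", "UMI1", "L1HS"),  # duplicate of the read above, counted once
    ("cellA", "UMI2", "L1HS"),
    ("cellB", "UMI1", "AluYa5"),
]
tes, cells, matrix = te_by_cell_counts(reads)
print(tes, cells, matrix)
```

Each row of `matrix` is the per-TE row vector that would be appended to the gene-by-cell matrix, provided the columns are put in the same cell order.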

I am curious to know if this is anything you have plans for.

Thanks, Sol Katzman, UCSC Genomics Institute.

mhammell-laboratory / TEtranscripts
