velocyto-team / velocyto.py

RNA velocity estimation in Python
http://velocyto.org/velocyto.py/
BSD 2-Clause "Simplified" License

Compatibility with non-string form for barcodes and UMIs #369

Open weir12 opened 1 year ago

Hello velocyto maintainers,

I am working with a dataset where the barcodes (CB) and UMIs (UR) are not stored as nucleotide sequence strings. Specifically, my barcodes are stored as decimal integers (tag: CB:i:2242) and my UMIs are stored as hexadecimal strings (tag: UR:Z:7A9EB).

Can velocyto handle these formats for barcodes and UMIs? If not, are there any workarounds, or plans to support such formats in the future?

I understand that the SAM specification expects barcodes to be string-typed (CB:Z:sequence_of_the_barcode). However, for my data the non-string representation is important, and converting the BAM file to the string format would not be ideal.

Here's an example of an alignment from my BAM file:

FP200002485L1C062R04102086386 16 1 3000132 255 100M * 0 0 GACTATTGATGACTGCCTCTATTTCTTTAGGGGAAATGGGACTTTTAGTCCATGAATCTGATCCTGATTTAGCTTTGGTACCTGGTATCTGTCTAGGAAG EGGFG<FAGCGEGEFEEBDEDGEDFEBGHC@CEFDIAC+(DFFFDFFFDDGEFFFFBEFEFFDFFFEDGGGDEDBFD5EDEDEDDDEBEFECCEGDEFED NH:i:1 HI:i:1 AS:i:98 nM:i:0 Cx:i:51152 Cy:i:44492 UR:Z:7A9EB XF:i:2 CB:i:2242
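
For reference, the only workaround I can think of is to retype the tags at the text level before running velocyto. The sketch below assumes that a string-typed tag (CB:Z:2242) is sufficient and that the tag content does not need to be an actual nucleotide sequence; I have not verified either assumption against velocyto. The function names `stringify_tags` and `convert_stream` are made up for illustration.

```python
def stringify_tags(sam_line, tags=("CB", "UR")):
    """Return a SAM record with the selected optional tags retyped as Z (string).

    Header lines (starting with '@') are returned unchanged.
    The tag values themselves are kept as-is; only the type letter changes.
    """
    line = sam_line.rstrip("\n")
    if line.startswith("@"):
        return line
    fields = line.split("\t")
    # The first 11 columns are the mandatory SAM fields; optional tags follow.
    for i in range(11, len(fields)):
        tag, tag_type, value = fields[i].split(":", 2)
        if tag in tags and tag_type != "Z":
            fields[i] = f"{tag}:Z:{value}"
    return "\t".join(fields)


def convert_stream(src, dst, tags=("CB", "UR")):
    """Read SAM text from src and write the retyped records to dst."""
    for sam_line in src:
        dst.write(stringify_tags(sam_line, tags) + "\n")
```

Calling `convert_stream(sys.stdin, sys.stdout)` from a small script would let this sit between `samtools view -h` and `samtools view -b`, but that produces a second copy of the BAM file, which is what I would like to avoid.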

Thank you for your assistance and for developing this valuable tool.

Best regards, weir