r3fang / SnapTools

A module for working with snap files in Python
Apache License 2.0

creating snap files from another processing pipeline #16

Open. dawe opened this issue 5 years ago (Sep 10, 2019)

dawe commented 5 years ago

I would like to process my scATAC data with snaptools and SnapATAC but, for several reasons, my BAM files are in a format different from the one snaptools requires (i.e. the cell barcode is not in the read name but in an RG tag instead). In my BAM files, reads are already deduplicated at cell level (that is, the BAM duplicate flag is properly set). I already have counts over the binned genome (5 kb bins) in a scipy sparse matrix. What would be the best way to convert my matrices into snap format?
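If the goal is to reuse an existing binned matrix instead of re-processing the BAM, the first step is usually to flatten it into (cell, bin, count) triplets plus a table of bin coordinates. The sketch below does that in plain Python, assuming the matrix is cells × bins in CSR layout and that its columns are ordered chromosome by chromosome in fixed 5 kb steps. `genome_bins` and `csr_to_triplets` are hypothetical helpers, and whether a snap file expects 0- or 1-based indices and bin starts is an assumption to check against the snaptools source before writing anything into one.

```python
from typing import Dict, List, Sequence, Tuple


def genome_bins(chrom_sizes: Dict[str, int], bin_size: int = 5000) -> Tuple[List[str], List[int]]:
    """Hypothetical helper: parallel (chrom, start) lists for fixed-width bins.

    Starts are 0-based here (an assumption); the order must match the
    column order of the count matrix.
    """
    chroms: List[str] = []
    starts: List[int] = []
    for chrom, length in chrom_sizes.items():
        for start in range(0, length, bin_size):
            chroms.append(chrom)
            starts.append(start)
    return chroms, starts


def csr_to_triplets(
    indptr: Sequence[int],
    indices: Sequence[int],
    data: Sequence[int],
    one_based: bool = True,
) -> Tuple[List[int], List[int], List[int]]:
    """Hypothetical helper: flatten a CSR cells-by-bins matrix into triplets.

    Returns (cell index, bin index, count); indices are shifted by one when
    one_based is True (an assumption about the target format).
    """
    offset = 1 if one_based else 0
    idx: List[int] = []
    idy: List[int] = []
    count: List[int] = []
    for row in range(len(indptr) - 1):
        for k in range(indptr[row], indptr[row + 1]):
            if data[k] == 0:
                continue  # skip explicitly stored zeros
            idx.append(row + offset)
            idy.append(indices[k] + offset)
            count.append(data[k])
    return idx, idy, count
```

With a scipy matrix `m`, the inputs would be `m.tocsr().indptr`, `.indices` and `.data`; keeping the helper free of scipy makes the index bookkeeping easy to check on small examples.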

r3fang commented 5 years ago

Hi there,

There are multiple entry points from which you can generate a snap file. The easiest way is to convert the BAM file to the format required by snaptools. Here is an example: https://github.com/r3fang/SnapATAC/wiki/FAQs#cellranger_output

Best -Rongxin
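The conversion Rongxin suggests amounts to moving the barcode into the read name. A minimal sketch of that step as a filter over SAM text (for example the output of `samtools view -h`), assuming, as in the linked FAQ, that snaptools takes the barcode from the read-name prefix before the first `:`, and assuming the RG value is the cell barcode itself. `prefix_barcodes` and `rg_of` are hypothetical helpers, not part of snaptools.

```python
from typing import Iterable, Iterator, List, Optional


def rg_of(fields: List[str]) -> Optional[str]:
    """Return the value of the RG:Z: optional tag, or None if it is absent."""
    for tag in fields[11:]:  # optional tags start at column 12
        if tag.startswith("RG:Z:"):
            return tag[5:]
    return None


def prefix_barcodes(lines: Iterable[str]) -> Iterator[str]:
    """Yield SAM lines with QNAME rewritten to '<barcode>:<original QNAME>'.

    Assumes the RG value is the cell barcode (hypothetical setup).
    """
    for line in lines:
        if line.startswith("@"):
            yield line  # header lines pass through unchanged
            continue
        fields = line.rstrip("\n").split("\t")
        barcode = rg_of(fields)
        if barcode is None:
            continue  # no read group means no cell assignment: drop the read
        fields[0] = f"{barcode}:{fields[0]}"
        yield "\t".join(fields) + "\n"
```

Wrapped in a script that calls `sys.stdout.writelines(prefix_barcodes(sys.stdin))`, it could sit in a pipeline such as `samtools view -h in.bam | python prefix_barcodes.py | samtools sort -n -o in.nsorted.bam -`, whose output is then passed to `snaptools snap-pre`.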


dawe commented 5 years ago

Yes, I did that, as it seems to be the only way to go. Still, I'd rather not use a read-name-sorted BAM (it wastes disk space and time in sorting, IMHO): my BAM files already have the barcode names in their header and read groups, so the size of the resulting matrix is known from the start, and reads are already deduplicated at cell level.
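The alternative described here can be sketched as a single pass over a coordinate-sorted BAM streamed as SAM text: barcodes come from the `@RG` header lines, so the cell dimension is fixed before any read is seen, and reads carrying the duplicate flag (0x400) are skipped instead of being deduplicated again. This is a hypothetical sketch, not something snaptools provides; it assumes the RG `ID` is the cell barcode, and it counts reads (first mates only by default) rather than fragments, which may differ from how snaptools builds its bin matrix.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# SAM FLAG bits (SAM specification)
UNMAPPED = 0x4
SECOND_MATE = 0x80
SECONDARY = 0x100
DUPLICATE = 0x400
SUPPLEMENTARY = 0x800


def count_bins(
    lines: Iterable[str], bin_size: int = 5000, first_mate_only: bool = True
) -> Tuple[List[str], Counter]:
    """Hypothetical helper: count reads per (barcode, chrom, bin index).

    Barcodes are read from @RG header lines (ID field), so no name sorting
    is needed; the input only has to be valid SAM text.
    """
    barcodes: List[str] = []
    counts: Counter = Counter()
    skip = UNMAPPED | SECONDARY | DUPLICATE | SUPPLEMENTARY
    if first_mate_only:
        skip |= SECOND_MATE  # avoid counting both mates of a pair
    for line in lines:
        if line.startswith("@"):
            if line.startswith("@RG"):
                for field in line.rstrip("\n").split("\t")[1:]:
                    if field.startswith("ID:"):
                        barcodes.append(field[3:])
            continue
        fields = line.rstrip("\n").split("\t")
        if int(fields[1]) & skip:
            continue  # unmapped, secondary/supplementary, mate 2, or duplicate
        rg = next((t[5:] for t in fields[11:] if t.startswith("RG:Z:")), None)
        if rg is None:
            continue
        pos0 = int(fields[3]) - 1  # SAM POS is 1-based
        counts[(rg, fields[2], pos0 // bin_size)] += 1
    return barcodes, counts
```

Because the barcode list is complete after the header, each `(barcode, chrom, bin)` key maps directly to a row and column of a preallocated sparse matrix.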

fairliereese commented 5 years ago

Just commenting that this is how the new BioRad scATAC protocol works and it would be really great to see support for this in the future!