Open lauradmartens opened 6 months ago
I'm not sure how to treat split reads. Suppose we have split reads as follows:
Read1: chr1, 200, 400 Read2: chr1, 500, 600 Split read: chr1, 2000, 2100
What are the fragments supposed to be generated here?
That's a good point! I guess I thought of it rather in the context of RNA coverage rather than ATAC because it would be nice to also use your storage format + functionality for that.
Hi there. I worked with @lauradmartens on this the other day and we came up with a proof of concept https://github.com/kaizhang/SnapATAC2/compare/main...manzt:SnapATAC2:split-reads
This seems to work well for our use case: generating fragments .bed
for the RNA coverage, and then loading the fragments in AnnData with import_fragments
. Props to the API design that this just works! Would you have any interest in supporting in extending the current API to support our use case? We would like to contribute if possible, since having this functionality in snapatac2 (python) would be great for some of our collaborators.
It seems specialized enough that I'm not sure whether adding a flag to make_fragment_file
(e.g., expand_split_reads
) or introducing a specialized function (e.g., make_rna_coverage_fragment_file
) would be desirable. Thanks for your consideration!
Thanks! I'll take a look at your implementation this weekend, and decide how to proceed.
The approach you are taking works when the reads are single-ended and your goal is to generate bigwig coverage file. To make it works for paired-end reads as well and to be compatible with other functions, I'm thinking to store these reads in a new slot in obsm
, e.g., 'split_reads'. The downstream functions can then choose to use it or not.
Hi, first of all thanks a lot for this great tool!
I was wondering if it would be possible to have the make fragment files function from bam files also consider split reads in addition to clipped reads? Do you think this would be difficult to implement?
Thanks a lot for your help!