timoast / sinto

Tools for single-cell data processing
https://timoast.github.io/sinto/
MIT License

Zero-length fragments generated from Cell Ranger BAM #20

Closed cflerin closed 3 years ago

cflerin commented 4 years ago

Hi, thanks for making this tool!

I've come across this issue and I'm not sure whether it is the expected behavior. I'm using Sinto 0.7.1 to create a fragments file from a Cell Ranger BAM file. In the output, I get many fragments whose start and end positions are identical (around 6000 in total). For example:

chr5    49658161        49658162        CGCACAGCACCTATTT-1      1
chr5    49658161        49658164        GATTGACCACGTTGTA-1      2
chr5    49658161        49658168        TGTGTCCGTATTGTCG-1      1
chr5    49658162        49658162        CTCTACGCAAAGGTCG-1      1 # <--- this fragment
chr5    49658162        49658168        CCGTACTCACACACAT-1      2
chr5    49658162        49658173        GTGGATTCAGCAACAG-1      1
chr5    49658166        49658432        CTGAATGAGGACTAGC-1      2
chr5    49658168        49658168        CACCTTGAGCCTGTAT-1      3

When comparing to the Cell Ranger fragments file from the same BAM, I don't see any of these. In the Cell Ranger file, the minimum fragment size seems to be 10, so maybe shorter fragments have been filtered out. Should I filter the Sinto fragments as well?
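
One way to quantify the comparison above is to count how many fragments fall below a given length. This is a minimal sketch, not part of sinto: the function name `count_short_fragments` is hypothetical, the default threshold of 10 is only the value observed in the Cell Ranger output, and the input is assumed to be an uncompressed fragments file with columns chrom, start, end, barcode, count.

```shell
# Count fragments shorter than a minimum length in a BED-like fragments file.
# Assumed columns: chrom, start, end, barcode, count (whitespace-separated).
count_short_fragments() {
    # $1: fragments file
    # $2: minimum length (defaults to 10, an assumed threshold)
    awk -v min="${2:-10}" '($3 - $2) < min { n++ } END { print n + 0 }' "$1"
}
```

For example, `count_short_fragments fragments.bed 1` would report only the zero-length fragments, while `count_short_fragments fragments.bed` would report everything shorter than 10 bp.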

timoast commented 4 years ago

Hi, I think filtering fragments based on a minimum length is definitely a good idea. I hadn't noticed any of these very short fragments in the test cases I looked at. I can add a minimum fragment length argument to the next version, similar to the existing --max_distance parameter.

cflerin commented 3 years ago

Hi @timoast, thanks for the reply. For now, I can work around this by applying a minimum fragment size when sorting and compressing the output:

sort -k1,1 -k2,2n fragments.bed | awk '($3-$2) >= 10' | bgzip -c > fragments.tsv.gz
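
The same filter can be wrapped in a small function so that the threshold is not hard-coded. This is only a sketch under assumptions: `filter_min_length` is a hypothetical helper name, and the default of 10 simply mirrors the threshold used in the one-liner above.

```shell
# Keep only fragments whose length (end - start) is at least the given minimum.
# Reads fragment lines from stdin and writes the retained lines to stdout.
filter_min_length() {
    # $1: minimum length (defaults to 10, an assumed threshold)
    awk -v min="${1:-10}" '($3 - $2) >= min'
}
```

It could then be used as `filter_min_length 10 < fragments.bed | sort -k1,1 -k2,2n | bgzip -c > fragments.tsv.gz`. Filtering before sorting gives the same result as the pipeline above and leaves fewer lines to sort.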

So, feel free to close this, unless you're planning on adding the filtering step in a later release.

timoast commented 3 years ago

I'll leave this open until an option to filter small fragments is added to sinto.

timoast commented 3 years ago

Now added in 0.7.2.