timoast / sinto

Tools for single-cell data processing
https://timoast.github.io/sinto/
MIT License

filter barcodes parallel issue #15

Closed omansn closed 4 years ago

omansn commented 4 years ago

Hi Tim,

Thought I'd point out an issue that I noticed, though I can't figure out exactly what is happening. When I run filterbarcodes with -p > 1, certain header entries are duplicated once for every process. Every @PG entry is duplicated, but with a unique string appended to the ID:

@PG ID:minimap2-1FF947E PN:minimap2 VN:2.7-r654 CL:minimap2 -ax splice -t 10 -G50k -k 21 -w 11 --sr -A2 -B8 -O12,32 -E2,1 -r200 -p.5 -N20 -f1000,5000 -n2 -m20 -s40 -g2000 -2K50m --secondary=no genome.fa sc_bams/HP_104_Normal_soup/tmp.fq
@PG ID:minimap2-2E680064-4AC70EAD   PN:minimap2 VN:2.7-r654 CL:minimap2 -ax splice -t 10 -G50k -k 21 -w 11 --sr -A2 -B8 -O12,32 -E2,1 -r200 -p.5 -N20 -f1000,5000 -n2 -m20 -s40 -g2000 -2K50m --secondary=no genome.fa sc_bams/HP_104_Normal_soup/tmp.fq

A unique read group is also produced for each process, with a unique string appended to the ID. This BAM should only have one read group (the top one is correct), but it now has 10:

@RG ID:HP_104_Normal    LB:1    PL:ILLUMINA SM:HP_104_Normal    PU:1
@RG ID:HP_104_Normal-401FEFD5   LB:1    PL:ILLUMINA SM:HP_104_Normal    PU:1
@RG ID:HP_104_Normal-75ACD5C2   LB:1    PL:ILLUMINA SM:HP_104_Normal    PU:1
@RG ID:HP_104_Normal-17EC0C41   LB:1    PL:ILLUMINA SM:HP_104_Normal    PU:1
@RG ID:HP_104_Normal-58171F5E   LB:1    PL:ILLUMINA SM:HP_104_Normal    PU:1
@RG ID:HP_104_Normal-2AA79EC2   LB:1    PL:ILLUMINA SM:HP_104_Normal    PU:1
@RG ID:HP_104_Normal-738A0D0F   LB:1    PL:ILLUMINA SM:HP_104_Normal    PU:1
@RG ID:HP_104_Normal-4AEA10E1   LB:1    PL:ILLUMINA SM:HP_104_Normal    PU:1
@RG ID:HP_104_Normal-57A6356B   LB:1    PL:ILLUMINA SM:HP_104_Normal    PU:1
@RG ID:HP_104_Normal-5E8828B4   LB:1    PL:ILLUMINA SM:HP_104_Normal    PU:1

timoast commented 4 years ago

Should be fixed in 0.7.1 now
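
To check whether a BAM written by an earlier version is affected, one option is to dump its header (for example with `samtools view -H`) and look for @RG or @PG lines that are identical apart from their ID. The snippet below is only a rough sketch of that check and is not part of sinto: the function name `find_duplicated_entries` is made up for illustration, the `@HD` line in the sample is illustrative, and it assumes real header fields are tab-separated (the listing above shows them with spaces).

```python
from collections import defaultdict


def find_duplicated_entries(header_text, record_types=("@RG", "@PG")):
    """Group header lines that are identical apart from their ID field.

    Hypothetical helper for illustration; not part of sinto.
    """
    groups = defaultdict(list)
    for line in header_text.splitlines():
        fields = line.split("\t")
        if fields[0] not in record_types:
            continue
        # Collect the ID value and everything else on the line separately.
        ids = [f[3:] for f in fields[1:] if f.startswith("ID:")]
        rest = tuple(f for f in fields[1:] if not f.startswith("ID:"))
        groups[(fields[0], rest)].extend(ids)
    # Keep only groups where more than one ID shares the same remaining fields.
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


# Sample header modelled on the read groups reported in this issue.
header = "\n".join([
    "@HD\tVN:1.6\tSO:coordinate",  # illustrative line, not from the report
    "@RG\tID:HP_104_Normal\tLB:1\tPL:ILLUMINA\tSM:HP_104_Normal\tPU:1",
    "@RG\tID:HP_104_Normal-401FEFD5\tLB:1\tPL:ILLUMINA\tSM:HP_104_Normal\tPU:1",
    "@RG\tID:HP_104_Normal-75ACD5C2\tLB:1\tPL:ILLUMINA\tSM:HP_104_Normal\tPU:1",
])

duplicates = find_duplicated_entries(header)
for (record_type, _), ids in duplicates.items():
    print(record_type, ids)
```

This only reports the duplicates. Removing the extra header lines by hand is risky, because reads in the file may carry RG tags that point at the suffixed IDs, so re-running filterbarcodes with 0.7.1 or later is probably the safer route.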