Closed zhengzhenxian closed 2 months ago
I used HCC1395.bam (75x, 223G) and ran the workflow with the LongPhase from the conda Clair3 version 1.0.7 and ClairS commit f2606f66.
The haplotag output file tumor_chr20.bam is 6.2G.
Could you provide your output log or data?
> samtools idxstats ONT_HCC1395/alignment-sort-hcc1395.bam
chr1 248956422 2384760 0
chr2 242193529 1537375 0
chr3 198295559 1386062 0
chr4 190214555 1449514 0
chr5 181538259 1171618 0
chr6 170805979 1121054 0
chr7 159345973 1720479 0
chr8 145138636 1113230 0
chr9 138394717 1025072 0
chr10 133797422 955545 0
chr11 135086622 798248 0
chr12 133275309 816144 0
chr13 114364328 646980 0
chr14 107043718 733506 0
chr15 101991189 675494 0
chr16 90338345 959480 0
chr17 83257441 647000 0
chr18 80373285 744354 0
chr19 58617616 379711 0
chr20 64444167 710333 0
chr21 46709983 410616 0
chr22 50818468 482982 0
chrX 156040895 620177 0
chrY 57227415 77634 0
chrM 16569 9241 0
... Omit chromosome
chrEBV 171823 28 0
* 0 0 1037967
> samtools idxstats clairs-output/tmp/clair3_output/phased_output/tumor_chr20.bam
chr1 248956422 0 0
chr2 242193529 0 0
chr3 198295559 0 0
chr4 190214555 0 0
chr5 181538259 0 0
chr6 170805979 0 0
chr7 159345973 0 0
chr8 145138636 0 0
chr9 138394717 0 0
chr10 133797422 0 0
chr11 135086622 0 0
chr12 133275309 0 0
chr13 114364328 0 0
chr14 107043718 0 0
chr15 101991189 0 0
chr16 90338345 0 0
chr17 83257441 0 0
chr18 80373285 0 0
chr19 58617616 0 0
chr20 64444167 710333 0
chr21 46709983 0 0
chr22 50818468 0 0
chrX 156040895 0 0
chrY 57227415 0 0
chrM 16569 0 0
... Omit chromosome
chrEBV 171823 0 0
* 0 0 0
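For anyone who wants to run the same check on their own output, here is a minimal sketch that lists which contigs actually contain reads in a `samtools idxstats` listing. The helper name `contigs_with_reads` is hypothetical and is not part of samtools, ClairS, or LongPhase. It assumes the usual four-column idxstats layout (name, length, mapped, unmapped).

```python
def contigs_with_reads(idxstats_text):
    """Return {contig: mapped_count} for contigs with at least one mapped read.

    Assumes `samtools idxstats` style lines: name, length, mapped, unmapped,
    separated by whitespace. The '*' line and malformed lines are skipped.
    """
    result = {}
    for line in idxstats_text.splitlines():
        fields = line.split()
        if len(fields) != 4 or fields[0] == "*":
            continue
        name, _length, mapped, _unmapped = fields
        if not mapped.isdigit():
            continue
        if int(mapped) > 0:
            result[name] = int(mapped)
    return result


# A few lines copied from the tumor_chr20.bam listing above.
sample = """chr19 58617616 0 0
chr20 64444167 710333 0
chr21 46709983 0 0
* 0 0 0
"""
print(contigs_with_reads(sample))
```

If the region restriction works as intended, the returned dictionary should contain only the requested contig, as it does for the listing above.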
> longphase output log
phased SNP file: test-ont-clairs-output/tmp/clair3_output/phased_output/tumor_phased_chr20.vcf.gz
phased SV file:
phased MOD file:
input bam file: ONT_HCC1395/alignment-sort-hcc1395.bam
input ref file: GCA_000001405.15_GRCh38_no_alt_analysis_set.fa
output bam file: test-ont-clairs-output/tmp/clair3_output/phased_output/tumor_chr20.bam
number of threads: 100
write log file: false
log file:
-------------------------------------------
tag region: chr20
filter mapping quality below: 1
percentage threshold: 0.6
tag supplementary: false
-------------------------------------------
parsing SNP VCF ... 0s
tag read start ...
chr: chr20 ... 97s
tag read 100s
-------------------------------------------
total process time: 100s
total alignment: 710333
total supplementary: 26197
total secondary: 0
total unmapped: 0
total tag alignment: 332186
total untagged: 378147
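The summary counts in this log can be cross-checked with a short script. This is only a sketch: `parse_summary` is a hypothetical helper, and it assumes the log keeps the `total <name>: <integer>` line format shown above.

```python
import re


def parse_summary(log_text):
    """Collect 'total <name>: <integer>' lines from a haplotag log into a dict.

    Lines whose value is not a plain integer (e.g. 'total process time: 100s')
    are ignored.
    """
    counts = {}
    pattern = r"^total ([a-z ]+):[ \t]*(\d+)[ \t]*$"
    for match in re.finditer(pattern, log_text, re.MULTILINE):
        counts[match.group(1).strip()] = int(match.group(2))
    return counts


# Summary lines copied from the log above.
log = """total process time: 100s
total alignment: 710333
total supplementary: 26197
total secondary: 0
total unmapped: 0
total tag alignment: 332186
total untagged: 378147
"""
counts = parse_summary(log)
tagged_fraction = counts["tag alignment"] / counts["alignment"]
consistent = counts["tag alignment"] + counts["untagged"] == counts["alignment"]
print(counts, round(tagged_fraction, 3), consistent)
```

For this log the tagged and untagged counts add up to the total number of alignments, and a little under half of the chr20 alignments received a haplotype tag.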
@sloth-eat-pudding
Thanks for the quick reply. Sorry, I used an outdated workflow for the evaluation. I tested with the latest code and the function works properly.
Hi team,
After the LongPhase haplotag v1.7 update discussed in a previous issue, we tried to improve the haplotagging step by using the "--regions" option to specify contig names, so that the BAM file could be processed in parallel. However, we observed that providing a region still generates the entire BAM file rather than a chromosome-level haplotagged BAM. As a result, the untagged remainder of the BAM takes up a large amount of disk space. It would be greatly appreciated if you could consider adding a feature to output a smaller BAM file.
Our workflow is listed here. Please let me know if I misunderstood the option. Thanks!