Open koisland opened 6 months ago
On certain centromeres from chr1, inaccurate monomers are called by nhmmer resulting in incorrect plots.
nhmmer
Snakemake-Humas-HMMER calls one chr1 contig chr16 despite it being chr1. The original sequence is correctly labeled.
Snakemake-Humas-HMMER
[Mon May 6 14:07:33 2024] rule cens_humas_hmmer_analysis: input: results/humas_hmmer/cens/NA19036_rc-chr1_haplotype1-0000038:6617618-13645405.fa, data/models/AS-HORs-hmmer3.0-170921.hmm output: results/humas_hmmer/nhmmer-AS-HORs-hmmer3.0-170921-vs-NA19036_rc-chr1_haplotype1-0000038:6617618-13645405-tbl.out log: logs/humas_hmmer_analysis_NA19036_rc-chr1_haplotype1-0000038:6617618-13645405.log jobid: 10339 benchmark: benchmarks/humas_hmmer_analysis_NA19036_rc-chr1_haplotype1-0000038:6617618-13645405.tsv reason: Missing output files: results/humas_hmmer/nhmmer-AS-HORs-hmmer3.0-170921-vs-NA19036_rc-chr1_haplotype1-0000038:6617618-13645405-tbl.out wildcards: fname=NA19036_rc-chr1_haplotype1-0000038:6617618-13645405 threads: 30 resources: mem_mb=20000, mem_mib=19074, disk_mb=1000, disk_mib=954, tmpdir=<TBD> nhmmer --cpu 30 --notextw --noali --tblout results/humas_hmmer/nhmmer-AS-HORs-hmmer3.0-170921-vs-NA19036_rc-chr1_haplotype1-0000038:6617618-13645405-tbl.out -o /dev/null data/models/AS-HORs-hmmer3.0-170921.hmm results/humas_hmmer/cens/NA19036_rc-chr1_haplotype1-0000038:6617618-13645405.fa &> logs/humas_hmmer_analysis_NA19036_rc-chr1_haplotype1-0000038:6617618-13645405.log
NA19036_rc-chr1_haplotype1-0000038:6617618-13645405 1326952 1327119 S1C16H1L.4 182.800000 - 1326952 1327119 255,255,10 NA19036_rc-chr1_haplotype1-0000038:6617618-13645405 1327125 1327290 S1C16H1L.5 189.800000 - 1327125 1327290 255,255,10 NA19036_rc-chr1_haplotype1-0000038:6617618-13645405 1327293 1327459 S1C16H1L.4 174.900000 - 1327293 1327459 255,255,10 NA19036_rc-chr1_haplotype1-0000038:6617618-13645405 1327463 1327626 S1C16H1L.5 164.000000 - 1327463 1327626 255,255,10 NA19036_rc-chr1_haplotype1-0000038:6617618-13645405 1327628 1327795 S1C16H1L.4 181.300000 - 1327628 1327795 255,255,10 NA19036_rc-chr1_haplotype1-0000038:6617618-13645405 1327799 1327966 S1C16H1L.5 192.200000 - 1327799 1327966 255,255,10 NA19036_rc-chr1_haplotype1-0000038:6617618-13645405 1327968 1328135 S1C16H1L.4 177.900000 - 1327968 1328135 255,255,10 NA19036_rc-chr1_haplotype1-0000038:6617618-13645405 1328137 1328305 S1CMH3d.1 160.700000 - 1328137 1328305 255,204,179 NA19036_rc-chr1_haplotype1-0000038:6617618-13645405 1328308 1328475 S1C16H1L.10 169.300000 - 1328308 1328475 255,255,10
Rerunning the hmmer workflow on the samples fixes the issue so the issue is not easily reproducible.
[Thu May 9 21:51:00 2024] rule cens_humas_hmmer_analysis: input: results/humas_hmmer_redo/cens/NA19036_rc-chr1_haplotype1-0000038:6617618-13645405.fa, data/models/AS-HORs-hmmer3.0-170921.hmm output: results/humas_hmmer_redo/nhmmer-AS-HORs-hmmer3.0-170921-vs-NA19036_rc-chr1_haplotype1-0000038:6617618-13645405-tbl.out log: logs/humas_hmmer_analysis_NA19036_rc-chr1_haplotype1-0000038:6617618-13645405.log jobid: 653 benchmark: benchmarks/humas_hmmer_analysis_NA19036_rc-chr1_haplotype1-0000038:6617618-13645405.tsv reason: Missing output files: results/humas_hmmer_redo/nhmmer-AS-HORs-hmmer3.0-170921-vs-NA19036_rc-chr1_haplotype1-0000038:6617618-13645405-tbl.out; Updated input files: results/humas_hmmer_redo/cens/NA19036_rc-chr1_haplotype1-0000038:6617618-13645405.fa wildcards: fname=NA19036_rc-chr1_haplotype1-0000038:6617618-13645405 threads: 4 resources: mem_mb=20000, mem_mib=19074, disk_mb=1000, disk_mib=954, tmpdir=<TBD> nhmmer --cpu 4 --notextw --noali --tblout results/humas_hmmer_redo/nhmmer-AS-HORs-hmmer3.0-170921-vs-NA19036_rc-chr1_haplotype1-0000038:6617618-13645405-tbl.out -o /dev/null data/models/AS-HORs-hmmer3.0-170921.hmm results/humas_hmmer_redo/cens/NA19036_rc-chr1_haplotype1-0000038:6617618-13645405.fa &> logs/humas_hmmer_analysis_NA19036_rc-chr1_haplotype1-0000038:6617618-13645405.log
NA19036_rc-chr1_haplotype1-0000038:6617618-13645405 1326952 1327119 S1C1/5/19H1L.6/4 189.700000 - 1326952 1327119 120,171,154 NA19036_rc-chr1_haplotype1-0000038:6617618-13645405 1327125 1327290 S1C16H1L.5 189.800000 - 1327125 1327290 255,255,10 NA19036_rc-chr1_haplotype1-0000038:6617618-13645405 1327292 1327459 S1C1/5/19H1L.6/4 181.000000 - 1327292 1327459 120,171,154 NA19036_rc-chr1_haplotype1-0000038:6617618-13645405 1327463 1327626 S1C1/5/19H1L.5 164.800000 - 1327463 1327626 120,171,154 NA19036_rc-chr1_haplotype1-0000038:6617618-13645405 1327628 1327795 S1C1/5/19H1L.6/4 187.700000 - 1327628 1327795 120,171,154 NA19036_rc-chr1_haplotype1-0000038:6617618-13645405 1327799 1327966 S1C16H1L.5 192.200000 - 1327799 1327966 255,255,10 NA19036_rc-chr1_haplotype1-0000038:6617618-13645405 1327968 1328135 S1C1/5/19H1L.6/4 185.800000 - 1327968 1328135 120,171,154 NA19036_rc-chr1_haplotype1-0000038:6617618-13645405 1328137 1328305 S1C1/5/19H1L.3 185.900000 - 1328137 1328305 120,171,154 NA19036_rc-chr1_haplotype1-0000038:6617618-13645405 1328309 1328475 S1C1/5/19H1L.2 191.500000 - 1328309 1328475 120,171,154
On certain centromeres from chr1, inaccurate monomers are called by
nhmmer
resulting in incorrect plots.Snakemake-Humas-HMMER
calls one chr1 contig chr16 despite it being chr1. The original sequence is correctly labeled.Rerunning the hmmer workflow on the samples fixes the issue so the issue is not easily reproducible.