logsdon-lab / CenMAP

Centromere mapping and annotation pipeline
MIT License
9 stars 0 forks source link

Incorrect chr1 HOR monomer Classification #33

Open koisland opened 6 months ago

koisland commented 6 months ago

On certain centromeres from chr1, inaccurate monomers are called by nhmmer resulting in incorrect plots.

Snakemake-Humas-HMMER calls one chr1 contig chr16 despite it being chr1. The original sequence is correctly labeled.

[Mon May  6 14:07:33 2024]
rule cens_humas_hmmer_analysis:
    input: results/humas_hmmer/cens/NA19036_rc-chr1_haplotype1-0000038:6617618-13645405.fa, data/models/AS-HORs-hmmer3.0-170921.hmm
    output: results/humas_hmmer/nhmmer-AS-HORs-hmmer3.0-170921-vs-NA19036_rc-chr1_haplotype1-0000038:6617618-13645405-tbl.out
    log: logs/humas_hmmer_analysis_NA19036_rc-chr1_haplotype1-0000038:6617618-13645405.log
    jobid: 10339
    benchmark: benchmarks/humas_hmmer_analysis_NA19036_rc-chr1_haplotype1-0000038:6617618-13645405.tsv
    reason: Missing output files: results/humas_hmmer/nhmmer-AS-HORs-hmmer3.0-170921-vs-NA19036_rc-chr1_haplotype1-0000038:6617618-13645405-tbl.out
    wildcards: fname=NA19036_rc-chr1_haplotype1-0000038:6617618-13645405
    threads: 30
    resources: mem_mb=20000, mem_mib=19074, disk_mb=1000, disk_mib=954, tmpdir=<TBD>

        nhmmer         --cpu 30         --notextw         --noali         --tblout results/humas_hmmer/nhmmer-AS-HORs-hmmer3.0-170921-vs-NA19036_rc-chr1_haplotype1-0000038:6617618-13645405-tbl.out         -o /dev/null         data/models/AS-HORs-hmmer3.0-170921.hmm results/humas_hmmer/cens/NA19036_rc-chr1_haplotype1-0000038:6617618-13645405.fa &> logs/humas_hmmer_analysis_NA19036_rc-chr1_haplotype1-0000038:6617618-13645405.log
NA19036_rc-chr1_haplotype1-0000038:6617618-13645405 1326952 1327119 S1C16H1L.4  182.800000  -   1326952 1327119 255,255,10
NA19036_rc-chr1_haplotype1-0000038:6617618-13645405 1327125 1327290 S1C16H1L.5  189.800000  -   1327125 1327290 255,255,10
NA19036_rc-chr1_haplotype1-0000038:6617618-13645405 1327293 1327459 S1C16H1L.4  174.900000  -   1327293 1327459 255,255,10
NA19036_rc-chr1_haplotype1-0000038:6617618-13645405 1327463 1327626 S1C16H1L.5  164.000000  -   1327463 1327626 255,255,10
NA19036_rc-chr1_haplotype1-0000038:6617618-13645405 1327628 1327795 S1C16H1L.4  181.300000  -   1327628 1327795 255,255,10
NA19036_rc-chr1_haplotype1-0000038:6617618-13645405 1327799 1327966 S1C16H1L.5  192.200000  -   1327799 1327966 255,255,10
NA19036_rc-chr1_haplotype1-0000038:6617618-13645405 1327968 1328135 S1C16H1L.4  177.900000  -   1327968 1328135 255,255,10
NA19036_rc-chr1_haplotype1-0000038:6617618-13645405 1328137 1328305 S1CMH3d.1   160.700000  -   1328137 1328305 255,204,179
NA19036_rc-chr1_haplotype1-0000038:6617618-13645405 1328308 1328475 S1C16H1L.10 169.300000  -   1328308 1328475 255,255,10

humas_issue_chr1_rm humas_issue_stv_plot

Rerunning the hmmer workflow on the samples fixes the issue so the issue is not easily reproducible.

[Thu May  9 21:51:00 2024]
rule cens_humas_hmmer_analysis:
    input: results/humas_hmmer_redo/cens/NA19036_rc-chr1_haplotype1-0000038:6617618-13645405.fa, data/models/AS-HORs-hmmer3.0-170921.hmm
    output: results/humas_hmmer_redo/nhmmer-AS-HORs-hmmer3.0-170921-vs-NA19036_rc-chr1_haplotype1-0000038:6617618-13645405-tbl.out
    log: logs/humas_hmmer_analysis_NA19036_rc-chr1_haplotype1-0000038:6617618-13645405.log
    jobid: 653
    benchmark: benchmarks/humas_hmmer_analysis_NA19036_rc-chr1_haplotype1-0000038:6617618-13645405.tsv
    reason: Missing output files: results/humas_hmmer_redo/nhmmer-AS-HORs-hmmer3.0-170921-vs-NA19036_rc-chr1_haplotype1-0000038:6617618-13645405-tbl.out; Updated input files: results/humas_hmmer_redo/cens/NA19036_rc-chr1_haplotype1-0000038:6617618-13645405.fa
    wildcards: fname=NA19036_rc-chr1_haplotype1-0000038:6617618-13645405
    threads: 4
    resources: mem_mb=20000, mem_mib=19074, disk_mb=1000, disk_mib=954, tmpdir=<TBD>

        nhmmer         --cpu 4         --notextw         --noali         --tblout results/humas_hmmer_redo/nhmmer-AS-HORs-hmmer3.0-170921-vs-NA19036_rc-chr1_haplotype1-0000038:6617618-13645405-tbl.out         -o /dev/null         data/models/AS-HORs-hmmer3.0-170921.hmm results/humas_hmmer_redo/cens/NA19036_rc-chr1_haplotype1-0000038:6617618-13645405.fa &> logs/humas_hmmer_analysis_NA19036_rc-chr1_haplotype1-0000038:6617618-13645405.log
NA19036_rc-chr1_haplotype1-0000038:6617618-13645405 1326952 1327119 S1C1/5/19H1L.6/4    189.700000  -   1326952 1327119 120,171,154
NA19036_rc-chr1_haplotype1-0000038:6617618-13645405 1327125 1327290 S1C16H1L.5  189.800000  -   1327125 1327290 255,255,10
NA19036_rc-chr1_haplotype1-0000038:6617618-13645405 1327292 1327459 S1C1/5/19H1L.6/4    181.000000  -   1327292 1327459 120,171,154
NA19036_rc-chr1_haplotype1-0000038:6617618-13645405 1327463 1327626 S1C1/5/19H1L.5  164.800000  -   1327463 1327626 120,171,154
NA19036_rc-chr1_haplotype1-0000038:6617618-13645405 1327628 1327795 S1C1/5/19H1L.6/4    187.700000  -   1327628 1327795 120,171,154
NA19036_rc-chr1_haplotype1-0000038:6617618-13645405 1327799 1327966 S1C16H1L.5  192.200000  -   1327799 1327966 255,255,10
NA19036_rc-chr1_haplotype1-0000038:6617618-13645405 1327968 1328135 S1C1/5/19H1L.6/4    185.800000  -   1327968 1328135 120,171,154
NA19036_rc-chr1_haplotype1-0000038:6617618-13645405 1328137 1328305 S1C1/5/19H1L.3  185.900000  -   1328137 1328305 120,171,154
NA19036_rc-chr1_haplotype1-0000038:6617618-13645405 1328309 1328475 S1C1/5/19H1L.2  191.500000  -   1328309 1328475 120,171,154

humas_issue_rerun_stv_plot