logsdon-lab / CenMAP

Centromere mapping and annotation pipeline
MIT License

dna-brnn filtering step omits contigs incorrectly by length #13

Closed: koisland closed this issue 7 months ago

koisland commented 8 months ago

In the HGSVC3 assemblies, for certain chromosomes, filtering the dna-brnn output incorrectly omits smaller but correct HOR arrays.

The last two samples (HG00096 and HG03732) were omitted because their lengths after running bedminmax.py were less than 1,000,000 bp.

(base) [koisland@sarlacc dna_brnn]$ cat chrY_H*_contigs.fwd.ALR.bed chrY_N*_contigs.fwd.ALR.bed
NA19239_chrY_haplotype1-0000027 *10010673        10027694        2       17021
NA19239_chrY_haplotype1-0000027 10142915        10147965        2       5050
NA19239_chrY_haplotype1-0000027 10150015        10154765        2       4750
NA19239_chrY_haplotype1-0000027 10155962        10160265        2       4303
NA19239_chrY_haplotype1-0000027 10162015        10176265        2       14250
NA19239_chrY_haplotype1-0000027 10183415        11016615        2       833200
NA19239_chrY_haplotype1-0000027 11017015        *11021215        2       4200
(base) [koisland@sarlacc dna_brnn]$ cat chrY_H*_contigs.rev.ALR.bed chrY_N*_contigs.rev.ALR.bed
HG00096_chrY_haplotype1-0000033 *23777138        23794155        2       17017
HG00096_chrY_haplotype1-0000033 23656906        23661955        2       5049
HG00096_chrY_haplotype1-0000033 23650105        23654805        2       4700
HG00096_chrY_haplotype1-0000033 23644919        23648905        2       3986
HG00096_chrY_haplotype1-0000033 23628905        23643205        2       14300
HG00096_chrY_haplotype1-0000033 23294605        23621805        2       327200
HG00096_chrY_haplotype1-0000033 23290005        *23294205        2       4200
HG03732_chrY_haplotype1-0000027 *41879184        41896234        2       17050
HG03732_chrY_haplotype1-0000027 41758934        41764034        2       5100
HG03732_chrY_haplotype1-0000027 41752134        41756884        2       4750
HG03732_chrY_haplotype1-0000027 41746634        41750934        2       4300
HG03732_chrY_haplotype1-0000027 41730634        41744884        2       14250
HG03732_chrY_haplotype1-0000027 41171934        41723485        2       551551
HG03732_chrY_haplotype1-0000027 41167333        *41171534        2       4201
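The length filter described above can be reproduced with a short Python sketch. This is not the code of bedminmax.py or CenMAP. It assumes the script collapses each contig's ALR records into one interval spanning the minimum start to the maximum end, and then drops intervals shorter than a minimum length (1,000,000 bp here). The records are the ones listed above with the asterisks removed. `parse_bed`, `min_max_spans`, and `filter_by_length` are hypothetical helpers.

```python
# Hypothetical re-creation of the min/max collapse + length filter;
# NOT the actual bedminmax.py implementation.

# ALR records from the chrY dna-brnn output above (asterisks removed).
# Columns: contig, start, end, dna-brnn label (2 in every row here),
# length (in these rows, always end - start).
ALR_BED = """\
NA19239_chrY_haplotype1-0000027 10010673 10027694 2 17021
NA19239_chrY_haplotype1-0000027 10142915 10147965 2 5050
NA19239_chrY_haplotype1-0000027 10150015 10154765 2 4750
NA19239_chrY_haplotype1-0000027 10155962 10160265 2 4303
NA19239_chrY_haplotype1-0000027 10162015 10176265 2 14250
NA19239_chrY_haplotype1-0000027 10183415 11016615 2 833200
NA19239_chrY_haplotype1-0000027 11017015 11021215 2 4200
HG00096_chrY_haplotype1-0000033 23777138 23794155 2 17017
HG00096_chrY_haplotype1-0000033 23656906 23661955 2 5049
HG00096_chrY_haplotype1-0000033 23650105 23654805 2 4700
HG00096_chrY_haplotype1-0000033 23644919 23648905 2 3986
HG00096_chrY_haplotype1-0000033 23628905 23643205 2 14300
HG00096_chrY_haplotype1-0000033 23294605 23621805 2 327200
HG00096_chrY_haplotype1-0000033 23290005 23294205 2 4200
HG03732_chrY_haplotype1-0000027 41879184 41896234 2 17050
HG03732_chrY_haplotype1-0000027 41758934 41764034 2 5100
HG03732_chrY_haplotype1-0000027 41752134 41756884 2 4750
HG03732_chrY_haplotype1-0000027 41746634 41750934 2 4300
HG03732_chrY_haplotype1-0000027 41730634 41744884 2 14250
HG03732_chrY_haplotype1-0000027 41171934 41723485 2 551551
HG03732_chrY_haplotype1-0000027 41167333 41171534 2 4201
"""


def parse_bed(text):
    """Yield (contig, start, end) from whitespace-delimited BED lines."""
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        yield fields[0], int(fields[1]), int(fields[2])


def min_max_spans(records):
    """Collapse records into one (min start, max end) interval per contig."""
    spans = {}
    for contig, start, end in records:
        if contig in spans:
            lo, hi = spans[contig]
            spans[contig] = (min(lo, start), max(hi, end))
        else:
            spans[contig] = (start, end)
    return spans


def filter_by_length(spans, min_len):
    """Split contigs into kept/dropped dicts of span length by min_len."""
    kept, dropped = {}, {}
    for contig, (lo, hi) in spans.items():
        length = hi - lo
        (kept if length >= min_len else dropped)[contig] = length
    return kept, dropped


if __name__ == "__main__":
    spans = min_max_spans(parse_bed(ALR_BED))
    kept, dropped = filter_by_length(spans, min_len=1_000_000)
    for contig, length in sorted(kept.items()):
        print(f"kept    {contig}: {length:,} bp")
    for contig, length in sorted(dropped.items()):
        print(f"dropped {contig}: {length:,} bp")
    # kept    NA19239_chrY_haplotype1-0000027: 1,010,542 bp
    # dropped HG00096_chrY_haplotype1-0000033: 504,150 bp
    # dropped HG03732_chrY_haplotype1-0000027: 728,901 bp
```

Under this assumption, only NA19239 clears the fixed 1 Mbp cutoff, while the HG00096 (~504 kbp) and HG03732 (~729 kbp) arrays are dropped even though their ALR blocks look like the same array structure. Lowering `min_len`, for example with a hypothetical per-chromosome cutoff, would retain them; the sketch only illustrates why a single fixed threshold removes them.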