lmdu / pytrf

a python package for finding tandem repeats from genomic sequences
https://pytrf.readthedocs.io
MIT License

Question about motif identification #1

Open yifnzhao opened 3 years ago

yifnzhao commented 3 years ago

Hi, thanks for developing this great tool! I have a question about how motifs are identified. In the example below, why are longer motifs, such as TCCCCTCCCACCCGG, not identified? In fact, when min_motif_size is set to a value greater than 4, VNTRMiner detects no VNTRs in this sequence.

import stria

name = '6:168377992-168378192'
seq = 'CTCCCCCCTCCCACACCGGAGCCTTCTCTCCCCTCCCACCCGGGACCTCTTCTCCCCTCCCACCCGGGGCCTCCTCTCCCCTCCCACCCGGGACCTCTTCTCCCCCCCATCCGGGGCCTGCTGTCCCCTCCCACCCGGGACCTCTTCTCCCCTCCCATCCGGGGCCTCCTCTCCCCTCCCACCCGGGACCTCTTCTCCCCT'
# Report every VNTR whose motif is at least 4 bp long
for vntr in stria.VNTRMiner(name, seq, min_motif_size=4):
    print(vntr.as_dict())

>>> {'chrom': '6:168377992-168378192', 'start': 28, 'end': 37, 'motif': 'CTCCC', 'type': 5, 'repeats': 2, 'length': 10}
{'chrom': '6:168377992-168378192', 'start': 52, 'end': 61, 'motif': 'CTCCC', 'type': 5, 'repeats': 2, 'length': 10}
{'chrom': '6:168377992-168378192', 'start': 76, 'end': 85, 'motif': 'CTCCC', 'type': 5, 'repeats': 2, 'length': 10}
{'chrom': '6:168377992-168378192', 'start': 147, 'end': 156, 'motif': 'CTCCC', 'type': 5, 'repeats': 2, 'length': 10}
{'chrom': '6:168377992-168378192', 'start': 171, 'end': 180, 'motif': 'CTCCC', 'type': 5, 'repeats': 2, 'length': 10}

lmdu commented 4 months ago

Fixed in v1.3.1.
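
For anyone who wants to check independently where exact copies of the longer motif from the question fall in the sequence, here is a minimal sketch in plain Python. It does not use or reproduce VNTRMiner's algorithm. The helper names motif_positions and gaps_between are made up for illustration, and only exact matches are considered. A gap of 0 means two copies sit directly next to each other, and a negative gap means they overlap.

```python
def motif_positions(seq, motif):
    """Return 1-based start positions of every exact occurrence of motif in seq.

    Overlapping occurrences are included. (Hypothetical helper, not part of any library.)
    """
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start + 1)          # convert 0-based index to 1-based
        start = seq.find(motif, start + 1)   # continue the search one base further on
    return positions


def gaps_between(positions, motif_len):
    """Return the number of bases between consecutive occurrences.

    0 means the copies are directly adjacent; a negative value means they overlap.
    (Hypothetical helper, not part of any library.)
    """
    return [b - (a + motif_len) for a, b in zip(positions, positions[1:])]


# Sequence and candidate motif taken from the question above
seq = 'CTCCCCCCTCCCACACCGGAGCCTTCTCTCCCCTCCCACCCGGGACCTCTTCTCCCCTCCCACCCGGGGCCTCCTCTCCCCTCCCACCCGGGACCTCTTCTCCCCCCCATCCGGGGCCTGCTGTCCCCTCCCACCCGGGACCTCTTCTCCCCTCCCATCCGGGGCCTCCTCTCCCCTCCCACCCGGGACCTCTTCTCCCCT'
motif = 'TCCCCTCCCACCCGG'

positions = motif_positions(seq, motif)
print('start positions:', positions)
print('gaps between copies:', gaps_between(positions, len(motif)))
```

Comparing the printed gaps against the output of the installed version is one way to see whether the longer motif would be expected to show up as a perfect tandem repeat at all, or only as an interrupted one.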