Closed by andersgs 4 years ago
Thanks @andersgs for filing the issue. I used https://github.com/andersgs/db-check to clean up the allele database; it is a great tool! There are overlapping alleles in the database, but they all belong to a single type.
Hello.
I have a sample that I expected to be IV_44:z4z32, but sistr identified it as z4,z23,z32. When I checked the fliC.fasta DB, I noticed that there were two identical entries (sequence-wise), one labeled z4,z23,z32 and the other IV_44:z4z32.
While examining the DB, I found a few other potential issues. The full report is below, with the cluster of sequences mentioned above detailed at the bottom.
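For anyone who wants to reproduce this kind of check by hand, here is a minimal sketch of grouping FASTA entries by exact sequence and flagging groups that carry more than one label. It is not how db-check or sistr is implemented. It assumes the label is the text after the last "|" in each header (the real fliC.fasta header layout may differ), and the entry names and sequences in the example are made up for illustration.

```python
from collections import defaultdict


def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def label_of(header):
    # Assumption: the label is the text after the last "|" in the header.
    return header.rsplit("|", 1)[-1]


def conflicting_clusters(text):
    """Map each sequence that appears under more than one label to its sorted labels."""
    labels_by_seq = defaultdict(set)
    for header, seq in parse_fasta(text):
        labels_by_seq[seq.upper()].add(label_of(header))
    return {
        seq: sorted(labels)
        for seq, labels in labels_by_seq.items()
        if len(labels) > 1
    }


# Hypothetical toy database: two identical sequences with different labels.
example = """>entry1|z4,z23,z32
ATGGCACAAGTC
>entry2|IV_44:z4z32
ATGGCACAAGTC
>entry3|a
ATGGCACAAGTT
"""

for seq, labels in conflicting_clusters(example).items():
    print(labels, seq)
```

To run it on a real file, read the file contents into a string and pass that to conflicting_clusters, adjusting label_of to match the actual header format.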
Cheers. A.
db-check Report fliC
By agoncalves on 2018-11-22
Summary
Possible issues.
Breakdown
Total entries: 717
Total clusters: 622
Unique sequence IDs: 716
Cluster size distribution
Table: Distribution of cluster sizes (i.e., number of sequences).
Summary of duplicated IDs
Duplicate ID 11-2580|c
Table: Sequences with ID 11-2580|c
Category report
Distribution of categories by cluster of sequences.
Table: Distribution of categories by cluster of sequences.
Clusters with more than one category
Cluster 6
Title: Sequences in cluster 6
Cluster 24
Title: Sequences in cluster 24
Cluster 151
Title: Sequences in cluster 151
Cluster 163
Title: Sequences in cluster 163
Cluster 475
Title: Sequences in cluster 475
Cluster 582
Title: Sequences in cluster 582
Generated using db-check v0.1.4
db-check is on GitHub. Please submit issues