Closed maesaar closed 7 years ago
@mmaesaar The first thing to realise is that all the AMR databases contain some partial genes, generally old PCR products from primers. So even 100% coverage does not guarantee that the whole gene is present. It's worrying, yes.
The second thing to realise is that what matters most is the PROTEIN sequence, as that is the active molecule. So the 95% DNA identity threshold could perhaps be relaxed, since it is amino acid (AA) identity that counts.
The third issue is that if you are calling genes from draft genome assemblies, you may get partial hits (say < 60% coverage) purely because of the assembly. This can happen when there are two copies of a very similar beta-lactamase, for example. If you require 95% coverage you will not find anything, but in reality there are two copies!
So the answer is "it depends" on the question you need to ask. Sorry it can't be simpler!
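To make the third point concrete, here is a rough sketch of one way to spot genes that are only present as fragments: add up the coverage of all partial hits for the same gene in the same sample. This is a heuristic I am assuming for illustration, not something abricate does for you. The function name `combined_coverage` and the sample values are made up, and simply summing percentages ignores overlap between fragments, so treat the result as an upper bound.

```python
from collections import defaultdict


def combined_coverage(hits):
    """Sum %coverage per (sample, gene) across all partial hits.

    `hits` is an iterable of (sample, gene, pct_coverage) tuples.
    Overlap between fragments is ignored, so the sum is only an
    upper bound; it is capped at 100.
    """
    totals = defaultdict(float)
    for sample, gene, cov in hits:
        totals[(sample, gene)] += cov
    return {key: min(total, 100.0) for key, total in totals.items()}


# Hypothetical hits: blaTEM-1 is split across two contigs in sampleA.
hits = [
    ("sampleA", "blaTEM-1", 55.0),
    ("sampleA", "blaTEM-1", 47.0),
    ("sampleA", "tet(A)", 100.0),
]

for (sample, gene), cov in sorted(combined_coverage(hits).items()):
    print(sample, gene, cov)
```

Neither blaTEM-1 fragment would pass a 95% coverage cutoff on its own, but together they suggest the gene is there, so hits like these are worth a manual look before you drop them.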
I am constructing a binary matrix from abricate output and I can't decide which percentage thresholds to use. The idea was to report only genes that have >95% coverage and, of those, >95% identity. But I am concerned that this could be too strict. What's your opinion on the subject?
@tseemann please label this "Help wanted"!
Thanks in advance
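For anyone wanting to try different cutoffs, the binary matrix described above could be sketched roughly like this. It assumes the tab-separated report has `#FILE`, `GENE`, `%COVERAGE` and `%IDENTITY` columns; check the header of your own output. The function name `binary_matrix` and the example rows are invented for illustration, and the comparison uses `>=` rather than a strict `>`.

```python
import csv
import io


def binary_matrix(tsv_text, min_cov=95.0, min_id=95.0):
    """Return (gene_columns, {sample: [0/1, ...]}) from a tab-separated report.

    A gene is scored 1 for a sample if at least one hit meets both
    thresholds. Column names are assumed, not guaranteed.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    present = {}
    genes = set()
    for row in reader:
        # Register every sample, even if none of its hits pass.
        found = present.setdefault(row["#FILE"], set())
        if float(row["%COVERAGE"]) >= min_cov and float(row["%IDENTITY"]) >= min_id:
            found.add(row["GENE"])
            genes.add(row["GENE"])
    columns = sorted(genes)
    matrix = {s: [int(g in found) for g in columns] for s, found in present.items()}
    return columns, matrix


# Made-up example rows, joined with tabs to mimic the report layout.
rows = [
    ("#FILE", "GENE", "%COVERAGE", "%IDENTITY"),
    ("a.fa", "blaTEM-1", "100.00", "99.88"),
    ("a.fa", "tet(A)", "62.10", "98.00"),
    ("b.fa", "tet(A)", "99.50", "96.20"),
]
example = "\n".join("\t".join(r) for r in rows)

strict_cols, strict = binary_matrix(example)                 # 95 / 95
relaxed_cols, relaxed = binary_matrix(example, min_cov=60.0)  # 60 / 95

print(strict_cols, strict)
print(relaxed_cols, relaxed)
```

Running it with two or three threshold pairs and comparing the matrices shows quickly how many calls depend on the cutoff, which is usually more informative than picking one number up front.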