wtmatlock / flanker

Gene-flank analysis tool
MIT License

Gene query naming #30

Closed: wtmatlock closed 3 years ago

wtmatlock commented 3 years ago

Since moving from contains to an exact match for the gene annotation query, you have to add the Abricate suffix to get a hit, i.e. you need blaCTX-M-55_1 instead of blaCTX-M-55. I think we need to find a new solution.

wtmatlock commented 3 years ago

Reminder: contains is not a viable option, as a blaCTX-M-5 query might annotate blaCTX-M-55 instead.
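To make the trade-off concrete, here is a minimal sketch of the two behaviours described so far. The annotation names and the helper functions `exact_hits` and `substring_hits` are made up for illustration and are not taken from the flanker code.

```python
# Hypothetical annotation names, in the suffixed style that
# Abricate reports when run with the ResFinder database.
annotations = ["blaCTX-M-5_1", "blaCTX-M-55_1", "blaTEM-1B_1"]


def exact_hits(query, names):
    """Exact match: the query must equal the full annotation name."""
    return [n for n in names if n == query]


def substring_hits(query, names):
    """Substring ("contains") match: the query may appear anywhere in the name."""
    return [n for n in names if query in n]


# Exact matching misses the gene unless the suffix is typed out.
print(exact_hits("blaCTX-M-55", annotations))
print(exact_hits("blaCTX-M-55_1", annotations))

# Substring matching finds it, but a blaCTX-M-5 query also picks up blaCTX-M-55.
print(substring_hits("blaCTX-M-5", annotations))
```

So neither plain equality nor plain containment gives the behaviour a user would expect from typing an unsuffixed gene name.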

wtmatlock commented 3 years ago

maybe move to regex queries? or is that overcomplicating things for the average user?

liampshaw commented 3 years ago

Just to note, this is a problem for --db resfinder; the default option in abricate is ncbi, and that doesn't contain suffixes. But you could also strip trailing characters after the last _, and/or throw an error if there is no exact match that says something like: Input gene "blaCTX-M-55" not found, did you mean {nearest match}?
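A rough sketch of that idea, using only the standard library. The function names `strip_suffix` and `find_gene` are hypothetical, and the sketch assumes the suffix is always an underscore followed by digits, which is narrower than "everything after the last _" so that names containing other underscores are left alone. The suggestion comes from `difflib.get_close_matches`, which is one possible way to pick a nearest match.

```python
import difflib


def strip_suffix(name):
    """Drop a trailing "_<digits>" suffix, e.g. blaCTX-M-55_1 -> blaCTX-M-55."""
    base, sep, tail = name.rpartition("_")
    return base if sep and tail.isdigit() else name


def find_gene(query, names):
    """Return annotation names whose suffix-stripped form equals the query.

    Raises ValueError with a "did you mean" hint when nothing matches.
    """
    hits = [n for n in names if strip_suffix(n) == query]
    if hits:
        return hits
    stripped = [strip_suffix(n) for n in names]
    suggestion = difflib.get_close_matches(query, stripped, n=1)
    hint = f', did you mean "{suggestion[0]}"?' if suggestion else ""
    raise ValueError(f'Input gene "{query}" not found{hint}')


# Hypothetical annotation names for illustration.
names = ["blaCTX-M-5_1", "blaCTX-M-55_1", "blaTEM-1B_1"]
print(find_gene("blaCTX-M-55", names))  # matches only the blaCTX-M-55 entry
```

With this approach the user never types the suffix, and a blaCTX-M-5 query cannot hit blaCTX-M-55 because the comparison is still an equality test after stripping.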

samlipworth commented 3 years ago

think we can just do a .str.contains(gene) for fuzzy matching - might make this an option though?

bede commented 3 years ago

Would have thought a substring match is a happy default, but could have an optional arg like --exact to make it ...exact?

Or some regex to allow a trailing e.g. underscore but not an integer, which would address the blaCTX-M-5(5) issue
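One way both suggestions could fit together, as a sketch only: a substring match by default that refuses to match when the query is immediately followed by another digit, plus an `exact` switch. The function name `gene_matches` and its `exact` parameter are hypothetical and only mirror the proposed --exact option; they are not part of flanker.

```python
import re


def gene_matches(query, name, exact=False):
    """Decide whether an annotation name matches a gene query.

    exact=True  -> plain equality.
    exact=False -> substring match, rejected if the query is directly
                   followed by a digit (so blaCTX-M-5 does not hit blaCTX-M-55).
    """
    if exact:
        return name == query
    # re.escape keeps characters such as "(" or "'" in gene names literal.
    pattern = re.escape(query) + r"(?!\d)"
    return re.search(pattern, name) is not None


print(gene_matches("blaCTX-M-5", "blaCTX-M-55_1"))   # rejected: next character is a digit
print(gene_matches("blaCTX-M-5", "blaCTX-M-5_1"))    # accepted: next character is "_"
print(gene_matches("blaCTX-M-55", "blaCTX-M-55_1"))  # accepted
```

One caveat with this pattern: it only guards against trailing digits, so a query like blaTEM-1 would still match a name such as blaTEM-1B_1.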