wilkelab / Opfi

A Python package for discovery, annotation, and analysis of gene clusters in genomics or metagenomics data sets.
https://opfi.readthedocs.io/
MIT License

Rule for groups of features #146

Closed by jimrybarski 4 years ago

jimrybarski commented 4 years ago

There should be a rule that lets a user look for groups of features that are no more than N bp apart, where the features can occur in any order, and are optionally oriented in the same direction.

My specific problem is that I want to be able to look for systems where a transposase is close to some set of cas genes, but the specific cas gene that is adjacent to the transposase could vary. For example, I want a rule that allows me to select systems that have either transposase-cas9-cas1-cas2 or cas1-cas2-cas9-transposase, where the gap between adjacent features is no more than 50 bp.

This would also allow for finding simple things like tnsA-tnsB-tnsC. We can do this with the current rules, but it's a bit tedious, and I feel this is a general enough problem that it should be part of the library.
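To make the request concrete, here is a rough standalone sketch of the check I have in mind. It does not use Opfi's API: the `Feature` class and the `has_feature_group` function are hypothetical names made up for illustration. It assumes the gap is measured as the next feature's start minus the previous feature's end, and that the group members are consecutive features (nothing else sits between them). The real rule might want to relax that second assumption.

```python
from dataclasses import dataclass
from typing import Iterable, Sequence


@dataclass(frozen=True)
class Feature:
    """Hypothetical stand-in for an annotated feature (not an Opfi class)."""
    name: str
    start: int   # leftmost coordinate, in bp
    end: int     # rightmost coordinate, in bp
    strand: int  # +1 or -1


def has_feature_group(
    features: Iterable[Feature],
    names: Sequence[str],
    max_gap_bp: int,
    same_orientation: bool = False,
) -> bool:
    """Return True if some run of consecutive features contains exactly
    `names` (in any order) with every neighbouring gap <= max_gap_bp."""
    ordered = sorted(features, key=lambda f: f.start)
    wanted = sorted(names)
    size = len(wanted)
    # Slide a window of len(names) features along the locus.
    for i in range(len(ordered) - size + 1):
        window = ordered[i:i + size]
        # Order-independent comparison of the feature names.
        if sorted(f.name for f in window) != wanted:
            continue
        gaps_ok = all(
            nxt.start - prev.end <= max_gap_bp
            for prev, nxt in zip(window, window[1:])
        )
        strands_ok = (not same_orientation
                      or len({f.strand for f in window}) == 1)
        if gaps_ok and strands_ok:
            return True
    return False


group = ["transposase", "cas9", "cas1", "cas2"]

# transposase-cas9-cas1-cas2, all gaps under 50 bp
locus_a = [
    Feature("transposase", 0, 900, 1),
    Feature("cas9", 930, 4000, 1),
    Feature("cas1", 4020, 4900, 1),
    Feature("cas2", 4910, 5200, 1),
]

# cas1-cas2-cas9-transposase, all gaps under 50 bp
locus_b = [
    Feature("cas1", 0, 900, -1),
    Feature("cas2", 920, 1200, -1),
    Feature("cas9", 1240, 4300, -1),
    Feature("transposase", 4340, 5200, -1),
]

# Same genes, but the transposase is 600 bp away from cas9
locus_c = [
    Feature("transposase", 0, 900, 1),
    Feature("cas9", 1500, 4500, 1),
    Feature("cas1", 4520, 5400, 1),
    Feature("cas2", 5410, 5700, 1),
]

for locus in (locus_a, locus_b, locus_c):
    print(has_feature_group(locus, group, max_gap_bp=50, same_orientation=True))
```

With this sketch, `locus_a` and `locus_b` both pass a single rule even though the gene order differs, while `locus_c` is rejected because of the 600 bp gap. The same call with `["tnsA", "tnsB", "tnsC"]` would cover the simpler case.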